Forum Topic - thread stack alignment on x86: (9 Items)
   
thread stack alignment on x86  
Dear all,

During some development (related to SSE-vectorized matrix operations from the eigen2 library)
I found that data alignment attributes, e.g.
  double array[1] __attribute__((aligned(16)));
do not work in threaded x86 code.

Running the attached test code gives the following output on my x86 box:
# QCC aligntest.cc  
# ./a.out 
OK address: 8047be0
BAD address: 7fc6fa4

For the "BAD" variable, which is a local variable of the thread function, the alignment does not work.
My earlier tests showed that in the main() thread the aligned attribute does make a difference and works.

I think this is caused by the following define:
coreos_pub/trunk/services/system/public/kernel/cpu_x86.h:
#define STACK_ALIGNMENT 4

This aligns each thread's stack to 4 bytes, which is not what gcc's alignment attributes assume.
Similar issues were reported for other OSes:
FreeBSD: http://lists.freebsd.org/pipermail/freebsd-threads/2004-November/002730.html
OpenSolaris: http://defect.opensolaris.org/bz/show_bug.cgi?id=10932

I guess that defining STACK_ALIGNMENT as 16 on x86 (as is already the case for e.g. ppc, sh and arm) would solve the issue, but I 
am not sure whether it would break some other system assumptions.
Attachment: Text aligntest.cc 399 bytes
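The attachment itself is not reproduced in this thread. As a rough idea of what such a test can look like, here is a minimal sketch; it is not the original aligntest.cc. The names is_aligned16 and run_in_thread are made up for illustration, and test_thread only borrows its name from the snippet quoted in the first reply.

```cpp
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper (not from the attachment): true if addr is a multiple of 16.
static bool is_aligned16(uintptr_t addr)
{
    return (addr & 15u) == 0;
}

// Thread body: declares a local with the attribute from the post and reports
// where it actually landed. The address goes through a volatile so the
// compiler cannot fold the check based on the alignment it assumes.
static void *test_thread(void *out)
{
    double array[1] __attribute__((aligned(16)));
    array[0] = 0.0;
    volatile uintptr_t observed = reinterpret_cast<uintptr_t>(array);
    uintptr_t addr = observed;
    bool ok = is_aligned16(addr);
    printf("%s address: %lx\n", ok ? "aligned" : "misaligned",
           static_cast<unsigned long>(addr));
    if (out != NULL)
        *static_cast<bool *>(out) = ok;
    return NULL;
}

// Runs test_thread in a new thread and returns whether its local was aligned.
static bool run_in_thread()
{
    bool ok = false;
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, test_thread, &ok);
    assert(rc == 0);
    pthread_join(tid, NULL);
    return ok;
}
```

Calling test_thread(NULL) from main() and then run_in_thread() gives the two cases being compared: the same function on the main thread's stack and on a newly created thread's stack.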
Re: thread stack alignment on x86  
The thread does not matter here. If you change the code to run the function in the main thread...

	test_thread(NULL);
//	pthread_t tid;
//	pthread_create(&tid, NULL, test_thread, NULL);
//	pthread_join(tid, NULL);

...the result is the same.
Re: thread stack alignment on x86  
Ah, sorry... I did not notice that the second address is always labeled "BAD" for you. ;)
In any case, it's not a big deal to align the stack on your own if you really need it.
Re: thread stack alignment on x86  
Yep, I have just named the variables BAD/OK to point out what causes the problem.
I have added an explanation in a post below.

I agree that I can always align the data "manually", but in my opinion, if I ask the compiler to do so, it should do 
the job. If the compiler makes some assumptions about the OS thread's stack alignment and these assumptions are wrong, we 
should either fix the compiler (teach it to work around this) or fix the OS (make the assumption true).
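For reference, aligning the data "manually" could look roughly like the sketch below: reserve 15 spare bytes and round the address up to the next 16-byte boundary. The helpers align_up and aligned_slot are hypothetical names used only for this illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: round p up to the next multiple of `alignment`,
// which must be a power of two.
static void *align_up(void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    uintptr_t mask = static_cast<uintptr_t>(alignment) - 1;
    return reinterpret_cast<void *>((addr + mask) & ~mask);
}

// Returns a 16-byte aligned slot for one double inside `raw`.
// The caller must provide at least sizeof(double) + 15 bytes.
static double *aligned_slot(unsigned char *raw)
{
    return static_cast<double *>(align_up(raw, 16));
}
```

Inside a thread function this would be used as `unsigned char raw[sizeof(double) + 15]; double *array = aligned_slot(raw);`. It works regardless of how the stack itself is aligned, but every such variable has to be handled by hand, which is exactly the objection above.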
Re: thread stack alignment on x86  
> I agree that I can always align the data "manually", but in my opinion - if I 
> ask the compiler to do so it should make the job. If the compiler does some 
> assumptions about OS thread's stack alignment, and these assumption are wrong 

I mean you can easily fix the OS thread's stack alignment on entry to the thread procedure. See the attachment for an example.

P.S. Thanks for the note, by the way. I'll add that for myself too. ;)
Attachment: Text aligntest.cc 815 bytes
Re: thread stack alignment on x86  
I would call this solution "impressive" rather than "easy" ;-)

I guess nobody expects to have to solve this in assembly. Not to mention that it is x86-specific.

I believe that something has to be fixed somewhere. It blocks already developed code (e.g. the eigen matrix library) from 
being ported to QNX.
Re: thread stack alignment on x86  
Well, sure, it needs to be fixed in the OS.

However, you can use this as a workaround, since nobody knows when/if that is going to be fixed and everybody knows that 
older OS releases will not be patched. ;)

As for x86-specific... Hm... Doesn't your topic subject contain "x86"? ;)
Re: thread stack alignment on x86  
The example posted needs a bit of correction.
Here is the final code:

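static void *thread_proc_stackmisaligned(void *param)
{
	return ( {
		register void *res;
		register void *temp;
		__asm volatile (
			"push	%%ebp;"			/* save the caller's frame pointer */
			"movl	%%esp, %1;"		/* temp = current (misaligned) esp */
			"movl	%%esp, %%ebp;"		/* keep it in ebp so "leave" can restore it */
			"subl	$4, %1;"		/* make room for the argument */
			"andl	$0xfffffff0, %1;"	/* round down to a 16-byte boundary */
			"movl	%1, %%esp;"		/* switch to the aligned stack pointer */
			"movl	%2, (%1);"		/* pass param as the only argument */
			"call	thread_proc;"		/* the result comes back in eax */
			"leave;"			/* restore the original esp and ebp */
			: "=a" (res), "=&r" (temp)
			: "r" (param)
			: "ecx", "edx", "memory", "cc"	/* the callee may clobber these */
			);
		res;
	});
}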
Re: thread stack alignment on x86  
I get different results (?):

# ./a.out 
OK address: 8047be0 // called from main()
BAD address: 8047ba0 // test_align called from main()
BAD address: 7fc6fa4 // test_align called from separate thread

The problem is that the last variable is aligned to 4 (the address ends with 4) but should be aligned to 16 (the address 
should end with 0).
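The "ends with 4" / "ends with 0" rule is just the address modulo 16 read off the last hex digit. A small sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: how many bytes addr lies past the previous multiple of
// `alignment`, which must be a power of two. Zero means "aligned".
static uintptr_t misalignment(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}
```

With the addresses from the output above, misalignment(0x7fc6fa4, 16) is 4, while misalignment(0x8047be0, 16) and misalignment(0x8047ba0, 16) are both 0. misalignment(0x7fc6fa4, 4) is 0, so the thread's variable is aligned to 4 bytes but not to 16.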

In the meantime I have recompiled procnto-smp with STACK_ALIGNMENT set to 16,
but it does not help (?). Any ideas?