Forum Topic - qcc using emulated tls rather than %gs:
   
I am starting a port of a large multi-threaded application from SMP Linux to SMP QNX 6.4.1. It uses the gcc __thread 
keyword for thread-local variables in performance-critical code. Under gcc 3.4.4 on Linux, this source snippet:

__thread int _local_number;

int getLocalNumber(void)
{
  return _local_number;
}

generates this assembly:

083202b4 <getLocalNumber>:
 83202b4: 55                    push   %ebp
 83202b5: 89 e5                 mov    %esp,%ebp
 83202b7: a1 14 81 38 08        mov    0x8388114,%eax
 83202bc: 65 8b 00              mov    %gs:(%eax),%eax
 83202bf: 5d                    pop    %ebp
 83202c0: c3                    ret

Note the use of the %gs segment register to access TLS variables.

Compiling the same snippet with qcc v4.3.3 for QNX 6.4.1 generates this assembly instead:

0834ee10 <getLocalNumber>:
 834ee10: 55                    push   %ebp
 834ee11: 89 e5                 mov    %esp,%ebp
 834ee13: 83 ec 08              sub    $0x8,%esp
 834ee16: c7 04 24 c0 6f 40 08  movl   $0x8406fc0,(%esp)
 834ee1d: e8 0a 50 05 00        call   83a3e2c <__emutls_get_address>
 834ee22: 8b 00                 mov    (%eax),%eax
 834ee24: c9                    leave
 834ee25: c3                    ret

The __emutls_get_address function appears to use the POSIX pthread_getspecific function to obtain thread-local data.
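
To illustrate what that implies for each access, here is a minimal sketch of a pthread_getspecific-based accessor for a single int variable. It is my assumption about the general approach, not the actual __emutls_get_address source; the names local_number_key, make_key and local_number_address are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical key holding each thread's private copy of the variable. */
static pthread_key_t local_number_key;
static pthread_once_t local_number_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is the destructor: it releases a thread's copy at thread exit. */
    pthread_key_create(&local_number_key, free);
}

/* Hypothetical stand-in for __emutls_get_address, for one int variable. */
static int *local_number_address(void)
{
    int *p;

    pthread_once(&local_number_once, make_key);
    p = pthread_getspecific(local_number_key);
    if (p == NULL) {
        /* First access in this thread: allocate a zero-initialised copy. */
        p = calloc(1, sizeof *p);
        if (p == NULL)
            abort();
        pthread_setspecific(local_number_key, p);
    }
    return p;
}

int getLocalNumber(void)
{
    /* Every read pays for a function call plus a key lookup. */
    return *local_number_address();
}
```

Compared with the single %gs-relative load in the Linux output, each read here goes through a call, a once-check and a key lookup.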

Emulated TLS is significantly slower than %gs-based access. My questions are:

1. Is there any option to qcc (and corresponding support in the runtime) that provides TLS support via the %gs 
register?

2. Is there any plan to update qcc and QNX to support %gs for TLS in the future?

3. If I allocate my own per-thread local variable data structs and point %gs to them, can I expect %gs to be maintained 
across context switches of my application threads?
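
For reference, the per-thread struct mentioned in question 3 could look roughly like the sketch below. Since whether %gs can be used is the open question, the sketch reaches the struct through a pthread key instead and passes the pointer into the hot path, so the lookup is paid once per caller rather than once per variable access. The struct layout and the names thread_locals, thread_locals_get and sum_locals are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread data block; the fields are placeholders. */
struct thread_locals {
    int local_number;
    int other_counter;
};

static pthread_key_t locals_key;
static pthread_once_t locals_once = PTHREAD_ONCE_INIT;

static void locals_make_key(void)
{
    /* free() releases a thread's block when that thread exits. */
    pthread_key_create(&locals_key, free);
}

/* Returns the calling thread's block, allocating it on first use. */
static struct thread_locals *thread_locals_get(void)
{
    struct thread_locals *tl;

    pthread_once(&locals_once, locals_make_key);
    tl = pthread_getspecific(locals_key);
    if (tl == NULL) {
        tl = calloc(1, sizeof *tl);
        if (tl == NULL)
            abort();
        pthread_setspecific(locals_key, tl);
    }
    return tl;
}

/* Hot-path code takes the block as a parameter and does no lookup itself. */
static int sum_locals(const struct thread_locals *tl)
{
    return tl->local_number + tl->other_counter;
}
```

A caller would fetch the pointer once with thread_locals_get() at the top of a performance-critical region and hand it down to functions such as sum_locals().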