Project Home
Project Home
Documents
Documents
Wiki
Wiki
Discussion Forums
Discussions
Project Information
Project Info
Forum Topic - A bug in pthread_mutex_unlock(): (17 Items)
   
A bug in pthread_mutex_unlock()  
Hi All

I would like to let you know about a bug I have discovered after an assertion check triggered in my code.

Consider the following scenario. A parent thread starts a child thread and waits for an event from it. Here is a typical
 simplified pseudocode for this synchronization model.

struct Event
{
  bool bState;
  pthread_condr_t cCondvar;
  pthread_mutex_t mMutex;
};

// Parent thread execution flow
{
  ...
  Event eChildEvent; // Event structure is allocated and initialized on stack
  pthread_create(..., &ChildThread, &eChildEvent, ...); // A child thread is started and a pointer to event is passed as
 a parameter to it

  // Now, wait for an event from the child thread
  pthread_mutex_lock(&eChildEvent.mMutex); // <-- B
  if (!eChildEvent.bState)
  {
    pthread_cond_wait(&eChildEvent.cCondvar, &eChildEvent.mMutex);
  }
  pthread_mutex_unlock(&eChildEvent.mMutex);

  ... // Parent thread continues its execution
}

int ChildThread(Event *peNotifyEvent)
{
  ... // Do something

  // Signal the event to let the parent continue
  pthread_mutex_lock(&peNotifyEvent->mMutex);
  peNotifyEvent->bState = true;
  pthread_cond_broadcast(&peNotifyEvent->cCondvar);
  pthread_mutex_unlock(&peNotifyEvent->mMutex); // <-- A
  // peNotifyEvent is not used any more

  ... // Child thread continues its execution
}

Here, as soon as child thread signals the conditional variable, parent thread continues its execution and it is 
acceptable (and preferable) for it to destroy eChildEvent immediately.

Now, let's take a look at implementation of pthread_mutex_unlock() in context marked as "<-- A" in child thread.

int
(pthread_mutex_unlock)(pthread_mutex_t *mutex) {
	if((mutex->__owner & ~_NTO_SYNC_WAITING) == LIBC_TLS()->__owner) {
		if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {
			mem_barrier();
			if(__mutex_smp_xchg(&mutex->__owner, 0) & _NTO_SYNC_WAITING || mutex->__count & _NTO_SYNC_PRIOCEILING) {
				return SyncMutexUnlock_r((sync_t *)mutex);
			}
		}
		return EOK;
	}
	return EPERM;
}

Take a closer look at the last conditional operator

if(__mutex_smp_xchg(&mutex->__owner, 0) & _NTO_SYNC_WAITING || mutex->__count & _NTO_SYNC_PRIOCEILING) {...}

Here, as soon as __mutex_smp_xchg() is executed and zero is assigned for the owner field, the parent thread can capture the mutex in point marked as "<-- B", make sure that eChildEvent.bState is already true, unlock the mutex 
and destroy everything. Right after that the stack (just as dynamic memory, if mutex would be allocated in it) can be 
reused and a different value can be assigned to the memory where mutex->__count was located. So, now if execution 
context switches back to child thread it examines mutex->__count field and finds _NTO_SYNC_PRIOCEILING bit signalled 
(which was not the case while the mutex was alive). The child thread enters the kernel with SyncMutexUnlock_r() and gets
 EINVAL from there, since the mutex had been already destroyed.

You must not access any mutex fields after you did "xchg" and moved zero into owner field. If you need anything from 
mutext you must copy it to local variables beforehand.

I've discovered this issue with QNX 6.3.0 SP3, however the source of pthread_mutex_unlock() I've posted here has been 
taken from SVN and it did not change from that time (if it was ever changed at all).

There is also another possible improvement I'd like to point you to.
Have a look at the second conditional operator in the function
if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
Here the count is first decremented and checked against zero directly and then it is masked with _NTO_SYNC_COUNTMASK and
 checked against zero again. As far as I understand this is done to save one binary AND instruction since in most cases 
there are no additional bits set in count flag and normally first...
View Full Message
Re: A bug in pthread_mutex_unlock()  
I've created PR62921 to track the bug you found in the "__mutex_smp_xchg(&mutex->__owner, 0) " line of 
pthread_mutex_unlock(). Thank you for reporting it.

Your second point:

 > if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
...
 > if((--mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}

Hmm I don't think two statements are exactly equivalent. mutex->__count may already be less than zero -- which 
would be 
a bug. In which hase the your version would exacerbate the bug.

-ad


Oleh Derevenko wrote:
> Hi All
> 
> I would like to let you know about a bug I have discovered after an assertion check triggered in my code.
> 
> Consider the following scenario. A parent thread starts a child thread and waits for an event from it. Here is a 
typical simplified pseudocode for this synchronization model.
> 
> struct Event
> {
>   bool bState;
>   pthread_condr_t cCondvar;
>   pthread_mutex_t mMutex;
> };
> 
> // Parent thread execution flow
> {
>   ...
>   Event eChildEvent; // Event structure is allocated and initialized on stack
>   pthread_create(..., &ChildThread, &eChildEvent, ...); // A child thread is started and a pointer to event is passed 
as a parameter to it
> 
>   // Now, wait for an event from the child thread
>   pthread_mutex_lock(&eChildEvent.mMutex); // <-- B
>   if (!eChildEvent.bState)
>   {
>     pthread_cond_wait(&eChildEvent.cCondvar, &eChildEvent.mMutex);
>   }
>   pthread_mutex_unlock(&eChildEvent.mMutex);
> 
>   ... // Parent thread continues its execution
> }
> 
> int ChildThread(Event *peNotifyEvent)
> {
>   ... // Do something
> 
>   // Signal the event to let the parent continue
>   pthread_mutex_lock(&peNotifyEvent->mMutex);
>   peNotifyEvent->bState = true;
>   pthread_cond_broadcast(&peNotifyEvent->cCondvar);
>   pthread_mutex_unlock(&peNotifyEvent->mMutex); // <-- A
>   // peNotifyEvent is not used any more
> 
>   ... // Child thread continues its execution
> }
> 
> Here, as soon as child thread signals the conditional variable, parent thread continues its execution and it is 
acceptable (and preferable) for it to destroy eChildEvent immediately.
> 
> Now, let's take a look at implementation of pthread_mutex_unlock() in context marked as "<-- A" in child thread.
> 
> int
> (pthread_mutex_unlock)(pthread_mutex_t *mutex) {
> 	if((mutex->__owner & ~_NTO_SYNC_WAITING) == LIBC_TLS()->__owner) {
> 		if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {
> 			mem_barrier();
> 			if(__mutex_smp_xchg(&mutex->__owner, 0) & _NTO_SYNC_WAITING || mutex->__count & _NTO_SYNC_PRIOCEILING) {
> 				return SyncMutexUnlock_r((sync_t *)mutex);
> 			}
> 		}
> 		return EOK;
> 	}
> 	return EPERM;
> }
> 
> Take a closer look at the last conditional operator
> 
> if(__mutex_smp_xchg(&mutex->__owner, 0) & _NTO_SYNC_WAITING || mutex->__count & _NTO_SYNC_PRIOCEILING) {...}
> 
> Here, as soon as __mutex_smp_xchg() is executed and zero is assigned for the owner field, the parent thread can 
capture the mutex in point marked as "<-- B", make sure that eChildEvent.bState is already true, unlock the mutex and 
destroy everything. Right after that the stack (just as dynamic memory, if mutex would be allocated in it) can be reused
 and a different value can be assigned to the memory where mutex->__count was located. So, now if execution context 
switches back to child thread it examines mutex->__count field and finds _NTO_SYNC_PRIOCEILING bit signalled (which was 
not the case while the mutex was alive). The child thread enters the kernel with SyncMutexUnlock_r() and gets EINVAL 
from there, since the mutex had been already...
View Full Message
Re: A bug in pthread_mutex_unlock()  
> Your second point:
> 
>  > if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
> ...
>  > if((--mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
> 
> Hmm I don't think two statements are exactly equivalent. mutex->__count may already be less than zero -- which would 
be 
> a bug. In which hase the your version would exacerbate the bug.

Mutex count should never be less than zero as a mutex can not be locked a negative count of times. Here "<=" comparison 
was used merely as a precaution instead of "==". Anyway if mutex lock count is less than zero it means that the mutex 
storage is corrupted or it somehow has been unlocked more times than locked. In these cases the library behavior is 
undefined so, there is not much sense in protecting from them.

Aha... The mutex count field could normally be less than zero if _NTO_SYNC_NONRECURSIVE is set (the value of 
_NTO_SYNC_NONRECURSIVE is 0x80000000). However presence of this flag does not affect handling of _NTO_SYNC_COUNTMASK 
field (the count is still incremented even if _NTO_SYNC_NONRECURSIVE is set) and the field can be checked against zero 
in any case.
Re: A bug in pthread_mutex_unlock()  
> Your second point:
> 
>  > if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
> ...
>  > if((--mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
> 
> Hmm I don't think two statements are exactly equivalent. mutex->__count may already be less than zero -- which would 
be 
> a bug. In which hase the your version would exacerbate the bug.

It's very strange, but the program does not work normally without first condition. :-[ ]
It does not crash, but does not work as well.

It looks like the kernel gets confused the threads are blocked on condvars while thread state shows they are blocked on 
mutexes.

(gdb) i thr
  4 process 5001338 ()  0xb032e9f9 in SyncCondvarWait_r () from /lib/libc.so.2
  3 process 5001338 ()  0xb032dd29 in MsgReceive () from /lib/libc.so.2
  2 process 5001338 ()  0xb032e9f9 in SyncCondvarWait_r () from /lib/libc.so.2
* 1 process 5001338 ()  0xb032e9f9 in SyncCondvarWait_r () from /lib/libc.so.2
(gdb) she pidin -p 5001338
     pid tid name               prio STATE       Blocked
 5001338   1 ../bin/crm.bin      30o MUTEX       5001338-04 #1
 5001338   2 ../bin/crm.bin      30o MUTEX       5001338-01 #1
 5001338   3 ../bin/crm.bin      10o RECEIVE     2
 5001338   4 ../bin/crm.bin      30o CONDVAR     8116298

I'll try to make some more investigations to this phenomenon. Does anybody have any ideas why the first part is required
?

NB: Sure, I've moved decrement operator into the second part while removing the first one.
Re: A bug in pthread_mutex_unlock()  
I guess I know... It looks like the kernel may not clear the count field for not recursive mutexes when mutex is 
released in pthread_cond_wait()/pthread_cond_timedwait() and leave 1 in it. As a result, when another thread attempts to
 signal the convar, there is 0x80000001 in count before mutex is locked, after lock it becomes 0x80000002 and then, on 
unlock the count is decremented back to 0x80000001. The kernel expects that the libc clears the owner field for the not 
recursive mutex even if count is not zero.

> > Your second point:
> > 
> >  > if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
> > ...
> >  > if((--mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {...}
> > 
> > Hmm I don't think two statements are exactly equivalent. mutex->__count may 
> already be less than zero -- which would be 
> > a bug. In which hase the your version would exacerbate the bug.
> 
> It's very strange, but the program does not work normally without first 
> condition. :-[ ]
> It does not crash, but does not work as well.
> 
> It looks like the kernel gets confused the threads are blocked on condvars 
> while thread state shows they are blocked on mutexes.
> 
> (gdb) i thr
>   4 process 5001338 ()  0xb032e9f9 in SyncCondvarWait_r () from /lib/libc.so.2
> 
>   3 process 5001338 ()  0xb032dd29 in MsgReceive () from /lib/libc.so.2
>   2 process 5001338 ()  0xb032e9f9 in SyncCondvarWait_r () from /lib/libc.so.2
> 
> * 1 process 5001338 ()  0xb032e9f9 in SyncCondvarWait_r () from /lib/libc.so.2
> 
> (gdb) she pidin -p 5001338
>      pid tid name               prio STATE       Blocked
>  5001338   1 ../bin/crm.bin      30o MUTEX       5001338-04 #1
>  5001338   2 ../bin/crm.bin      30o MUTEX       5001338-01 #1
>  5001338   3 ../bin/crm.bin      10o RECEIVE     2
>  5001338   4 ../bin/crm.bin      30o CONDVAR     8116298
> 
> I'll try to make some more investigations to this phenomenon. Does anybody 
> have any ideas why the first part is required?
> 
> NB: Sure, I've moved decrement operator into the second part while removing 
> the first one.


Re: A bug in pthread_mutex_unlock()  
I've seen a fix has been commited. That's great. :)

One more question.

==========
int
(pthread_mutex_unlock)(pthread_mutex_t *mutex) {
	if((mutex->__owner & ~_NTO_SYNC_WAITING) == LIBC_TLS()->__owner) {
		if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {
			int prioceiling;
			mem_barrier();
			prioceiling = mutex->__count & _NTO_SYNC_PRIOCEILING;
			if((__mutex_smp_xchg(&mutex->__owner, 0) & _NTO_SYNC_WAITING) || prioceiling) {
				return SyncMutexUnlock_r((sync_t *)mutex);
			}
		}
		return EOK;
	}
	return EPERM;
}
==========
Why is mem_barrier() necessary if following __mutex_smp_xchg() is implicitly a memory barrier?
Re: A bug in pthread_mutex_unlock()  
Allright, let me rephrase it. ;)

Two memory barriers in a row are redundand and unnecessary, aren't they?


> I've seen a fix has been commited. That's great. :)
> 
> One more question.
> 
> ==========
> int
> (pthread_mutex_unlock)(pthread_mutex_t *mutex) {
> 	if((mutex->__owner & ~_NTO_SYNC_WAITING) == LIBC_TLS()->__owner) {
> 		if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {
> 			int prioceiling;
> 			mem_barrier();
> 			prioceiling = mutex->__count & _NTO_SYNC_PRIOCEILING;
> 			if((__mutex_smp_xchg(&mutex->__owner, 0) & _NTO_SYNC_WAITING) || prioceiling) {
> 				return SyncMutexUnlock_r((sync_t *)mutex);
> 			}
> 		}
> 		return EOK;
> 	}
> 	return EPERM;
> }
> ==========
> Why is mem_barrier() necessary if following __mutex_smp_xchg() is implicitly a
>  memory barrier?
Re: A bug in pthread_mutex_unlock()  
That's sad if nobody understands code well enough to take the responsibility of carrying out a decision. :(

P.S. Greetings from Igor. ;)

> Allright, let me rephrase it. ;)
> 
> Two memory barriers in a row are redundand and unnecessary, aren't they?
> 
> 
> > I've seen a fix has been commited. That's great. :)
> > 
> > One more question.
> > 
> > ==========
> > int
> > (pthread_mutex_unlock)(pthread_mutex_t *mutex) {
> > 	if((mutex->__owner & ~_NTO_SYNC_WAITING) == LIBC_TLS()->__owner) {
> > 		if(--mutex->__count <= 0 || (mutex->__count & _NTO_SYNC_COUNTMASK) == 0) {
> > 			int prioceiling;
> > 			mem_barrier();
> > 			prioceiling = mutex->__count & _NTO_SYNC_PRIOCEILING;
> > 			if((__mutex_smp_xchg(&mutex->__owner, 0) & _NTO_SYNC_WAITING) || prioceiling) {
> > 				return SyncMutexUnlock_r((sync_t *)mutex);
> > 			}
> > 		}
> > 		return EOK;
> > 	}
> > 	return EPERM;
> > }
> > ==========
> > Why is mem_barrier() necessary if following __mutex_smp_xchg() is implicitly
>  a
> >  memory barrier?


Re: A bug in pthread_mutex_unlock()  
I don't belive __mutex_smp_xchg() contains it's own  memory barrier. All ofthe other references to it call some form of 
memory barrier first. 
Re: A bug in pthread_mutex_unlock()  
Hi Attilla

> I don't belive __mutex_smp_xchg() contains it's own  memory barrier. All ofthe
>  other references to it call some form of memory barrier first. 

Let's have a look at your own (QNX) implementation of memory barrier.

#define __cpu_membarrier() ({ extern unsigned __cpu_flags; if(__cpu_flags & (1 << 15)) __asm__ __volatile__ ("mfence"); else 
__asm __volatile__("lock; orb $0,0(%esp)"); })

Now let's have a look at implementation of memory barrier in MSVC 2005

FORCEINLINE
VOID
MemoryBarrier (
    VOID
    )
{
    LONG Barrier;
    __asm {
        xchg Barrier, eax
    }
}

Now let's have a look at implementation of _smp_xchg()

static __inline int __attribute__((__unused__)) _smp_xchg(volatile unsigned *__dst, unsigned __src) {
	__asm__ __volatile__(
		"xchgl %0,%1"
		:"=r" (__src)
		:"m" (__atomic_fool_gcc(__dst)), "0" (__src)
		:"memory");
	return __src;
}

Don't you see any similarities with memory barrier from MSVC?
All the locked memory access instructions (the ones with "lock" prefix) are implicitly the memory barriers. 

Here is an excerpt from IA-32 Intel® Architecture Software Developer’s Manual (pay attention to the last paragraph).
=============
7.2.4. Strengthening or Weakening the Memory Ordering Model
The IA-32 architecture provides several mechanisms for strengthening or weakening the
memory ordering model to handle special programming situations. These mechanisms include:
• The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions
force stronger ordering on the processor.
• The SFENCE instruction (introduced to the IA-32 architecture in the Pentium III
processor) and the LFENCE and MFENCE instructions (introduced in the Pentium 4 and
Intel Xeon processors) provide memory ordering and serialization capability for specific
types of memory operations.
• The memory type range registers (MTRRs) can be used to strengthen or weaken memory
ordering for specific area of physical memory (see Section 10.11., “Memory Type Range
Registers (MTRRs)”). MTRRs are available only in the Pentium 4, Intel Xeon, and P6
family processors.
• The page attribute table (PAT) can be used to strengthen memory ordering for a specific
page or group of pages (see Section 10.12., “Page Attribute Table (PAT)”). The PAT is
available only in the Pentium 4, Intel Xeon, and Pentium III processors.
These mechanisms can be used as follows.
Memory mapped devices and other I/O devices on the bus are often sensitive to the order of
writes to their I/O buffers. I/O instructions can be used to (the IN and OUT instructions) impose
strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the
processor waits for all previous instructions in the program to complete and for all buffered
writes to drain to memory. Only instruction fetch and page tables walks can pass I/O instructions.
Execution of subsequent instructions do not begin until the processor determines that the
I/O instruction has been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a strong
memory-ordering model. Here, a program can use a locking instruction such as the XCHG
instruction or the LOCK prefix to insure that a read-modify-write operation on memory is
carried out atomically. Locking operations typically operate like I/O operations in that they wait
for all previous instructions to complete and for all buffered writes to drain to memory (see
Section 7.1.2., “Bus Locking”).
=============

XCHG with one of arguments in memory always asserts LOCK# signal automatically. Another excerpt follows.
=============
Exchanges the contents of the destination (first) and source (second) operands. The operands
can be two general-purpose registers or a register and a memory location. If a memory operand
is referenced, the processor’s locking protocol is automatically implemented for the duration of
the...
View Full Message
Re: A bug in pthread_mutex_unlock()  
Oleh,

You could well be right, but as you yourself admit, you don't know about other architectures.  As
Neil mentioned, we kinda busy right now. :-)

We're not going to remove a line of code that was explicitly added back in 1999 until we've thought it through,
and right now we have more pressing issues to address.

In the meantime, we will raise PR67271 to track this.  We believe me when we say that we do appreciate your
input.

Cheers,

Colin

Oleh Derevenko wrote:
> Hi Attilla
> 
>> I don't belive __mutex_smp_xchg() contains it's own  memory barrier. All ofthe
>>  other references to it call some form of memory barrier first. 
> 
> Let's have a look at your own (QNX) implementation of memory barrier.
> 
> #define __cpu_membarrier() ({ extern unsigned __cpu_flags; if(__cpu_flags & (1 << 15)) __asm__ __volatile__ ("mfence"); 
else __asm __volatile__("lock; orb $0,0(%esp)"); })
> 
> Now let's have a look at implementation of memory barrier in MSVC 2005
> 
> FORCEINLINE
> VOID
> MemoryBarrier (
>     VOID
>     )
> {
>     LONG Barrier;
>     __asm {
>         xchg Barrier, eax
>     }
> }
> 
> Now let's have a look at implementation of _smp_xchg()
> 
> static __inline int __attribute__((__unused__)) _smp_xchg(volatile unsigned *__dst, unsigned __src) {
> 	__asm__ __volatile__(
> 		"xchgl %0,%1"
> 		:"=r" (__src)
> 		:"m" (__atomic_fool_gcc(__dst)), "0" (__src)
> 		:"memory");
> 	return __src;
> }
> 
> Don't you see any similarities with memory barrier from MSVC?
> All the locked memory access instructions (the ones with "lock" prefix) are implicitly the memory barriers. 
> 
> Here is an excerpt from IA-32 Intel® Architecture Software Developer’s Manual (pay attention to the last paragraph).

> =============
> 7.2.4. Strengthening or Weakening the Memory Ordering Model
> The IA-32 architecture provides several mechanisms for strengthening or weakening the
> memory ordering model to handle special programming situations. These mechanisms include:
> • The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions
> force stronger ordering on the processor.
> • The SFENCE instruction (introduced to the IA-32 architecture in the Pentium III
> processor) and the LFENCE and MFENCE instructions (introduced in the Pentium 4 and
> Intel Xeon processors) provide memory ordering and serialization capability for specific
> types of memory operations.
> • The memory type range registers (MTRRs) can be used to strengthen or weaken memory
> ordering for specific area of physical memory (see Section 10.11., “Memory Type Range
> Registers (MTRRs)”). MTRRs are available only in the Pentium 4, Intel Xeon, and P6
> family processors.
> • The page attribute table (PAT) can be used to strengthen memory ordering for a specific
> page or group of pages (see Section 10.12., “Page Attribute Table (PAT)”). The PAT is
> available only in the Pentium 4, Intel Xeon, and Pentium III processors.
> These mechanisms can be used as follows.
> Memory mapped devices and other I/O devices on the bus are often sensitive to the order of
> writes to their I/O buffers. I/O instructions can be used to (the IN and OUT instructions) impose
> strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the
> processor waits for all previous instructions in the program to complete and for all buffered
> writes to drain to memory. Only instruction fetch and page tables walks can pass I/O instructions.
> Execution of subsequent instructions do not begin until the processor determines that the
> I/O instruction has been completed.
> Synchronization mechanisms in multiple-processor systems may depend upon a strong
> memory-ordering model....
View Full Message
Re: A bug in pthread_mutex_unlock()  
As for other architectures... Even if there is _smp_xchg without a memory barrier in it on any platform (which I hardly 
believe as it makes the feature itself practically useless) that only means that the memory barrier should be added to 
it.

> You could well be right, but as you yourself admit, you don't know about other
>  architectures.  As
> Neil mentioned, we kinda busy right now. :-)
> 
> We're not going to remove a line of code that was explicitly added back in 
> 1999 until we've thought it through,
> and right now we have more pressing issues to address.
> 
> In the meantime, we will raise PR67271 to track this.  We believe me when we 
> say that we do appreciate your
> input.
RE: A bug in pthread_mutex_unlock()  
The asm for it does have a "memory" constraint, but I haven't had a chance to check if that causes the compiler to 
produce a proper SMP memory barrier on all platforms.  (Things are a little busy around here right now ;-)  Even so, I'm
 inclined to leave things as they are until 6.4.1 is out the door.

-----Original Message-----
From: Attilla Danko [mailto:community-noreply@qnx.com]
Sent: Thu 4/9/2009 6:07 PM
To: ostech-core_os
Subject: Re: A bug in pthread_mutex_unlock()
 
I don't belive __mutex_smp_xchg() contains it's own  memory barrier. All ofthe other references to it call some form of 
memory barrier first. 

_______________________________________________
OSTech
http://community.qnx.com/sf/go/post26653


Attachment: Text winmail.dat 2.75 KB
Re: RE: A bug in pthread_mutex_unlock()  
Guys, do not mistake everything. "memory" clobbering constraint has nothing in common with CPU memory barrier.
"memory" constraint tells GCC that it can't cache expressions from memory after crossing the command. That is, normally,
 GCC could reuse value evaluated earler unless it is marked with "volatile" but now it can't do it any more and must re-
read it from memory once again.
Memory barrier for processor is one of instructions (as I mentioned in previous post) that forces the proecssor to wait 
until all pending writes to memory complete and does not allow it to start pre-fetching values for memory access 
instructions that are scheduled for execution after the barrier. That is, the memory barrier ensures that everything 
that has been scheduled for writing to memory if flushed to memory (or to processor cache to extent sufficient to be 
recognized by other processors in the system) before crossing the barrier and none of the next reads start until 
crossing the barrier. And no assembler constraints affect this - just some selected assembler instructions.

> The asm for it does have a "memory" constraint, but I haven't had a chance to 
> check if that causes the compiler to produce a proper SMP memory barrier on 
> all platforms.  (Things are a little busy around here right now ;-)  Even so, 
> I'm inclined to leave things as they are until 6.4.1 is out the door.
> 
> -----Original Message-----
> From: Attilla Danko [mailto:community-noreply@qnx.com]
> Sent: Thu 4/9/2009 6:07 PM
> To: ostech-core_os
> Subject: Re: A bug in pthread_mutex_unlock()
>  
> I don't belive __mutex_smp_xchg() contains it's own  memory barrier. All ofthe
>  other references to it call some form of memory barrier first. 
> 
> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post26653
> 
> 


RE: RE: A bug in pthread_mutex_unlock()  
That's what I said: I wasn't certain that the compiler would emit an SMP barrier (and was pretty sure it wouldn't).  
Note that there is no inherent reason that the compiler wouldn't choose to emit such an instruction for a memory 
constraint, and on SMP machines it would make a certain amount of sense for it to do so.  However, as I said, I don't 
think it does.

-----Original Message-----
From: Oleh Derevenko [mailto:community-noreply@qnx.com]
Sent: Fri 4/10/2009 4:58 PM
To: ostech-core_os
Subject: Re: RE: A bug in pthread_mutex_unlock()
 
Guys, do not mistake everything. "memory" clobbering constraint has nothing in common with CPU memory barrier.
"memory" constraint tells GCC that it can't cache expressions from memory after crossing the command. That is, normally,
 GCC could reuse value evaluated earler unless it is marked with "volatile" but now it can't do it any more and must re-
read it from memory once again.
Memory barrier for processor is one of instructions (as I mentioned in previous post) that forces the proecssor to wait 
until all pending writes to memory complete and does not allow it to start pre-fetching values for memory access 
instructions that are scheduled for execution after the barrier. That is, the memory barrier ensures that everything 
that has been scheduled for writing to memory if flushed to memory (or to processor cache to extent sufficient to be 
recognized by other processors in the system) before crossing the barrier and none of the next reads start until 
crossing the barrier. And no assembler constraints affect this - just some selected assembler instructions.

> The asm for it does have a "memory" constraint, but I haven't had a chance to 
> check if that causes the compiler to produce a proper SMP memory barrier on 
> all platforms.  (Things are a little busy around here right now ;-)  Even so, 
> I'm inclined to leave things as they are until 6.4.1 is out the door.
> 
> -----Original Message-----
> From: Attilla Danko [mailto:community-noreply@qnx.com]
> Sent: Thu 4/9/2009 6:07 PM
> To: ostech-core_os
> Subject: Re: A bug in pthread_mutex_unlock()
>  
> I don't belive __mutex_smp_xchg() contains it's own  memory barrier. All ofthe
>  other references to it call some form of memory barrier first. 
> 
> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post26653
> 
> 




_______________________________________________
OSTech
http://community.qnx.com/sf/go/post26690


Re: RE: RE: A bug in pthread_mutex_unlock()  
Neil, you do not need compiler to emit any additional instruction. "xchg mem,reg" is the memory barrier itself. It's not
 only "mfence" that counts for the barrier but any of port I/O instructions as well as locked instructions (including 
xchg).

> 
> That's what I said: I wasn't certain that the compiler would emit an SMP 
> barrier (and was pretty sure it wouldn't).  Note that there is no inherent 
> reason that the compiler wouldn't choose to emit such an instruction for a 
> memory constraint, and on SMP machines it would make a certain amount of sense
>  for it to do so.  However, as I said, I don't think it does.
RE: RE: RE: A bug in pthread_mutex_unlock()  
Hi Oleh,

Yes, I'm aware that's the case on the x86.  I was more concerned about the other architectures which we support.

Regards,
Neil


-----Original Message-----
From: Oleh Derevenko [mailto:community-noreply@qnx.com]
Sent: Sat 4/11/2009 3:37 PM
To: ostech-core_os
Subject: Re: RE: RE: A bug in pthread_mutex_unlock()
 
Neil, you do not need compiler to emit any additional instruction. "xchg mem,reg" is the memory barrier itself. It's not
 only "mfence" that counts for the barrier but any of port I/O instructions as well as locked instructions (including 
xchg).

> 
> That's what I said: I wasn't certain that the compiler would emit an SMP 
> barrier (and was pretty sure it wouldn't).  Note that there is no inherent 
> reason that the compiler wouldn't choose to emit such an instruction for a 
> memory constraint, and on SMP machines it would make a certain amount of sense
>  for it to do so.  However, as I said, I don't think it does.


_______________________________________________
OSTech
http://community.qnx.com/sf/go/post26714