Forum Topic - Robust Mutexes under QNX 7.0: (14 Items)
   
Robust Mutexes under QNX 7.0  
I have a question about how robust mutexes work under QNX 7.0.

I create my mutex as (lots of error checking omitted for simplicity):

pthread_mutex_t mMutex; 
pthread_mutexattr_t mMutexAttr;   

// Robust and recursive
pthread_mutexattr_init(&mMutexAttr);
pthread_mutexattr_setrobust(&mMutexAttr, PTHREAD_MUTEX_ROBUST);
pthread_mutexattr_setrecursive(&mMutexAttr, PTHREAD_RECURSIVE_ENABLE);
// Initialize
pthread_mutex_init(&mMutex, &mMutexAttr);

In my lock() routine my code is:

// Acquire the mutex
int retVal = pthread_mutex_lock(&mMutex);
	
if (retVal != EOK)
{
    if (retVal == EOWNERDEAD)
    {
        pthread_mutex_consistent(&mMutex);
        printf("Mutex::lock() - Owner died holding mutex. Fixing state\n");
    }
    else
    {
        printf("Mutex::lock() - Error %s while trying to lock mutex! Mutex not locked!\n", strerror(retVal));

    }
}
else
{
    // Locked do some processing
}

Now I'm in a multi-threaded application (not multi-process) and I see the following behavior:

1) Thread A locks the mutex
Thread B blocks when trying to lock.
Kill Thread A.
Thread B never acquires the mutex and I'm deadlocked. (I thought the robustness would wake B and let it acquire the mutex with the EOWNERDEAD return code so I could clean up and continue.)

2) Thread A locks the mutex
Kill Thread A
Thread B tries to lock. It gets a return code of EINVAL, not EOWNERDEAD. Once again my process is effectively broken.

So are robust mutexes implemented under QNX 7, and if so, am I doing something wrong in my code that stops them from working? Or do I have to use the SyncMutexEvent() methodology instead (is there sample code)? My preference is POSIX over QNX-specific code for portability reasons.

TIA,

Tim
Re: Robust Mutexes under QNX 7.0  
Robust mutexes work across processes, not threads. It is only when a
process dies that the wakeup mechanism kicks in.
Robust mutexes are not supposed to recover from errors in your code
(which would be the inter-thread case - why do you have a thread exit
while holding a mutex?) but from situations you have less control over,
such as another process going down.

--Elad
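
For comparison, the cross-process case described above can be sketched as follows. This is a minimal illustration, not code from the thread: the function name lock_after_owner_process_died() is hypothetical, the mutex is placed in anonymous shared memory via MAP_ANON (an assumption about the platform), and plain 0 is used in place of QNX's EOK so the sketch stays portable. Error checking is mostly omitted, as in the original post.

```c
#define _DEFAULT_SOURCE 1   /* glibc only: exposes MAP_ANON; harmless elsewhere */
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns what pthread_mutex_lock() reports to the
 * parent after a child process exited while holding the mutex,
 * or -1 if the setup failed. */
int lock_after_owner_process_died(void)
{
    /* The mutex must live in memory that both processes can see. */
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANON, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(m, sizeof *m);
        return -1;
    }
    if (pid == 0) {
        /* Child: take the lock and terminate without releasing it. */
        pthread_mutex_lock(m);
        _exit(0);
    }
    waitpid(pid, NULL, 0);          /* the owning process is now gone */

    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        /* We hold the lock. Repair the protected data here, then
         * mark the mutex usable again. */
        pthread_mutex_consistent(m);
    }
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(m);

    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return rc;
}
```

Calling lock_after_owner_process_died() from main() and comparing the result with EOWNERDEAD shows whether the platform reported the dead owner. The same lock/consistent/unlock sequence applies when the mutex lives in a named shared-memory object instead.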

On Fri, 2018-02-16 at 15:18 -0500, Tim Sowden(deleted) wrote:
> I have a question about how robust mutex's work under QNX 7.0.
Re: Robust Mutexes under QNX 7.0  
I had no idea that kind of limitation (process only) is placed on it.

We have a fairly complex system where many objects have internal mutexes to protect their internal state. At the moment it means we can't ever terminate a thread for any reason, because we can't know whether that thread currently owns any mutexes. So in effect every thread either lives forever or must terminate on its own in a consistent state.

While I agree that in theory all threads should be able to do the above, sometimes that's not possible, and in some cases it's far simpler to terminate and respawn than to try to put everything back into a 'prior state', so we are terminating threads.

I'll assume spawning as detached doesn't help.

Tim

Re: Robust Mutexes under QNX 7.0  
Maybe try using pthread_cancel() to request termination of a thread, and then use pthread_cleanup_push()/pthread_cleanup_pop() to register per-thread cleanup routines that explicitly free resources. Of course, if you are using third-party library code that spawns threads internally, that wouldn't help, but it does give you more control over your own threads. Placing pthread_testcancel() at strategic points in the code might help, too.

I think the threading model is designed with the idea in mind that each thread is in itself relatively simple, robust and self-contained. Not always possible to put into practice, I know.

Regards,
Albrecht
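
The suggestion above might look roughly like this in practice. It is only a sketch: g_lock, g_ready, unlock_on_cancel(), worker() and cancel_worker_and_check_lock() are hypothetical names, the semaphore exists only so the controlling thread knows the lock is held before it cancels, and the cleanup handler covers nothing except the one mutex registered with it.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER; /* hypothetical shared lock */
static sem_t g_ready;                                      /* handshake for this sketch only */

/* Cleanup handler: runs if the thread is cancelled between push and pop. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&g_lock);
    pthread_cleanup_push(unlock_on_cancel, &g_lock);
    sem_post(&g_ready);             /* tell the controller the lock is held */
    for (;;) {
        /* ... one unit of work on the protected state ... */
        pthread_testcancel();       /* explicit cancellation point */
    }
    pthread_cleanup_pop(1);         /* never reached; must pair with the push */
    return NULL;
}

/* Cancels the worker while it holds g_lock and returns the result of a
 * subsequent pthread_mutex_trylock(): 0 means the handler released it. */
int cancel_worker_and_check_lock(void)
{
    pthread_t tid;
    sem_init(&g_ready, 0, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    while (sem_wait(&g_ready) == -1 && errno == EINTR)
        ;                           /* retry if interrupted by a signal */
    pthread_cancel(tid);            /* deferred: acted on at pthread_testcancel() */
    pthread_join(tid, NULL);

    int rc = pthread_mutex_trylock(&g_lock);
    if (rc == 0)
        pthread_mutex_unlock(&g_lock);
    sem_destroy(&g_ready);
    return rc;
}
```

With the default deferred cancellation type, the request only takes effect at a cancellation point, so the handler always runs with the lock held. Any state the thread carries that has no registered handler is left as it was at that point.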

Re: Robust Mutexes under QNX 7.0  
The question then becomes how many places need that cancel code added to handle a potential cancel request.

What if instead I spawned a dummy process for every mutex? (I can do this easily because all our mutexes come from a Mutex wrapper class we created.) The dummy process just wakes up once a second, acquires and then releases the mutex to validate that its state is good, and fixes it if EOWNERDEAD is returned. Would that solve my problem (other than the silliness of spawning potentially dozens of these dummy processes)?

According to the docs:
http://www.qnx.com/developers/docs/7.0.0/index.html#com.qnx.doc.neutrino.lib_ref/topic/p/pthread_mutexattr_getrobust.html

"If the process containing the owning thread of a robust mutex terminates while holding the mutex lock, the next thread that acquires the mutex is notified about the termination by the return value EOWNERDEAD from the locking function. If the owning thread of a robust mutex terminates while holding the mutex lock, the next thread that acquires the mutex is notified about the termination by the return value EOWNERDEAD."

what I did *should* be working. So it's either a bug or the docs need to be updated to remove that second sentence about threads.

Tim
RE: Robust Mutexes under QNX 7.0  
Robust mutexes cannot work across threads within the same process for the simple reason that the kernel doesn't always know that thread A holds mutex B. An uncontended mutex lock operation doesn't go through the kernel; it is a simple atomic operation in user mode. Therefore a thread can go away while holding one or more mutexes and the kernel is none the wiser.
A thread exiting while holding a mutex lock is an application bug. The OS cannot be expected to fix application bugs. And please, whatever you do, don't use pthread_cancel() - that will make things much worse.

--Elad
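
To make the "simple atomic operation in user mode" point concrete, here is a toy lock. It is emphatically not the QNX implementation: toy_mutex, toy_trylock() and toy_unlock() are hypothetical names, and a real mutex also needs a slow path that asks the kernel to block and wake waiters. The sketch only shows that an uncontended acquire can complete without any system call.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock word: 0 means free, any other value is the (hypothetical)
 * id of the owning thread. */
typedef struct {
    atomic_int owner;
} toy_mutex;

/* Uncontended acquire: a single compare-and-swap in user mode.
 * No system call is made, so nothing outside the process learns
 * who the owner is. */
bool toy_trylock(toy_mutex *m, int self_id)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->owner, &expected, self_id);
}

/* Release: store 0 so the next compare-and-swap can succeed. */
void toy_unlock(toy_mutex *m)
{
    atomic_store(&m->owner, 0);
}
```

If the thread that won toy_trylock() exits without calling toy_unlock(), the lock word simply keeps its id forever. Nothing was ever reported to the kernel at acquire time, so there is nothing for it to clean up when that thread goes away.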
________________________________________
From: Tim Sowden(deleted) [community-noreply@qnx.com]
Sent: February-19-18 9:52 AM
To: ostech-core_os
Subject: Re: Robust Mutexes under QNX 7.0

Re: RE: Robust Mutexes under QNX 7.0  
In that case this line in the docs (taken verbatim from the POSIX robust mutex definitions) is totally wrong and should be removed.

"If the owning thread of a robust mutex terminates while holding the mutex lock, the next thread that acquires the mutex is notified about the termination by the return value EOWNERDEAD."



After doing some reading I think I've found another way to do what I want. GCC supports throwing C++ exceptions in signal handlers, so I'm going to raise a signal on the thread I want to cancel and then throw an exception in the signal handler. That exception will let me unwind the stack, which will release any mutexes (the class they are in is RAII), and then the thread can exit in the top-level try/catch block.

Tim



Re: RE: Robust Mutexes under QNX 7.0  
Elad, I would be very interested to understand why pthread_cancel() is not a good idea to use.

Thanks,
Albrecht
Re: RE: Robust Mutexes under QNX 7.0  
It's not impossible to use pthread_cancel() correctly, but it is very
hard to do so. You have to account for all of the state that is carried
by the thread, both explicit and implicit, and make sure it all gets
cleaned up. That makes for a very error prone implementation. It's akin
to jumping out of a long function in the middle - possible, but very
tricky.

--Elad

On Tue, 2018-02-20 at 13:26 -0500, Albrecht Uhlmann wrote:
> Elad, I would be much interested to understand why pthread_cancel()
> is not a good idea to use.
Re: RE: Robust Mutexes under QNX 7.0  
As far as I can tell our implementation is POSIX-compliant. The
relevant part of the spec mandates that robust mutexes allow a thread
to detect an inconsistent state if the owning *process* exits.
Detecting a thread exit while holding a mutex is optional.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html

We could have implemented the optional behaviour, but only at the cost
of making every mutex operation enter the kernel.

I will ask for the documentation to clarify this point.

--Elad

On Tue, 2018-02-20 at 11:59 -0500, Tim Sowden(deleted) wrote:
> In that case this line in the Doc's (taken verbatim from POSIX robust
> mutex definitions) is totally wrong and should be removed.
Re: Robust Mutexes under QNX 7.0  
Hello Tim,

I recently had a different kind of problem with a robust mutex (in ported legacy code). It seems to be a bug, but I don't know how to verify that because I don't have a QNX support deal.

So my current recommendation is: do not use PTHREAD_MUTEX_ROBUST on QNX!

My scenario is (sample code attached):
1. one thread is blocked on pthread_cond_wait()
2. another thread cancels the thread
3. problem: the thread never quits (application blocks forever at pthread_join()).

In this scenario, my solution is to use pthread_cleanup_push() to handle releasing the mutex.
Attachment: Text cp1.c 2.13 KB
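
The attached cp1.c is not reproduced here, so the following is an independent sketch of the pattern described, under assumptions: all names (q_lock, q_cond, q_ready, q_started, release_lock(), waiter(), cancel_waiter_and_join()) are hypothetical, and a default, non-robust mutex is used. POSIX specifies that when a thread is cancelled inside pthread_cond_wait(), the mutex is re-acquired before the cleanup handlers run, which is why the handler has to unlock it.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static int   q_ready = 0;   /* predicate; never set in this sketch */
static sem_t q_started;     /* handshake for this sketch only */

/* Cleanup handler: the cancelled waiter owns the mutex again at this point. */
static void release_lock(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *waiter(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&q_lock);
    pthread_cleanup_push(release_lock, &q_lock);
    sem_post(&q_started);
    while (!q_ready)
        pthread_cond_wait(&q_cond, &q_lock);    /* cancellation point */
    pthread_cleanup_pop(1);                     /* unlock on the normal path too */
    return NULL;
}

/* Returns 0 if the waiter was cancelled, joined, and left the mutex free. */
int cancel_waiter_and_join(void)
{
    pthread_t tid;
    void *result = NULL;

    sem_init(&q_started, 0, 0);
    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;
    while (sem_wait(&q_started) == -1 && errno == EINTR)
        ;                                       /* retry if interrupted */
    pthread_cancel(tid);
    if (pthread_join(tid, &result) != 0)
        return -1;
    sem_destroy(&q_started);

    int rc = pthread_mutex_trylock(&q_lock);    /* 0 if the handler unlocked it */
    if (rc == 0)
        pthread_mutex_unlock(&q_lock);
    return (result == PTHREAD_CANCELED && rc == 0) ? 0 : 1;
}
```

Without the push/pop pair, the cancelled thread would terminate while still owning q_lock, and every later attempt to lock it would block.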
Re: Robust Mutexes under QNX 7.0  
I replied before seeing the whole discussion thread .. I think this was clarified already. Sorry for the noise.
RE: Robust Mutexes under QNX 7.0  
I think that what you observed is a real bug that was fixed for 7.0.1.

--Elad
________________________________________
From: Lauri Kaila [community-noreply@qnx.com]
Sent: February-26-18 3:25 AM
To: ostech-core_os
Subject: Re: Robust Mutexes under QNX 7.0

Re: RE: Robust Mutexes under QNX 7.0  
Hi Elad,

Thanks for the info, I seem to be using 7.0.0.

Lauri
