Project Home
Project Home
Documents
Documents
Wiki
Wiki
Discussion Forums
Discussions
Project Information
Project Info
Forum Topic - thread unsafe-ness in libc: (15 Items)
   
thread unsafe-ness in libc  
The slog functions don't appear to be completely thread-safe.  We just noticed this on 6.3.2, but looking through the 6.
5.0 source, it appears to have the same problem.

Specifically, once execution gets to slogsend(), it locks the slog mutex, does a MsgSendv() (possibly reconnecting if 
necessary), and then unlocks the mutex.

However, if the thread is canceled at MsgSendv(), then the thread exits without unlocking the mutex, leading to a 
deadlock by other threads trying to use slog.

Seems like the function should push a cleanup function to unlock the mutex.

This raises a more general question about the library.  We presume that a "thread safe" function includes that canceling
 the thread should be well-behaved.  Is this a reasonable expectation?

Thanks,
lew
Re: thread unsafe-ness in libc  
It's been over two months, I was hoping to see a reply by now...
Re: thread unsafe-ness in libc  
The thread should not be cancelled or terminated when it is holding a mutex. Actually you should always avoid to 
terminate a thread during the thread is running, because the thread will not have any chance to do necessary cleanups. 
Let the thread exit itself when some conditions are meet.

Jerry

> The slog functions don't appear to be completely thread-safe.  We just noticed
>  this on 6.3.2, but looking through the 6.5.0 source, it appears to have the 
> same problem.
> 
> Specifically, once execution gets to slogsend(), it locks the slog mutex, does
>  a MsgSendv() (possibly reconnecting if necessary), and then unlocks the mutex
> .
> 
> However, if the thread is canceled at MsgSendv(), then the thread exits 
> without unlocking the mutex, leading to a deadlock by other threads trying to 
> use slog.
> 
> Seems like the function should push a cleanup function to unlock the mutex.
> 
> This raises a more general question about the library.  We presume that a "
> thread safe" function includes that canceling the thread should be well-
> behaved.  Is this a reasonable expectation?
> 
> Thanks,
> lew


Re: thread unsafe-ness in libc  
I presume you don't work for QNX so I'll try to explain...

There is absolutely nothing wrong with canceling a thread, and there are functions provided that allow a thread to clean
 up and exit completely gracefully after being canceled.  This is the entire point of the pthread_cleanup_push() and 
pthread_cleanup_pop() functions.  

Thread-safe code that locks a mutex and then enters a cancellation point must push an unlock function so that if the 
thread is canceled, the mutex is automatically unlocked.  In fact, the documentation example of pthread_cleanup_push() 
specifically demonstrates the case where a mutex unlock function is pushed to be called in case the thread is canceled. 
 Check out http://www.qnx.com/developers/docs/6.5.0/topic/com.qnx.doc.neutrino_lib_ref/p/pthread_cleanup_push.html

Thread-safe library functions that call cancellation points while a mutex is locked will (correctly) push an unlock 
function, but the slog() function does not do this, and, as far as we can tell, it's broken and this is a bug that 
should be fixed.

We would really appreciate hearing from someone from QNX on this matter, so that this can be properly corrected.

lew
Re: thread unsafe-ness in libc  
I'm afraid there would be too much places to fix and too much overhead pushing mutex unlock functions each time a mutex 
is locked in libc. You better cease using pthread_cancel. This will be much faster and much cheaper for you. :)
Re: thread unsafe-ness in libc  
Totally agreed. There is too much risk to use pthread_cancel. It is always good to let the tread itself to decide where 
is the grace point to exit. This applies to any OS. To me, killing a thread is the last thing to do after you have 
exercised all the other possibilities.

BTW, I do work in QNX as a customer support developer.
Re: thread unsafe-ness in libc  
Where do you guys come up with this?  libc already has an internal function called pthread_once() which is called from 
at least 75 other places.  There are already ample provisions in libc for making library functions thread-safe at 
cancellation points.

To say "don't use a standard POSIX function" with no understanding of how it is used in existing, well-established and 
reliable programs is irresponsible.

So, at this point, I'm not asking whether you think canceling threads is a good idea, I'm pointing out that the slog() 
functions are broken.  They have a bug which makes them not thread-safe because they can't be canceled.

This is in contrast to DOZENS of other libc functions that already include this capability.  Obviously, there was 
interest in these other functions to make them thread safe and that they cancel properly.  You say it's a bad idea and 
it would take work on many functions, and I'm telling you that the work is already done, 99.9% of libc is already 
designed to work properly in this case.  The only function we've found so far that, presumably erroneously, does not 
properly call the protection function is embedded in slog(), as described in the original post.

And note that you don't have to push a mutex unlock any time you lock a mutex, you only have to do it when you call a 
function that is a cancellation point while the mutex is locked, and most library functions don't do this.

In any event, it would be helpful to hear from someone who understands this issue, has used pthread_once() and 
understands why it's used, and realizes the problem with slogsend().

lew
Re: thread unsafe-ness in libc  
By the way, Jerry, pthread_cancel() does not kill a thread, there is a system call available to do that if desired.  
pthread_cancel() signals the thread to run its pushed cancellation routines so that it can always exit cleanly.
Re: thread unsafe-ness in libc  
Thank you for pointing out the issue. The issue will be logged and considered for future fix. Meantime, I think you have
 to use some workaround for the issue. I learned a lot about pthread_cancel through this thread.

Thanks.
Re: thread unsafe-ness in libc  
I am currently seing a problem with slogf that I guess could be due to bug mentioned in this thread.
My problem is that the slogf function sometimes blocks forever. If I stop the application in the debugger, I can see 
that the thread is stuck inside slogf. (see attached callstack)

If the bug has not been fixed, does anyone have a good work around? (Besides not canceling threads at all)

I was thinking about always disabling thread cancellation when slogf is called, like:
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
vslogf("some text");
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);

Would this solve the problem? 

 
Attachment: Image bug.jpg 12.57 KB
Re: thread unsafe-ness in libc  
What version of the OS are you using?
RE: thread unsafe-ness in libc  
FWIW, cancellation handlers were added 2011-08-19 (post 650, PR-84453).
Re: RE: thread unsafe-ness in libc  
Was that included in SP1?
Re: thread unsafe-ness in libc  
As far as I can tell SP1 includes a cancellation safe version of slogf.

On 13-10-16 11:49 AM, Lewis Donzis wrote:
> Was that included in SP1?
>
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post105968
> To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Re: thread unsafe-ness in libc  
That is what we ended up doing, although, generically, it's better to restore the cancellation state to its previous 
value.  This can be done with a macro or inline function e.g.:

#define slogtSafe(s, fmt, args...)\
do {\
   int oldState;\
   pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldState);\
   slogt(s, fmt , ## args);\
   pthread_setcancelstate(oldState, &oldState);\
   } while(0)