Lewis Donzis
|
thread unsafe-ness in libc
|
Lewis Donzis
10/14/2010 6:52 PM
post70770
|
thread unsafe-ness in libc
The slog functions don't appear to be completely thread-safe. We just noticed this on 6.3.2, but looking through the 6.
5.0 source, it appears to have the same problem.
Specifically, once execution gets to slogsend(), it locks the slog mutex, does a MsgSendv() (possibly reconnecting if
necessary), and then unlocks the mutex.
However, if the thread is canceled at MsgSendv(), then the thread exits without unlocking the mutex, leading to a
deadlock by other threads trying to use slog.
Seems like the function should push a cleanup function to unlock the mutex.
This raises a more general question about the library. We presume that a "thread safe" function includes that canceling
the thread should be well-behaved. Is this a reasonable expectation?
Thanks,
lew
|
|
|
Lewis Donzis
|
Re: thread unsafe-ness in libc
|
Lewis Donzis
12/22/2010 8:06 AM
post79734
|
Re: thread unsafe-ness in libc
It's been over two months, I was hoping to see a reply by now...
|
|
|
Jerry Sui
|
Re: thread unsafe-ness in libc
|
Jerry Sui
12/22/2010 9:36 AM
post79984
|
Re: thread unsafe-ness in libc
The thread should not be cancelled or terminated when it is holding a mutex. Actually you should always avoid to
terminate a thread during the thread is running, because the thread will not have any chance to do necessary cleanups.
Let the thread exit itself when some conditions are meet.
Jerry
> The slog functions don't appear to be completely thread-safe. We just noticed
> this on 6.3.2, but looking through the 6.5.0 source, it appears to have the
> same problem.
>
> Specifically, once execution gets to slogsend(), it locks the slog mutex, does
> a MsgSendv() (possibly reconnecting if necessary), and then unlocks the mutex
> .
>
> However, if the thread is canceled at MsgSendv(), then the thread exits
> without unlocking the mutex, leading to a deadlock by other threads trying to
> use slog.
>
> Seems like the function should push a cleanup function to unlock the mutex.
>
> This raises a more general question about the library. We presume that a "
> thread safe" function includes that canceling the thread should be well-
> behaved. Is this a reasonable expectation?
>
> Thanks,
> lew
|
|
|
Lewis Donzis
|
Re: thread unsafe-ness in libc
|
Lewis Donzis
12/23/2010 12:11 AM
post80165
|
Re: thread unsafe-ness in libc
I presume you don't work for QNX so I'll try to explain...
There is absolutely nothing wrong with canceling a thread, and there are functions provided that allow a thread to clean
up and exit completely gracefully after being canceled. This is the entire point of the pthread_cleanup_push() and
pthread_cleanup_pop() functions.
Thread-safe code that locks a mutex and then enters a cancellation point must push an unlock function so that if the
thread is canceled, the mutex is automatically unlocked. In fact, the documentation example of pthread_cleanup_push()
specifically demonstrates the case where a mutex unlock function is pushed to be called in case the thread is canceled.
Check out http://www.qnx.com/developers/docs/6.5.0/topic/com.qnx.doc.neutrino_lib_ref/p/pthread_cleanup_push.html
Thread-safe library functions that call cancellation points while a mutex is locked will (correctly) push an unlock
function, but the slog() function does not do this, and, as far as we can tell, it's broken and this is a bug that
should be fixed.
We would really appreciate hearing from someone from QNX on this matter, so that this can be properly corrected.
lew
|
|
|
Oleh Derevenko(deleted)
|
Re: thread unsafe-ness in libc
|
Oleh Derevenko(deleted)
12/23/2010 9:49 AM
post80201
|
Re: thread unsafe-ness in libc
I'm afraid there would be too much places to fix and too much overhead pushing mutex unlock functions each time a mutex
is locked in libc. You better cease using pthread_cancel. This will be much faster and much cheaper for you. :)
|
|
|
Jerry Sui
|
Re: thread unsafe-ness in libc
|
Jerry Sui
12/23/2010 10:33 AM
post80226
|
Re: thread unsafe-ness in libc
Totally agreed. There is too much risk to use pthread_cancel. It is always good to let the tread itself to decide where
is the grace point to exit. This applies to any OS. To me, killing a thread is the last thing to do after you have
exercised all the other possibilities.
BTW, I do work in QNX as a customer support developer.
|
|
|
Lewis Donzis
|
Re: thread unsafe-ness in libc
|
Lewis Donzis
12/23/2010 10:51 AM
post80228
|
Re: thread unsafe-ness in libc
Where do you guys come up with this? libc already has an internal function called pthread_once() which is called from
at least 75 other places. There are already ample provisions in libc for making library functions thread-safe at
cancellation points.
To say "don't use a standard POSIX function" with no understanding of how it is used in existing, well-established and
reliable programs is irresponsible.
So, at this point, I'm not asking whether you think canceling threads is a good idea, I'm pointing out that the slog()
functions are broken. They have a bug which makes them not thread-safe because they can't be canceled.
This is in contrast to DOZENS of other libc functions that already include this capability. Obviously, there was
interest in these other functions to make them thread safe and that they cancel properly. You say it's a bad idea and
it would take work on many functions, and I'm telling you that the work is already done, 99.9% of libc is already
designed to work properly in this case. The only function we've found so far that, presumably erroneously, does not
properly call the protection function is embedded in slog(), as described in the original post.
And note that you don't have to push a mutex unlock any time you lock a mutex, you only have to do it when you call a
function that is a cancellation point while the mutex is locked, and most library functions don't do this.
In any event, it would be helpful to hear from someone who understands this issue, has used pthread_once() and
understands why it's used, and realizes the problem with slogsend().
lew
|
|
|
Lewis Donzis
|
Re: thread unsafe-ness in libc
|
Lewis Donzis
12/23/2010 10:53 AM
post80229
|
Re: thread unsafe-ness in libc
By the way, Jerry, pthread_cancel() does not kill a thread, there is a system call available to do that if desired.
pthread_cancel() signals the thread to run its pushed cancellation routines so that it can always exit cleanly.
|
|
|
Jerry Sui
|
Re: thread unsafe-ness in libc
|
Jerry Sui
12/23/2010 12:15 PM
post80267
|
Re: thread unsafe-ness in libc
Thank you for pointing out the issue. The issue will be logged and considered for future fix. Meantime, I think you have
to use some workaround for the issue. I learned a lot about pthread_cancel through this thread.
Thanks.
|
|
|
Rasmus Nielsen
|
Re: thread unsafe-ness in libc
|
Rasmus Nielsen
10/16/2013 6:43 AM
post105945
|
Re: thread unsafe-ness in libc
I am currently seing a problem with slogf that I guess could be due to bug mentioned in this thread.
My problem is that the slogf function sometimes blocks forever. If I stop the application in the debugger, I can see
that the thread is stuck inside slogf. (see attached callstack)
If the bug has not been fixed, does anyone have a good work around? (Besides not canceling threads at all)
I was thinking about always disabling thread cancellation when slogf is called, like:
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
vslogf("some text");
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
Would this solve the problem?
|
|
|
Gervais Mulongoy
|
Re: thread unsafe-ness in libc
|
Gervais Mulongoy
10/16/2013 7:42 AM
post105947
|
Re: thread unsafe-ness in libc
What version of the OS are you using?
|
|
|
Neil Schellenberger(deleted)
|
RE: thread unsafe-ness in libc
|
Neil Schellenberger(deleted)
10/16/2013 8:16 AM
post105949
|
RE: thread unsafe-ness in libc
FWIW, cancellation handlers were added 2011-08-19 (post 650, PR-84453).
|
|
|
Lewis Donzis
|
Re: RE: thread unsafe-ness in libc
|
Lewis Donzis
10/16/2013 11:49 AM
post105968
|
Re: RE: thread unsafe-ness in libc
Was that included in SP1?
|
|
|
Gervais Mulongoy
|
Re: thread unsafe-ness in libc
|
Gervais Mulongoy
10/16/2013 12:22 PM
post105977
|
Re: thread unsafe-ness in libc
As far as I can tell SP1 includes a cancellation safe version of slogf.
On 13-10-16 11:49 AM, Lewis Donzis wrote:
> Was that included in SP1?
>
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post105968
> To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com
|
|
|
Lewis Donzis
|
Re: thread unsafe-ness in libc
|
Lewis Donzis
10/16/2013 11:48 AM
post105967
|
Re: thread unsafe-ness in libc
That is what we ended up doing, although, generically, it's better to restore the cancellation state to its previous
value. This can be done with a macro or inline function e.g.:
#define slogtSafe(s, fmt, args...)\
do {\
int oldState;\
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldState);\
slogt(s, fmt , ## args);\
pthread_setcancelstate(oldState, &oldState);\
} while(0)
|
|
|
|