Chris Chiesa
|
Re: pthread_rwlock_unlock( ) return-value discrepancy?
|
Chris Chiesa
03/19/2010 10:49 AM
post50021
> Can you post a simple test case?
Not really; I was trying to write a simple program to demonstrate it -- but it turns out that outside of my company's
(SConstruct.py-based) construction scheme I can't even get 'hello world' to run. (It compiles, but at run time I get a
"syntax error:'(' unexpected" error message. That's a topic for a whole other post, someday when I have time.)
Chris
|
|
|
Chris Chiesa
|
Re: pthread_rwlock_unlock( ) return-value discrepancy?
|
Chris Chiesa
03/22/2010 2:30 PM
post50158
I have written several versions of a test-case program. It exhibits
precisely the behavior I would expect, under non-locking,
non-blocking-locking, and blocking-locking conditions. It
specifically fails to exhibit the precise behavior that (I think) I'm
seeing in my production code. However, it has made me aware that
non-blocking locking requires a somewhat different thought process
than the more familiar blocking kind, and I am now struggling to
come up with the right way to use non-blocking locking. I can't
afford to have my production code block here, but it seems the only
alternative is to massively restructure my code to continuously retry
operations that are buried deep in a resource manager, where retrying
is awkward at best. On the other hand, the blocking versions of the
functions appear to be very stable so maybe I can get away with using
them anyway. Decisions, decisions. Are there any standard paradigms or
patterns for using the non-blocking (...try...) locking functions?
If so, I would greatly appreciate hearing about them.
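To make the question concrete, here is a minimal sketch of one shape such a pattern could take: a bounded number of non-blocking attempts, after which the caller gives up and does something else rather than retrying forever. The names try_rdlock_bounded, read_or_skip and Shared are hypothetical, invented for this illustration; only the pthread_rwlock_* calls, sched_yield() and EBUSY are standard POSIX. It is not taken from my production code or from tryit.cpp.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: attempt a read lock without blocking, at most
// max_attempts times. Returns 0 on success, EBUSY if the lock was still
// busy after the last attempt, or another error number on a real failure.
int try_rdlock_bounded(pthread_rwlock_t* lock, int max_attempts) {
    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; ++i) {
        int rc = pthread_rwlock_tryrdlock(lock);
        if (rc == 0) return 0;        // got the read lock
        if (rc != EBUSY) return rc;   // not contention: report the error as-is
        sched_yield();                // let the current holder make progress
    }
    return EBUSY;
}

// Hypothetical shared resource guarded by a read-write lock.
struct Shared {
    pthread_rwlock_t lock;
    int value;
};

// Hypothetical caller: copy the value out if the lock can be had cheaply,
// otherwise return false so the caller can skip or defer the work.
bool read_or_skip(Shared* s, int* out) {
    if (try_rdlock_bounded(&s->lock, 3) != 0) return false;
    *out = s->value;
    // Always check the unlock result too; a nonzero value here means the
    // caller's idea of who holds the lock has gone wrong somewhere.
    return pthread_rwlock_unlock(&s->lock) == 0;
}
```

The point of the bound is that the "retry" stays local to one small helper instead of spreading through the resource manager. Where the platform provides pthread_rwlock_timedrdlock() and pthread_rwlock_timedwrlock(), those offer a middle ground: block, but only up to a deadline.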
Those not wishing/needing to examine/experiment further can stop
reading here; I thank you for your time. For those who do wish to go
deeper, I have attached a zip archive containing several relevant
files, as follows.
File tryit.cpp is, of course, the program source code. As far as I
can tell, it should build on anybody's QNX platform (FWIW, I'm using
6.3.2). Program history and output are as follows.
The first version of the program had no locking, and exhibited race
conditions: some of the output lines showed the string containing a
run of one character followed by a run of another; i.e. the reader had
caught the writer in the act of changing the string's contents. See
thread_interlace.txt. This is exactly as I expected (and it took some
time to cobble up a situation in which the effects of not locking
would be easily visible).
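For readers without the attachment, the idea can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual tryit.cpp: the names Demo, is_torn, writer_main, reader_main and count_torn_reads, and the buffer length, are all made up for illustration. The writer repeatedly fills a buffer with a single character; the reader copies the buffer and checks whether it caught a mix of characters. The sketch shows the blocking-lock variant, in which no mixed ("torn") copy should ever be seen; taking out the lock and unlock calls gives the unlocked variant, which is a data race and may show the interlaced output.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t kLen = 64;   // hypothetical buffer length

struct Demo {
    pthread_rwlock_t lock;
    char text[kLen];           // shared buffer (not NUL-terminated)
    int iterations;            // loop count for each thread
    int torn_reads;            // written only by the reader thread
};

// True if the buffer is not a single run of one character, i.e. the
// reader caught the writer partway through changing the contents.
bool is_torn(const char* s, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i)
        if (s[i] != s[0]) return true;
    return false;
}

void* writer_main(void* arg) {
    Demo* d = static_cast<Demo*>(arg);
    for (int i = 0; i < d->iterations; ++i) {
        int rc = pthread_rwlock_wrlock(&d->lock);
        assert(rc == 0);
        std::memset(d->text, 'a' + (i % 26), kLen);   // one run of one letter
        rc = pthread_rwlock_unlock(&d->lock);
        assert(rc == 0);
        (void)rc;
    }
    return 0;
}

void* reader_main(void* arg) {
    Demo* d = static_cast<Demo*>(arg);
    char copy[kLen];
    for (int i = 0; i < d->iterations; ++i) {
        int rc = pthread_rwlock_rdlock(&d->lock);
        assert(rc == 0);
        std::memcpy(copy, d->text, kLen);             // snapshot under the lock
        rc = pthread_rwlock_unlock(&d->lock);
        assert(rc == 0);
        (void)rc;
        if (is_torn(copy, kLen)) ++d->torn_reads;
    }
    return 0;
}

// Runs one reader and one writer to completion and returns how many
// torn snapshots the reader saw (expected to be 0 with locking in place).
int count_torn_reads(int iterations) {
    Demo d;
    pthread_rwlock_init(&d.lock, 0);
    std::memset(d.text, 'a', kLen);
    d.iterations = iterations;
    d.torn_reads = 0;
    pthread_t reader, writer;
    pthread_create(&reader, 0, reader_main, &d);      // reader started first
    pthread_create(&writer, 0, writer_main, &d);
    pthread_join(writer, 0);
    pthread_join(reader, 0);
    pthread_rwlock_destroy(&d.lock);
    return d.torn_reads;
}
```

Copying the buffer while holding the read lock and inspecting the copy afterwards keeps the time spent under the lock short.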
The second version of the program used the non-blocking (...try...)
versions of the locking functions. See file radio_try_lockfails.txt.
It appears that the writer and first three readers started up
more-or-less immediately, and ran without locking problems but also
with the readers producing no output, i.e. perhaps not really running,
perhaps starved by the writer, though it's not clear how this can
occur. Startup of the fourth reader is reported quite belatedly, and
locking problems begin immediately after that startup is announced.
The locking problem is specifically that the first reader tries to
lock while the writer holds the lock; this is as I would expect, and
is in more-or-less the same "category" as the problem I see in my
production code -- but is DIFFERENT FROM the behavior of my production
code, in which problems begin when the WRITER *UN*LOCKS the lock that
IT ITSELF allegedly holds. (I say "allegedly" because it is
impossible to know for sure without doing your own bookkeeping, which
requires making assumptions about the state of the underlying
lock. If only pthread_rwlock_t weren't opaque, I could examine
the lock itself under various conditions and see what IT (i.e. the
locking facility) thinks is going on, and probably solve all of this
quite quickly. Many years of experiences like this one have given me
an abiding dislike for opaque data structures.)
Several of the behaviors of this program are difficult to explain: 1) The belated
indication of Thread 4 startup, the lack of output prior to that
point, and the coincidence of that thread startup with the beginning
of locking problems, suggest that the readers are not fully started
until after the writer has been running for some time -- which is very
odd since the readers are started BEFORE the writer. Well, maybe
starvation is occurring. 2) Even though "locking is failing," I see
no evidence of the race condition that appeared in the no-locking
version of the program. Conversely, note that while the...