Forum Topic - pthread_rwlock_unlock( ) return-value discrepancy?: (10 Items)

Chris Chiesa

03/18/2010 5:26 PM

post49958

pthread_rwlock_unlock( ) return-value discrepancy?

Some code encapsulates pthread_rwlock_trywrlock( ) in one method, and pthread_rwlock_unlock( ) in another, of a 
singleton class/object.  Another class calls these two methods, in lock-then-unlock sequence with some other work in 
between.

The write-lock operation succeeds; pthread_rwlock_trywrlock( ) returns 0 (EOK).  The unlock operation fails; 
pthread_rwlock_unlock( ) returns 1 (EPERM).

I have two problems.

1) It's not clear why pthread_rwlock_unlock() is returning EPERM.  The Library Reference Manual says that this indicates
 that "No thread has a read or write lock on rwl or the calling thread doesn’t have a write lock on rwl," but in this 
case it seems pretty obvious that the calling thread DOES "have a write lock on rwl."  

2) In order to determine what was going on, I had to print the return values as ints; passing the value 1 (EPERM) to 
strerror( ) returns the string "No error," which I couldn't distinguish from 0 (EOK).  Comments in errno.h suggest that 
EPERM should be stringified instead as "Not owner."

Chris

Chris Chiesa

03/19/2010 9:59 AM

post50008

Re: pthread_rwlock_unlock( ) return-value discrepancy?

Following up to myself...

Looks like strerror(EPERM) is okay after all.  I was feeding it the wrong variable, which had a different value.  
Expectations vs. implementation was messed up.

Still don't understand why I'm getting EPERM, though.  I've confirmed that the same lock (the only one in the program, 
but you never know) is being successfully "trywrlock"-ed and unsuccessfully unlocked.

Um, is it possible for pthread_rwlock_trywrlock() to return EOK while _not_ locking the lock?

Chris

Sean Boudreau(deleted)

03/19/2010 10:00 AM

post50009

Re: pthread_rwlock_unlock( ) return-value discrepancy?

On Fri, Mar 19, 2010 at 09:59:23AM -0400, Chris Chiesa wrote:
> Following up to myself...
> 
> Looks like strerror(EPERM) is okay after all.  I was feeding it the wrong variable, which had a different value.  
> Expectations vs. implementation was messed up.
> 
> Still don't understand why I'm getting EPERM, though.  I've confirmed that the same lock (the only one in the program,
> but you never know) is being successfully "trywrlock"-ed and unsuccessfully unlocked.
> 
> Um, is it possible for pthread_rwlock_trywrlock() to return EOK while _not_ locking the lock?
> 
> Chris

Can you post a test case?

-seanb

Chris Chiesa

03/19/2010 10:49 AM

post50021

Re: pthread_rwlock_unlock( ) return-value discrepancy?

> Can you post a simple test case? 

Not really; I was trying to write a simple program to demonstrate it -- but it turns out that outside of my company's 
(SConstruct.py-based) construction scheme I can't even get 'hello world' to run.  (It compiles, but at run time I get a 
"syntax error:'(' unexpected" error message.  That's a topic for a whole other post, someday when I have time.)

Chris

Aleksandar Ristovski(deleted)

03/19/2010 10:57 AM

post50024

Re: pthread_rwlock_unlock( ) return-value discrepancy?

On 19/03/2010 10:49, Chris Chiesa wrote:
>
>> Can you post a simple test case?
>
> Not really; I was trying to write a simple program to demonstrate it -- but it turns out that outside of my company's 
> (SConstruct.py-based) construction scheme I can't even get 'hello world' to run.  (It compiles, but at run time I get a 
> "syntax error:'(' unexpected" error message.  That's a topic for a whole other post, someday when I have time.)

That looks like you are trying to run a binary built for one 
target architecture on another.

---
Aleksandar

Chris Chiesa

03/19/2010 11:23 AM

post50027

Re: pthread_rwlock_unlock( ) return-value discrepancy?

> That looks like you are trying to run a binary built for one 
> target architecture on another.

Thanks.  I'll look into it.  I'm pretty sure I'm using the right compiler, but maybe I'm missing a crucial command-line 
switch or something.

Chris Chiesa

03/19/2010 4:55 PM

post50056

Re: pthread_rwlock_unlock( ) return-value discrepancy?

> 
> > That looks like you are trying to run a binary built for one 
> > target architecture on another.
> 
> Thanks.  I'll look into it.  I'm pretty sure I'm using the right compiler, but
>  maybe I'm missing a crucial command-line switch or something.
> 

Figured it out.  First, I had to (correctly) use the company SConstruct procedures after all.  Second, the FileZilla FTP 
client, in AUTO mode, claimed to transmit my binary (from development system to target system) in BINARY mode, but 
didn't really.  When I forcibly set FileZilla to BINARY mode, the same binary arrived on the target in a form that 
executed successfully.  Whew.

Chris Chiesa

03/22/2010 2:30 PM

post50158

Re: pthread_rwlock_unlock( ) return-value discrepancy?

I have written several versions of a test-case program.  It exhibits
precisely the behavior I would expect, under non-locking,
non-blocking-locking, and blocking-locking conditions.  It
specifically fails to exhibit the precise behavior that (I think) I'm
seeing in my production code.  However, it has made me aware that
non-blocking locking requires a somewhat different thought process
than the more familiar blocking kind, and I am now struggling with
coming up with the right way to use non-blocking locking.  I can't
afford to have my production code block here, but it seems the only
alternative is to massively restructure my code to continuously retry
operations that are buried deep in a resource manager, where retrying
is awkward at best.  On the other hand, the blocking versions of the
functions appear to be very stable so maybe I can get away with using
them anyway.  Decisions, decisions.  Are there any standard paradigms/
patterns for using the non-blocking (...try_...) locking functions?
If so, I would greatly appreciate hearing about them.

Those not wishing/needing to examine/experiment further can stop
reading here; I thank you for your time.  For those who do wish to go
deeper, I have attached a zip archive containing several relevant
files, as follows.

File tryit.cpp is, of course, the program source code.  As far as I
can tell, it should build on anybody's QNX platform (FWIW, I'm using
6.3.2).  Program history and output are as follows.

The first version of the program had no locking, and exhibited race
conditions: some of the output lines showed the string containing a
run of one character followed by a run of another; i.e. the reader had
caught the writer in the act of changing the string's contents.  See
thread_interlace.txt.  This is exactly as I expected (and it took some
time to cobble up a situation in which the effects of not locking
would be easily visible).

The second version of the program used the non-blocking (...try...)
versions of the locking functions.  See file radio_try_lockfails.txt.
It appears that the writer and first three readers started up
more-or-less immediately, and ran without locking problems but also
with the readers producing no output, i.e. perhaps not really running,
perhaps starved by the writer, though it's not clear how this can
occur.  Startup of the fourth reader is reported quite belatedly, and
locking problems begin immediately after that startup is announced.
The locking problem is specifically that the first reader tries to
lock while the writer holds the lock; this is as I would expect, and
is in more-or-less the same "category" as the problem I see in my
production code -- but is DIFFERENT FROM the behavior of my production
code, in which problems begin when the WRITER *UN*LOCKS the lock that
IT ITSELF allegedly holds.  (I say "allegedly" because it is
impossible to know for sure without doing your own bookkeeping, which
requires making assumptions about the state of the underlying
lock.  If only pthread_rwlock_t weren't opaque, I could examine
the lock itself under various conditions and see what IT (i.e. the
locking facility) thinks is going on, and probably solve all of this
quite quickly.  Many years of experiences like this one have given me
an abiding dislike for opaque data structures.)  Several of the
behaviors of this program are difficult to explain: 1) The belated
indication of Thread 4 startup, the lack of output prior to that
point, and the coincidence of that thread startup with the beginning
of locking problems, suggests that the readers are not fully started
until after the writer has been running for some time -- which is very
odd since the readers are started BEFORE the writer.  Well, maybe
starvation is occurring.  2) Even though "locking is failing," I see
no evidence of the race condition that appeared in the no-locking
version of the program.  Conversely, note that while the...

Attachment:

example.zip 9.41 KB

Chris Chiesa

03/22/2010 3:21 PM

post50166

Re: pthread_rwlock_unlock( ) return-value discrepancy?

I may have nailed it down.  I added a bunch more debugging statements to my production code and I now see it "locking 
once, unlocking twice."  Now have to figure out how/where THAT's happening.  Details irrelevant to this discussion.

Glad to know it's not a problem with the locking facility.  That was really bugging me.

Neil Schellenberger(deleted)

03/29/2010 2:35 PM

post50741

Re: pthread_rwlock_unlock( ) return-value discrepancy?

On Mon, 2010-03-22 at 14:30 -0400, Chris Chiesa wrote:
> Are there any standard paradigms/patterns for using the non-blocking
> (...try_...) locking functions? If so, I would greatly appreciate
> hearing about them.

The usual advice I give is "Don't" ;-)

As you have noted, it is usually a system design choice and not
something that should be shoehorned in later....
