11/04/2011 12:46 PM
Resource manager attribute unlocking with multiple threads out of synch
I've written a multi-threaded resource manager that has a bug somewhere, and I've found an anomaly in the
iofunc_attr_t structure that I can't explain. It would be great if someone with more knowledge of resource managers
could say whether it is a problem or not.
We need the driver to support multi-threaded access, so I call iofunc_attr_unlock() and iofunc_attr_lock() in my
io_read and io_write handlers. My understanding is that the resource manager locks the attributes before calling my
io_read (or io_write), so I must unlock them to allow multi-threaded access. The anomaly is that the lock_count field
of iofunc_attr_t suggests that the locking and unlocking are getting out of synch: during a heavy load test (i.e. two
reads and two writes running simultaneously for 30 minutes) the lock_count can rise above 12. Even after a successful
series of tests, the lock_count can be left at a higher value than I would think possible (e.g. 3). With 2 threads
calling the io_read function and another 2 threads calling the io_write function, it is worrying that the lock_count
can ever rise higher than 4.
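To make the counting concrete, here is a minimal single-threaded sketch in portable C. It assumes the attribute lock behaves like an owner-plus-count lock, which is what the lock_tid and lock_count fields suggest. The names model_attr_t, model_lock, model_unlock and model_calls are hypothetical stand-ins, not QNX APIs, and the sketch does not claim to reproduce the behaviour reported here.

```c
#include <assert.h>

/* Hypothetical model of an owner-plus-count lock; NOT the QNX implementation. */
typedef struct {
    int      lock_tid;    /* id of the owning thread, meaningful only if lock_count > 0 */
    unsigned lock_count;  /* nesting depth currently held by lock_tid */
} model_attr_t;

/* Returns 0 on success, -1 where a real lock would block on another owner. */
static int model_lock(model_attr_t *a, int tid)
{
    if (a->lock_count > 0 && a->lock_tid != tid)
        return -1;
    a->lock_tid = tid;
    a->lock_count++;
    return 0;
}

/* Returns 0 on success, -1 if tid does not currently hold the lock. */
static int model_unlock(model_attr_t *a, int tid)
{
    if (a->lock_count == 0 || a->lock_tid != tid)
        return -1;
    a->lock_count--;
    return 0;
}

/*
 * Simulates ncalls handler invocations from one thread: the framework locks,
 * the handler unlocks around the driver call and relocks, and the framework
 * unlocks afterwards. A non-zero extra_lock adds one hypothetical unmatched
 * lock per call. Returns the count left over at the end.
 */
static unsigned model_calls(int ncalls, int extra_lock)
{
    model_attr_t a = { 0, 0 };
    const int tid = 1;              /* one thread keeps the model deterministic */

    for (int i = 0; i < ncalls; i++) {
        model_lock(&a, tid);        /* framework, before the handler */
        model_unlock(&a, tid);      /* handler, before the driver call */
        model_lock(&a, tid);        /* handler, after the driver call */
        if (extra_lock)
            model_lock(&a, tid);    /* hypothetical unmatched lock */
        model_unlock(&a, tid);      /* framework, after the handler */
    }
    return a.lock_count;
}
```

In this model the count is the nesting depth of the single owning thread rather than the number of threads using the device, so model_calls(12, 0) ends at 0, while model_calls(12, 1) ends at 12 because each call leaves one lock behind.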
The pseudocode (I've attached the actual code) for io_read is:
1. Check the message is valid and extract the read request size and the pointer to the user buffer to read into
2. Extract the driver channel and device context (pDevice) from the extended ocb
3. Unlock the attributes with iofunc_attr_unlock()
4. Call my low-level driver read(pDevice, channel, pUserBuffer, readsize)
5. Lock the attributes again with iofunc_attr_lock()
6. Set the return message size and flags
io_write is implemented in a similar way.
I also extended the iofunc_ocb_t and iofunc_attr_t structures to include our low-level driver device context pointer and
hardware channel number (data link).
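The read path and extended structures described above can be sketched as a portable model with POSIX threads. This is only an illustration under stated assumptions: model_attr_t, model_ocb_t, model_dispatch_read and model_lowlevel_read are hypothetical stand-ins for the QNX types, the resource manager framework and the real driver, and a plain mutex plus a counter stands in for the attribute lock. Four reader threads are used for brevity instead of two readers and two writers.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the QNX iofunc_attr_t / iofunc_ocb_t. */
typedef struct {
    pthread_mutex_t lock;        /* stands in for the attribute lock */
    unsigned        lock_count;  /* only touched while lock is held */
    unsigned        peak_count;  /* highest lock_count observed */
} model_attr_t;

typedef struct {
    model_attr_t *attr;
    void         *pDevice;       /* low-level driver device context */
    int           channel;       /* hardware data link */
} model_ocb_t;

static void model_attr_lock(model_attr_t *a)
{
    pthread_mutex_lock(&a->lock);
    a->lock_count++;
    if (a->lock_count > a->peak_count)
        a->peak_count = a->lock_count;
}

static void model_attr_unlock(model_attr_t *a)
{
    a->lock_count--;
    pthread_mutex_unlock(&a->lock);
}

/* Stand-in for the low-level driver read: fills the buffer, returns bytes read. */
static size_t model_lowlevel_read(void *pDevice, int channel, char *buf, size_t n)
{
    (void)pDevice;
    memset(buf, 'A' + channel, n);
    return n;
}

/* Follows the numbered steps; entered and left with the attribute locked. */
static int model_io_read(model_ocb_t *ocb, char *pUserBuffer, size_t readsize, size_t *nbytes)
{
    if (ocb == NULL || pUserBuffer == NULL || nbytes == NULL)   /* step 1 */
        return EINVAL;
    void *pDevice = ocb->pDevice;                               /* step 2 */
    int channel = ocb->channel;
    model_attr_unlock(ocb->attr);                               /* step 3 */
    size_t n = model_lowlevel_read(pDevice, channel, pUserBuffer, readsize); /* step 4 */
    model_attr_lock(ocb->attr);                                 /* step 5 */
    *nbytes = n;                                                /* step 6 */
    return 0;
}

/* Stand-in for the framework: lock, call the handler, unlock. */
static int model_dispatch_read(model_ocb_t *ocb, char *buf, size_t size, size_t *nbytes)
{
    model_attr_lock(ocb->attr);
    int rc = model_io_read(ocb, buf, size, nbytes);
    model_attr_unlock(ocb->attr);
    return rc;
}

typedef struct { model_ocb_t ocb; int iterations; int errors; } worker_arg_t;

static void *model_worker(void *p)
{
    worker_arg_t *w = p;
    char buf[64];
    for (int i = 0; i < w->iterations; i++) {
        size_t n = 0;
        if (model_dispatch_read(&w->ocb, buf, sizeof buf, &n) != 0 || n != sizeof buf)
            w->errors++;
    }
    return NULL;
}

/* Runs 4 reader threads; returns the final lock_count, or -1 on any error. */
static int model_run(int iterations, unsigned *peak)
{
    enum { NTHREADS = 4 };
    model_attr_t attr = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };
    pthread_t tid[NTHREADS];
    worker_arg_t args[NTHREADS];
    int started = 0, errors = 0;

    for (int i = 0; i < NTHREADS; i++) {
        args[i].ocb.attr = &attr;
        args[i].ocb.pDevice = NULL;
        args[i].ocb.channel = i % 3;    /* three data links */
        args[i].iterations = iterations;
        args[i].errors = 0;
        if (pthread_create(&tid[i], NULL, model_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        errors += args[i].errors;
    }
    *peak = attr.peak_count;
    return (errors != 0 || started != NTHREADS) ? -1 : (int)attr.lock_count;
}
```

In this model the byte count lives in a local variable and is written to the reply only after the lock is retaken, so nothing another thread could touch is shared while the attribute is unlocked. With every unlock paired with a lock, the counter never goes above 1 and returns to 0 once all threads have finished.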
The error I get is that the resource manager returns an incorrect value for the amount of data written. My worry
is that the lock_count problem is causing data corruption in the return messages.
The hardware is a PCI card with three bi-directional data links, and the low-level driver seems to work reliably when I
test it without using the resource manager.
Any help would be appreciated, in particular any explanation of how lock_count can get out of synch, as I always
have one unlock followed by one lock in both io_read and io_write.