Colin Burgess(deleted)
10/05/2009 9:46 AM
post39368
In the log output you sent, there are no instances of KER_EXIT:MSG_RECEIVEV where (info->flags & _NTO_MI_UNBLOCK_REQ) == _NTO_MI_UNBLOCK_REQ.
I see _NTO_MI_ENDIAN_BIG, _NTO_MI_NET_CRED_DIRTY, and 0.
Can you send me the original log and point out the event number where you see the problem?
Oleh Derevenko wrote:
> http://community.qnx.com/sf/discussion/do/listPosts/projects.core_os/discussion.newcode.topc9212
>
> Guys, I've been keeping silent for quite a long time, but the issue is not solved and it's definitely an issue in the QNX kernel.
> I know it's hard to believe, but if you like I can capture a video and upload it to YouTube for you to see for yourselves. In GDB I just create a file wrapper object by calling `new', and with the next step I call a method that invokes open(). That's all. And that open() call returns EINTR immediately, because the server has the unblock-pending flag initially set in the msginfo for a new OCB and aborts the open request because of that.
> This only happens when client/server are on different nodes; it never happens on the same one.
>
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post39347
>
--
cburgess@qnx.com
Oleh Derevenko(deleted)
04/06/2010 10:04 AM
post51203
Hi,
> http://community.qnx.com/sf/discussion/do/listPosts/projects.core_os/discussion.newcode.topc9212
>
> Guys, I've been keeping silent for quite a long time, but the issue is not solved and it's definitely an issue in the QNX kernel.
> I know it's hard to believe, but if you like I can capture a video and upload it to YouTube for you to see for yourselves. In GDB I just create a file wrapper object by calling `new', and with the next step I call a method that invokes open(). That's all. And that open() call returns EINTR immediately, because the server has the unblock-pending flag initially set in the msginfo for a new OCB and aborts the open request because of that.
> This only happens when client/server are on different nodes; it never happens on the same one.
This is not exactly the same issue, but it seems to me it's somehow related; at least it is reproduced with similar actions.
This time the kernel seems to lock up threads in network requests when a request is unblocked from a read/write operation by a signal and then the file is closed right after that.
Roughly, it is like the following example:
------------
Worker thread
------------
{
    ...
    read(fd, buf, sizeof(buf));
    ...
}
------------
Aborting thread
------------
{
    pthread_kill(worker_thread_tid, SIGINT);
    close(fd);
}
This approach makes the kernel lock up threads very quickly.
I've discovered that waiting for the worker thread to exit from the request helps to work around the problem (at least at first glance). That is, if I do it like this, the problem seems to go away:
------------
Worker thread
------------
{
    ...
    pthread_mutex_lock(&request_mutex);
    read(fd, buf, sizeof(buf));
    pthread_mutex_unlock(&request_mutex);
    ...
}
------------
Aborting thread
------------
{
    pthread_kill(worker_thread_tid, SIGINT);
    // Wait until the request thread exits from read()
    pthread_mutex_lock(&request_mutex);
    pthread_mutex_unlock(&request_mutex);
    close(fd);
}
This makes me think that the kernel may not separate the unblock pulse from the close request correctly: it may handle the close request before the unblock (in the wrong order), or leave some state not cleaned up properly after the unblock pulse is discarded by the close. This is, of course, pure guesswork, but I thought I'd better let you know.
Also, as I mentioned at the very beginning, waiting for the worker thread to exit from the request does not solve the original problem of this forum thread. So these might be related, but they are not exactly the same.
QNX 6.3.0SP3 x86 with patch 630SP2-0284