Forum Topic - Unblock pulse again: (5 Items)
   
Unblock pulse again  
http://community.qnx.com/sf/discussion/do/listPosts/projects.core_os/discussion.newcode.topc9212

Guys, I've kept silent for quite a long time, but the issue is not solved and it is definitely an issue in the QNX kernel.
I know it's hard to believe, but if you like I can capture a video and upload it to YouTube so you can see for yourselves. In GDB I just create a file wrapper object by calling `new', and in the next step I call a method that invokes open(). That's all. That open() call returns EINTR immediately, because the server sees the unblock-pending flag already set in msginfo for a new OCB and aborts the open request because of that.
This only happens when the client and server are on different nodes; it never happens when they are on the same node.
Re: Unblock pulse again  
In the log output you sent, there are no instances of KER_EXIT:MSG_RECEIVEV that had (info->flags & _NTO_MI_UNBLOCK_REQ) == _NTO_MI_UNBLOCK_REQ.

I see _NTO_MI_ENDIAN_BIG, _NTO_MI_NET_CRED_DIRTY, and 0.
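
For readers following the flag test above, here is a minimal sketch of that check in C. The helper name unblock_requested() is hypothetical, and the fallback #define values are placeholders so the snippet builds off-target; on QNX the real definitions come from <sys/neutrino.h> and should be used instead.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>   /* real _NTO_MI_* definitions */
#else
/* Assumed placeholder values, only so the sketch compiles off-target.
 * On QNX, trust <sys/neutrino.h>, not these numbers. */
#define _NTO_MI_ENDIAN_BIG      0x00000001u
#define _NTO_MI_UNBLOCK_REQ     0x00000100u
#define _NTO_MI_NET_CRED_DIRTY  0x00000200u
#endif

/* Hypothetical helper: returns 1 when the unblock-request bit is set in
 * the flags word of a received message's info structure, else 0. */
int unblock_requested(uint32_t flags)
{
    return (flags & _NTO_MI_UNBLOCK_REQ) == _NTO_MI_UNBLOCK_REQ;
}
```

With the flag combinations mentioned in the reply (_NTO_MI_ENDIAN_BIG, _NTO_MI_NET_CRED_DIRTY, or 0), this helper returns 0, which matches the observation that no receive in that log carried an unblock request.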

Can you send me the original log and point out the event number where you see a problem?

Oleh Derevenko wrote:
> http://community.qnx.com/sf/discussion/do/listPosts/projects.core_os/discussion.newcode.topc9212
> 
> Guys, I've been keeping silence for quite a long time but the issue is not
> solved and it's definitely an issue in the QNX kernel.
> I know it's hard to believe but if you like I can capture video and upload it
> to youtube for you to see yourselves. In GDB I just create a file wrapper
> object with calling `new' and with next step I call a method that invokes
> open(). That's all. And that open() call returns EINTR immediately because the
> server has unblock pending flag initially set in msginfo for a new OCB and it
> aborts the open request because of that.
> This only happens when client/server are on different nodes and this never
> happens on the same one.

-- 
cburgess@qnx.com
Re: Unblock pulse again  
Colin, please check your mail.
I've re-created the problem once again and sent the tracelogger files to you.

Thank you for taking a look at this.

> Can you send me the original log and point out the event number where you see 
> a problem?
> 
Re: Unblock pulse again  
Hi Colin,

I've recreated the problem as you asked. I've also added some TraceEvent() calls. However, this is the first time I've done anything like that, so I hope I did everything correctly.
Check your mail for the file download link.

Oleh Derevenko
-- ICQ: 36361783


----- Original Message ----- 
Subject: Re: Unblock pulse with multithreaded RM


Yes, I do see MsgReceivev exiting with _NTO_MI_UNBLOCK_REQ set.

The fact that the debugger is attached is making the log mostly silent - can you
do the trace with the debugger not attached?

Also, you could add in some user events to annotate the log
by using the TraceEvent() kernel calls.

Tracking send/receive/reply across qnet is tricky in these kernel logs... :-)

Cheers,

Colin
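
As an illustration of the annotation suggested above, here is a hedged sketch of a small wrapper around TraceEvent(). The wrapper name annotate_trace() and the event numbering are hypothetical. The QNX branch assumes the _NTO_TRACE_INSERTUSRSTREVENT command and the _NTO_TRACE_USERFIRST base from <sys/trace.h>; check both against the headers of your release. The non-QNX branch is only a stand-in so the snippet builds elsewhere.

```c
#include <assert.h>
#include <stdio.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>
#include <sys/trace.h>
#endif

/* Hypothetical helper: insert a string marker into the kernel trace so the
 * interesting region of the log is easy to find. Returns 0 on success. */
int annotate_trace(int user_event, const char *text)
{
#ifdef __QNXNTO__
    /* user_event is treated as an offset into the user-event range. */
    return TraceEvent(_NTO_TRACE_INSERTUSRSTREVENT,
                      _NTO_TRACE_USERFIRST + user_event, text);
#else
    /* Off-target fallback: print the marker instead of tracing it. */
    return fprintf(stderr, "[trace %d] %s\n", user_event, text) < 0 ? -1 : 0;
#endif
}
```

A typical use would be to call annotate_trace(1, "before open()") right before the failing open() and annotate_trace(2, "after open()") right after it, so the two markers bracket the events of interest in the log.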
Re: Unblock pulse again  
Hi,

> http://community.qnx.com/sf/discussion/do/listPosts/projects.core_os/discussion.newcode.topc9212
> 
> Guys, I've been keeping silence for quite a long time but the issue is not
> solved and it's definitely an issue in the QNX kernel.
> I know it's hard to believe but if you like I can capture video and upload it
> to youtube for you to see yourselves. In GDB I just create a file wrapper
> object with calling `new' and with next step I call a method that invokes
> open(). That's all. And that open() call returns EINTR immediately because the
> server has unblock pending flag initially set in msginfo for a new OCB and it
> aborts the open request because of that.
> This only happens when client/server are on different nodes and this never
> happens on the same one.


This is not exactly the same issue, but it seems to be related. At least it is reproduced with similar actions.
This time the kernel seems to lock up threads in network requests when a read/write request is unblocked with a signal and the file is closed right after that.
Roughly, it looks like the following example:
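------------
Worker thread
------------
{
  ...
  read(fd, buf, sizeof(buf));   // blocks in a request to the remote node
  ...
}

------------
Aborting thread
------------
{
  pthread_kill(worker_thread_tid, SIGINT);   // unblock the worker's read()
  close(fd);                                 // closed right away, without waiting
}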

This approach makes the kernel lock up threads very quickly.
I've discovered that waiting for the worker thread to return from the request works around the problem (at least at first glance). That is, if I do the following, the problem seems to go away:
------------
Worker thread
------------
{
  ...
  pthread_mutex_lock(&request_mutex);
  read(fd, buf, sizeof(buf));
  pthread_mutex_unlock(&request_mutex);
  ...
}

------------
Aborting thread
------------
{
  pthread_kill(worker_thread_tid, SIGINT);
  // Wait until the worker thread has returned from read()
  pthread_mutex_lock(&request_mutex);
  pthread_mutex_unlock(&request_mutex);
  close(fd);
}
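
The workaround above can be fleshed out into a compilable sketch. Several parts are assumptions added for illustration and are not in the original post: a local pipe stands in for the network file descriptor, a semaphore tells the aborting thread that the worker already holds request_mutex, and the signal is re-sent in a loop in case the first one arrives before the worker has entered read(). The function name run_abort_demo() is hypothetical. The sketch only demonstrates the ordering (signal, wait for the worker, then close); it does not reproduce the qnet lockup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t request_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t request_started;  /* posted once the worker holds request_mutex */
static int worker_errno;       /* errno the worker saw after read() returned */

static void on_sigint(int signo) { (void)signo; }  /* only interrupts read() */

static void *worker(void *arg)
{
    int fd = *(int *)arg;
    char buf[16];

    pthread_mutex_lock(&request_mutex);
    sem_post(&request_started);
    ssize_t n = read(fd, buf, sizeof buf);  /* blocks: nobody writes */
    worker_errno = (n < 0) ? errno : 0;
    pthread_mutex_unlock(&request_mutex);
    return NULL;
}

/* Returns the errno observed by the worker, or -1 if setup failed. */
int run_abort_demo(void)
{
    int pipefd[2];
    pthread_t tid;
    struct sigaction sa;
    struct timespec pause = { 0, 1000000 };  /* 1 ms between retries */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;               /* no SA_RESTART: read() gets EINTR */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0 || pipe(pipefd) != 0 ||
        sem_init(&request_started, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, &pipefd[0]) != 0)
        return -1;

    sem_wait(&request_started);              /* worker now owns request_mutex */

    /* Signal the worker until it has left read() and released the mutex. */
    for (;;) {
        pthread_kill(tid, SIGINT);
        nanosleep(&pause, NULL);
        if (pthread_mutex_trylock(&request_mutex) == 0)
            break;
    }
    pthread_mutex_unlock(&request_mutex);

    close(pipefd[0]);                        /* no request is in flight now */
    close(pipefd[1]);
    pthread_join(tid, NULL);
    sem_destroy(&request_started);
    return worker_errno;
}
```

Calling run_abort_demo() from main() should return EINTR, since the handler is installed without SA_RESTART and the interrupted read() is not restarted.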

This makes me think that the kernel may not correctly separate the unblock pulse from the close request, and either handles the close request before the unblock (in the wrong order) or leaves some state not properly cleaned up after the unblock pulse is discarded by the close. This is, of course, pure guesswork, but I thought I'd better let you know.
Also, as I mentioned at the very beginning, waiting for the worker thread to return from the request does not solve the original problem of this forum thread. So the two might be related but are not exactly the same.

QNX 6.3.0SP3 x86 with patch 630SP2-0284