Forum Topic - io-net CPU hog: (9 Items)
   
io-net CPU hog  
Hi all,

We seem to be running into the io-net consuming-all-CPU problem which is referenced in the following forum post.

http://community.qnx.com/sf/discussion/do/listPosts/projects.networking/discussion.technology.topc3292?_pagenum=10

We are running 6.3.2A with multiple devn-i82544 NICs.

Is anyone able to confirm the following questions urgently?
- Can you confirm this bug is present in 6.3.2A?
- Is there a patch or other fix available for 6.3.2A or do we have to upgrade to 6.4.1 to get the fix?
- In which of the following cases (specifically) does the bug occur?
A - Calling select() with a NULL timeval struct for timeout?
B - Calling select() with a timeval struct which is filled with all zeroes?
C - Calling ionotify() with _NOTIFY_ACTION_POLL?

Unfortunately the thread quoted above doesn't seem to answer these questions.

I hope someone is able to answer these questions as it is killing us at a major customer site at the moment.

Thanks in advance,

Rob Rutherford
Ruzz TV
Re: io-net CPU hog  
I had a problem under 6.3.2 where a select() on read returned but the read() itself returned 0 bytes. I then repeated the cycle and chewed up all available CPU cycles. This was on a socket. It turned out after much digging into io-net that the socket had sort of closed but was not being reported as an error.

A read() of 0 bytes on a socket after a select() is 'not a good thing' so I now treat it as an error and close the socket. If you don't want to close() the socket until io-net REALLY reports it, at least stick a delay() in for the 0 bytes read() case.
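
To illustrate the 0-byte read() handling described above, here is a minimal C sketch. It is not the poster's actual code, and the function name read_until_closed is hypothetical. It blocks in select(), reads, and on a 0-byte read() closes the descriptor instead of going back to select(), which would otherwise keep reporting the descriptor as readable.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until the peer closes or an error
 * occurs. Returns the total number of bytes read, or -1 on error.
 * The descriptor is closed before returning. */
long read_until_closed(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        fd_set rfds;
        ssize_t got;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* NULL timeout: block until fd is reported readable. */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }

        got = read(fd, buf, sizeof buf);
        if (got > 0) {
            total += got;
            continue;
        }
        if (got == -1 && errno == EINTR)
            continue;

        /* got == 0 means end of stream. Stop polling this descriptor
         * and close it rather than looping on select()/read(). */
        close(fd);
        return got == 0 ? total : -1;
    }
}
```

The key design choice is that a 0-byte read() ends the loop; without that branch the select()/read() cycle spins at full CPU.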

Re: io-net CPU hog  
Hi Warren,

Thanks for the thought but I don't believe this is what is happening in this case. 

> 
> A read() of 0 bytes on a socket after a select() is 'not a good thing' so I 
> now treat it as an error and close the socket.    If you don't want to close()
>  the socket until io-net REALLY reports it, at least stick a delay() in for 
> the 0 bytes read() case.
> 

I think it is off-topic but nevertheless I am a bit perplexed by this comment. A select() on a socket showing that there is something to read, but then the subsequent read() returning 0 bytes, is AFAIK the (only) official way to be informed that the socket has been closed by the remote end. I'm not sure what you mean by "until io-net REALLY reports it": how else can it report it other than via the 0-byte read?

Rob R
Re: io-net CPU hog  
If the other end of the socket had been close()ed we got an errno of EBADF as documented, but if the other end just faulted out, that is when the read()=0 occurs. It used to happen very frequently with intermittent networks or contention between different NICs.
Re: io-net CPU hog  
On Tue, Sep 22, 2009 at 04:52:41AM -0400, Warren Deitch wrote:
> I had a problem under 6.3.2 where a select() on read returned but the read() itself returned 0 bytes. I then repeated the cycle and chewed up all available CPU cycles. This was on a socket. It turned out after much digging into io-net that the socket had sort of closed but was not being reported as an error.
>
> A read() of 0 bytes on a socket after a select() is 'not a good thing' so I now treat it as an error and close the socket. If you don't want to close() the socket until io-net REALLY reports it, at least stick a delay() in for the 0 bytes read() case.

This sounds like correct behaviour:

http://www.unixguide.net/network/socketfaq/2.13.shtml

-seanb
Re: io-net CPU hog  
On Tue, Sep 22, 2009 at 04:14:55AM -0400, Robert Rutherford wrote:
> Hi all,
> 
> We seem to be running into the io-net consuming-all-CPU problem which is referenced in the following forum post.
> 
> http://community.qnx.com/sf/discussion/do/listPosts/projects.networking/discussion.technology.topc3292?_pagenum=10
> 
> We are running 6.3.2A with multiple devn-i82544 NICs.
> 
> Is anyone able to confirm the following questions urgently?
> - Can you confirm this bug is present in 6.3.2A?

I believe so.  It was fixed in Feb 2007 which I believe
is after 6.3.2....

> - Is there a patch or other fix available for 6.3.2A or do we have to upgrade to 6.4.1 to get the fix?

I don't think so but I'm not sure.  6.4.0 will have it, or
you can statically link your app against a fixed select().

> - In which of the following cases (specifically) does the bug occur?
> A - Calling select() with a NULL timeval struct for timeout?
> B - Calling select() with a timeval struct which is filled with all zeroes?

This one (B).  Or poll() with a zero timeout

> C - Calling ionotify() with _NOTIFY_ACTION_POLL?
> 
> Unfortunately the thread quoted above doesn't seem to answer these questions.
> 
> I hope someone is able to answer these questions as it is killing us at a major customer site at the moment.
> 
> Thanks in advance,
> 
> Rob Rutherford
> Ruzz TV
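
The difference between cases A and B in the question can be sketched in C as follows. This is an illustration only: the function names are hypothetical, and the code neither reproduces nor fixes the bug. Per the reply above, it is the zeroed-timeval form (case B) that is affected.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Case A: NULL timeout. select() blocks until fd becomes readable. */
int wait_readable_blocking(int fd)
{
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, NULL);
}

/* Case B: timeval filled with zeroes. select() checks readiness and
 * returns without waiting; the result is 0 when fd is not readable. */
int poll_readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv;

    tv.tv_sec = 0;
    tv.tv_usec = 0;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

Both functions return the number of ready descriptors (here 0 or 1), or -1 on error.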
Re: io-net CPU hog  
Thanks Sean, we really appreciate the quick reply.

> 
> I don't think so but I'm not sure.  6.4.0 will have it, or
> you can statically link your app against a fixed select().
> 

We will look at going down this path.

If we just take the latest select() implementation from the main trunk, that should work against 6.3.2, right? 

Rob R
Re: io-net CPU hog  
On Tue, Sep 22, 2009 at 09:17:43AM -0400, Robert Rutherford wrote:
> Thanks Sean, we really appreciate the quick reply.
> 
> > 
> > I don't think so but I'm not sure.  6.4.0 will have it, or
> > you can statically link your app against a fixed select().
> > 
> 
> We will look at going down this path.
> 
> If we just take the latest select() implementation from the main trunk, that should work against 6.3.2, right? 
> 

It _should_.  The file in question would be lib/c/xopen/poll.c
(contains both select() and poll()).

-seanb
Re: io-net CPU hog  
> On Tue, Sep 22, 2009 at 09:17:43AM -0400, Robert Rutherford wrote:
> > 
> > If we just take the latest select() implementation from the main trunk, that should work against 6.3.2, right?
> > 
> 
> It _should_.  The file in question would be lib/c/xopen/poll.c
> (contains both select() and poll()).


OK, it compiles, links and passes a smoke test :-)

I can't test under load until tomorrow our time when we can go back on-site to the customer.

Of course this doesn't help with third-party utilities like the mysql command-line utility which directly links against libc.so. Unfortunately I can't see any fix for that other than a patched libc.so, but that shouldn't be such a big deal since we will cover 99% of instances with our own code/apps.

Rob R
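
A smoke test for a relinked select()/poll() might look like the C sketch below. This is an assumption about what such a test could check, not the test that was actually run here, and the function names are hypothetical. It only verifies that zero-timeout select() and poll() agree on a pipe before and after a write; it does not measure CPU usage under load.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Zero-timeout select(): 1 if fd is readable now, 0 if not, -1 on error. */
static int ready_select(int fd)
{
    fd_set rfds;
    struct timeval tv;

    tv.tv_sec = 0;
    tv.tv_usec = 0;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/* Zero-timeout poll(): 1 if fd has an event now, 0 if not, -1 on error. */
static int ready_poll(int fd)
{
    struct pollfd p;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    return poll(&p, 1, 0);
}

/* Returns 0 if every check passes, otherwise the number of the
 * first failing step. */
int smoke_test_zero_timeout(void)
{
    int p[2];
    int rc = 0;

    if (pipe(p) == -1)
        return 1;

    if (ready_select(p[0]) != 0 || ready_poll(p[0]) != 0)
        rc = 2;                     /* empty pipe reported readable */
    else if (write(p[1], "x", 1) != 1)
        rc = 3;                     /* could not write test byte */
    else if (ready_select(p[0]) != 1 || ready_poll(p[0]) != 1)
        rc = 4;                     /* pending data not reported */

    close(p[0]);
    close(p[1]);
    return rc;
}
```

Calling smoke_test_zero_timeout() from main() and using its return value as the exit status would make it easy to run after each relink.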