qnet: periodic_ticks  
Hello networking gurus,

What are the possible repercussions of increasing the periodic_ticks value for qnet-l4_lite?

Here's the situation:  I've got two nodes connected using 100Mbit ethernet over a crossover cable.  One node is 
attempting to read six or seven different measurements from the other node over qnet every 0.1 seconds.   Every once in 
a while, though, the connection seems to stall for a long period of time (>0.5s), with many messages blocked on qnet, 
until eventually it unsticks itself and carries on.

The sloginfo output of the second node shows many messages like:
Jul 24 20:09:00    7    15     0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 1 sc 1 dc 1 ss 12737 tk 19062 ct 19064
Jul 24 20:09:00    7    15     0 npm-qnet(L4): l4_tx_max_pkt_set(): nd 1 slow mode: passed -1 pkts, window 1 pkts

Digging through the documentation and source suggested that the default timeout period is huge (200-400ms, much larger 
than our sample interval and hundreds of times the network latency).  Of course, the only way to decrease the timeout is
 to increase the periodic_ticks, but the documentation notes this will affect the timing of all other qnet operations.

I tried raising the periodic_ticks value to 50 without adjusting any of the other options, and this seemed to solve our 
problem - not only did the occasional huge latencies disappear, but the number of timeouts also seemed to go down.
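
For the record, the change amounts to one extra option on the qnet line in our startup script - 
roughly like this (the exact mount syntax and module path are from memory, so check the use 
message for your own setup before copying it):

   # load qnet (l4_lite) into io-net with a faster periodic timer;
   # periodic_ticks sets the qnet timer rate in ticks per second
   mount -T io-net -o "periodic_ticks=50" /lib/dll/npm-qnet-l4_lite.so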

Nevertheless, it's not clear to me what other impact this change could have.  What possible side effects should I be on 
the lookout for?  I can say with confidence that we won't be running qnet like this over any network with latencies of 
more than a couple of ms.

Any wisdom would be greatly appreciated,

-Will
RE: qnet: periodic_ticks  
From: Will Miles [mailto:community-noreply@qnx.com] 

> repercussions of increasing the periodic_ticks

Two things:

1) increased CPU consumption by qnet (e.g. io-net
or io-pkt) - run the top utility to see the difference,
if any, and

2) if you wish to retain the default timeout behaviour,
it might be an idea to go through the use message for qnet
and scale all the other timeout tick values appropriately
to avoid any unwanted change in behaviour.

For example, I understand you have gone from 200ms
to 50ms for the periodic_ticks, so you should specify
all the tick-based timeouts with values which are
200/50 = 4 times as much as the default number of ticks.

Clear as mud?
 
My apologies for making the default qnet tick a rather 
timid 200ms - I did not want to consume unnecessary CPU 
on slow platforms.  Generally speaking, these timeouts 
only matter when packets are lost, which should be a rare
occurrence on a properly functioning system.

--
aboyd  www.PoweredByQNX.com
Re: RE: qnet: periodic_ticks  
Hi Alan,

Thanks for your reply.  We didn't notice any significant change in CPU utilization, but then our systems are 
comparatively lightly loaded until they hit the graphics code at lower priority levels.  

I understand that just increasing the periodic ticks value will also decrease the other timeouts unless they're 
explicitly increased.    I guess my question is: are there any unexpected gotchas or race conditions if these other 
timeouts are also decreased?  I'd prefer not to have to enumerate /all/ of those timeout options in the startup script 
(and hope that subsequent versions don't add new ones) unless there's trouble ahead.

To be honest, I'm a bit surprised I'm seeing the timeouts at all as well - certainly Ethernet never claimed 
to be "reliable", though an 8" crossover cable shouldn't offer much opportunity for things to get lost...

I'm also a bit surprised that the entire qnet link seemed to stop when the glitch happened - there were several 
different threads making transactions to different servers, and they'd all stop and wait behind the first error.  Is it 
normal for qnet transactions to be strictly serialized, or is this something to do with the "window=1" messages - a side 
effect of extremely low latency in the successful cases (i.e. transactions normally go out and come back in less than 
one system clock tick)?

And no worries about the defaults.  Actually, I'd speculated that the long timeouts were there to support routed 
networks over qnet-over-IP.

Thanks again,

-Will
RE: RE: qnet: periodic_ticks  
From: Will Miles [mailto:community-noreply@qnx.com] 

> are there any unexpected gotchas or race conditions if these 
> other timeouts are also decreased?

As long as packets aren't lost, no.  If packets are lost,
then all of the other timeouts will occur 10x as fast as
they normally would.  You might be able to get away
with the defaults, but personally I would stick a
zero after them.

> I'm a bit surprised I'm seeing the timeouts at all as well

Shot in the dark: make sure you have turned off your 
PHY probing in the ethernet driver.  This periodic link 
monitoring has caused this sort of problem in the past
(occasional lost packets).

A few years back, I re-wrote as many drivers as I could,
making the default to NOT probe the PHY if the link
was up, but ... what driver are you using?  What
version of the OS?
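
On drivers that have it, turning the probing off is just another option on the driver
line - something like the following (the option name here is from memory, so treat it as
a placeholder and check the driver's use message for the exact spelling on your version):

   # hypothetical example: start the speedo driver with periodic PHY probing disabled,
   # then load qnet on top of it
   io-net -d speedo probe_phy=0 -p qnet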

Qnet transactions are indeed serialized, because
it has been empirically shown in past regression
tests that the kernel "grouping" mechanism cannot
be trusted, and we don't want clients left
reply-blocked forever when signals are not properly
handled (sequenced out of order) across the network.

--
aboyd
Re: qnet: periodic_ticks  
On Fri, 25 Jul 2008 14:28:13 -0400 (EDT)
Andrew Boyd <community-noreply@qnx.com> wrote:

> > I'm a bit surprised I'm seeing the timeouts at all as well
> 
> Shot in the dark: make sure you have turned off your 
> PHY probing in the ethernet driver.  This periodic link 
> monitoring has caused this sort of problem in the past
> (occasional lost packets).
> 
> A few years back, I re-wrote as many drivers as I could,
> making the default to NOT probe the PHY if the link
> was up, but ... what driver are you using?  What
> version of the OS?

We're using a hand-patched 6.3.0 SP2 - the driver is devn-speedo dated April 30 2004.  I recall seeing some of the 
traffic on that topic - it's entirely possible it's long since been fixed.  In some sense, though, it's still good to 
know what happens when there is packet loss; the old driver might be handy to keep around for artificially generating 
those conditions.

> 
> Qnet transactions are indeed serialized, because
> it has been empirically shown in past regression
> tests that the kernel "grouping" mechanism cannot
> be trusted, and we don't want clients left
> reply-blocked forever when signals are not properly
> handled (sequenced out of order) across the network.

Interesting.  I guess that puts an upper bound on the total number of transactions per second, based on the round-trip 
latency.  Something I'll have to keep in mind for the future - our system tends to do a lot of small, parallel 
transactions; bandwidth won't be a problem, but waiting for the next thread's packet to come back might be.
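
Back-of-the-envelope (the round-trip figure here is just an assumption for illustration, not a 
measurement from our setup):

   # if each transaction must wait for the previous reply, the best case is roughly 1/RTT per second
   rtt_us=200                          # assumed round-trip time in microseconds
   echo $(( 1000000 / rtt_us ))        # -> 5000 transactions/sec, shared by all threads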

Thanks again for your help!

-Will 
RE: qnet: periodic_ticks  
From: Will Miles 

> devn-speedo dated April 30 2004

I don't think the PHY probing changes
had made it in that long ago.  If you
try a newer driver I suspect your
occasional packet loss will go away.

--
aboyd
RE: qnet: periodic_ticks  
From: Will Miles [mailto:community-noreply@qnx.com] 

> raising the periodic_ticks value to 50 

Oops.  It appears I misread your original message - I
thought you went from a 200ms tick (5 ticks/sec) to
a 50ms tick (20 ticks/sec) but upon re-reading, you
have actually gone to a 20ms tick (50 ticks/sec) so 
the re-scaling comment remains the same, except that
you should multiply all your other ticks by 10
NOT 4 to avoid any change in default behaviour.
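
In other words, as a quick sanity check against the 200ms default tick:

   # multiplier for the other tick-based timeouts = default tick period / new tick period
   default_tick_ms=200                         # 5 ticks/sec
   new_tick_ms=$(( 1000 / 50 ))                # periodic_ticks=50 -> 20ms tick
   echo $(( default_tick_ms / new_tick_ms ))   # -> 10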

Sorry about that!

--
aboyd