Will Miles
07/25/2008 1:34 PM
post11093
Hello networking gurus,
What are the possible repercussions of increasing the periodic_ticks value for qnet-l4_lite?
Here's the situation: I've got two nodes connected by 100 Mbit Ethernet over a crossover cable. One node reads six or
seven different measurements from the other node over qnet every 0.1 seconds. Every once in a while, though, the
connection stalls for a long time (>0.5 s), with many messages blocked on qnet, until it eventually unsticks itself
and carries on.
The sloginfo output of the second node shows many messages like:
Jul 24 20:09:00 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: nd 1 sc 1 dc 1 ss 12737 tk 19062 ct 19064
Jul 24 20:09:00 7 15 0 npm-qnet(L4): l4_tx_max_pkt_set(): nd 1 slow mode: passed -1 pkts, window 1 pkts
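To put a number on how often this happens, the timeout lines can be counted from a saved copy of the sloginfo output. The following is only a rough sketch in Python: the function name parse_timeouts is made up for illustration, and the meaning of the individual fields (nd, sc, dc, ss, tk, ct) is not documented here, so the script just extracts them as name/value pairs. Treating tk and ct as tick counters is a guess.

```python
import re

# Marker text copied from the log lines quoted in the post.
MARKER = "l4_tx_timeout(): timeout:"

# The two sample lines from the post, used here in place of a real log file.
SAMPLE = [
    "Jul 24 20:09:00 7 15 0 npm-qnet(L4): l4_tx_timeout(): timeout: "
    "nd 1 sc 1 dc 1 ss 12737 tk 19062 ct 19064",
    "Jul 24 20:09:00 7 15 0 npm-qnet(L4): l4_tx_max_pkt_set(): "
    "nd 1 slow mode: passed -1 pkts, window 1 pkts",
]


def parse_timeouts(lines):
    """Return one dict of integer fields per l4_tx_timeout() line.

    Field names are taken verbatim from the log; their meaning is
    NOT interpreted here.
    """
    events = []
    for line in lines:
        _, found, tail = line.partition(MARKER)
        if not found:
            continue  # not a timeout line
        pairs = re.findall(r"(\w+) (-?\d+)", tail)
        events.append({name: int(value) for name, value in pairs})
    return events


events = parse_timeouts(SAMPLE)
print("timeout lines:", len(events))
for ev in events:
    # ASSUMPTION: tk and ct are tick counters, so ct - tk is an elapsed tick count.
    print("fields:", ev, "ct - tk =", ev["ct"] - ev["tk"])
```

Running the same function over logs captured before and after a configuration change gives a direct count to compare, rather than an impression.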
Digging through the documentation and source suggests that the default timeout period is huge: 200-400 ms, which is
much larger than our sample interval and hundreds of times the network latency. As far as I can tell, the only way to
decrease the timeout is to increase periodic_ticks, but the documentation notes that this will affect the timing of
all other qnet operations.
I tried raising periodic_ticks to 50 without adjusting any of the other options, and this seemed to solve our
problem: not only did the occasional huge latencies disappear, but the number of timeouts also seemed to go down.
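For a sense of scale, here is the arithmetic behind that change as a small Python sketch. It rests on assumptions that are not confirmed by the documentation: that the periodic timer fires periodic_ticks times per second, that a transmit times out somewhere between one and two tick periods, and that the default is 5 (which is simply the value that would produce the 200-400 ms figure). The function name timeout_window_ms is made up for illustration.

```python
SAMPLE_INTERVAL_MS = 100.0  # measurements are read every 0.1 s
ASSUMED_DEFAULT_TICKS = 5   # ASSUMPTION: inferred from the 200-400 ms figure
TESTED_TICKS = 50           # the value tried in this post


def timeout_window_ms(periodic_ticks):
    """Return (shortest, longest) timeout in milliseconds.

    ASSUMPTION: the timer fires periodic_ticks times per second and a
    transmit times out after between one and two tick periods.
    """
    tick_ms = 1000.0 / periodic_ticks
    return tick_ms, 2 * tick_ms


for ticks in (ASSUMED_DEFAULT_TICKS, TESTED_TICKS):
    shortest, longest = timeout_window_ms(ticks)
    fits = longest < SAMPLE_INTERVAL_MS
    print(
        f"periodic_ticks={ticks}: timeout {shortest:.0f}-{longest:.0f} ms, "
        f"timer fires {ticks} times/s, "
        f"worst case within one sample interval: {fits}"
    )
```

If those assumptions hold, a value of 50 brings the worst-case timeout under the 100 ms sample interval, at the price of the timer firing ten times as often.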
Nevertheless, it's not clear to me what other impact this change could have. What side effects should I be on the
lookout for? I'm confident that we won't be running qnet like this over any network with latencies of more than a
couple of milliseconds.
Any wisdom would be greatly appreciated,
-Will