Forum Topic - Qnet Node detection: (4 Items)
   
Qnet Node detection  
Hi aboyd,

In nd_change_notify() in qos/qos_init.c, I added code (a pulse) to get a notification whenever the remote node goes down
or comes up.
The problem: I get the notification as soon as the node enters the Qnet cluster (case: EOK), but I do not
get the notification when the node goes down (case: EHOSTDOWN).

The requirement is that I need a notification as soon as the node goes down (within 1 sec). I mounted lsm-qnet.so as:
mount -Tio-pkt -o auto_add=5 lsm-qnet.so

Thanks,
Raj.
RE: Qnet Node detection  
> qos/qos_init.c   nd_change_notify() added a code(pulse) 
> whenever the remote node goes down or up.

ok

> i am getting the notification as soon as the node 
> entered in to the Qnet cluster(case: EOK). but i am 
> not getting the notification whenever the node goes 
> down(case: EHOSTDOWN).

That's incredibly weird, because, as you can see
from the source, that's how qnet is notified when
a node goes down (EHOSTDOWN).

Dumb question: when you

  # ls /net/fubar
  # rmdir /net/fubar

does qnet correctly see the node up/down in the sloginfo?

--
aboyd
Re: RE: Qnet Node detection  
Hi,

I am getting the notification some time later, like 1 or something.
If I want to get the notification within 1 sec, which option do I have to set?
Is auto_add=5 correct, as in this command?
mount -Tio-pkt -o auto_add=5 lsm-qnet.so

Thanks,
Raj.




RE: RE: Qnet Node detection  
> i am getting the notification after some time 
> later like 1 or something.

But you ARE getting the EHOSTDOWN immediately 
after the rmdir, correct?

> i want to get the notification within 1 sec 
> what the option that i have to set.

One second is pretty tight - any transient
communication outage, network congestion or
busy high-priority threads may cause unnecessary
node downs with timeouts that tight!

Anyways, there are basically two sets of command
line options to control how fast qnet decides if
a node has disappeared.

One set of command line options has effect when
you are actually transmitting user data.  They
are tx_ticks=X and tx_retries=X.

The second set of command line options has effect
when there is NO user data being transmitted.  They
are conn_up_idle=X and conn_up_retries=X.

Again, be sure to carefully re-test your ENTIRE
application after changing ANY qnet option, because
of unintended side effects.

--
aboyd
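
As a rough sketch of how the two sets of options described above might be combined on one mount line: the option names are the ones given in this thread, but the numeric values are placeholders only (the thread does not state their units or sensible ranges), and passing them as a comma-separated -o list alongside auto_add=5 is an assumption. The helper name qnet_mount_cmd is made up for illustration. It only prints the command so it can be reviewed before anything is mounted.

```shell
#!/bin/sh
# Hypothetical helper: build (but do NOT run) a qnet mount command line.
# $1 = tx_ticks, $2 = tx_retries        (used while user data is being sent)
# $3 = conn_up_idle, $4 = conn_up_retries (used while the link is idle)
# All values passed in are placeholders, not tuning recommendations.
qnet_mount_cmd() {
    opts="auto_add=5"
    opts="$opts,tx_ticks=$1,tx_retries=$2"
    opts="$opts,conn_up_idle=$3,conn_up_retries=$4"
    printf 'mount -Tio-pkt -o %s lsm-qnet.so\n' "$opts"
}

# Print the command for review; copy it to the target only after testing.
qnet_mount_cmd 1 2 1 2
```

Printing rather than executing keeps the sketch harmless on a development host, and makes it easy to compare a few candidate settings side by side before re-testing the whole application with each one.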