Andrew Boyd(deleted)
RE: ndp (remote node identification)
01/16/2009 11:36 AM
post20224
> how exactly qnet detects the remote node is there or not
Well, a simple way for you to check whether qnet thinks
there is a remote node is:
# ls /net
Every 30 seconds (this is configurable with cmd line options
to qnet) qnet will burp out an unsolicited broadcast packet,
announcing its hostname and mac address. This has the effect
of populating the /net directory, which most people like. See
the cmd line option: auto_add
However, just because there is an entry in /net doesn't mean
the node is up. It could have powered up 5 minutes ago, and
then powered down after running for 1 minute. Or, it could have
changed its hostname. So really, /net is just a hint at what
the network state is.
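Since an entry in /net is only a hint, a script that wants to know whether qnet has already heard of a node can look at the directory listing instead of opening /net/<name> directly. A minimal sketch, where node_listed is a made-up helper name and the optional second argument exists only so the directory can be overridden:

```shell
#!/bin/sh
# node_listed NAME [DIR]
# Hypothetical helper (not part of qnet): succeeds if NAME appears in the
# listing of DIR (default /net). It only reads the listing, so a match
# means qnet has heard of the node at some point, not that the node is up.
node_listed() {
    ls "${2:-/net}" 2>/dev/null | grep -Fx -- "$1" >/dev/null
}

# Usage:
#   node_listed fubar && echo "fubar is listed in /net"
```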
If there is no entry in /net, qnet will actively attempt to
resolve it, if an application wants it. For example:
# ls /net/fubar
will cause the qnet resolver to attempt to figure out the
mac address of the host fubar. It does this by transmitting
a broadcast packet. Normally the response is immediate, but
if it is not, the resolver will time out and retry. These
values are controlled by the qnet cmd line options: res_retries
and res_ticks
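The same timeout-and-retry idea can be mimicked from a script that needs to wait for a node to show up. This is a rough sketch, not qnet's resolver: wait_for_node, its arguments and the default of 3 tries with a 1 second pause are invented here for illustration. Each attempt does the same access as the ls example above, so qnet's own res_retries and res_ticks still apply underneath every try.

```shell
#!/bin/sh
# wait_for_node NAME [TRIES] [DELAY] [DIR]
# Hypothetical helper: tries to access DIR/NAME (default DIR is /net)
# up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as one access succeeds, 1 if all of them fail.
wait_for_node() {
    name=$1
    tries=${2:-3}
    delay=${3:-1}
    dir=${4:-/net}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Same access as "ls /net/fubar": it asks qnet to resolve NAME.
        if ls "$dir/$name" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Pause before the next attempt, but not after the last one.
        [ "$i" -lt "$tries" ] && sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_node fubar 5 2 || echo "fubar did not resolve"
```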
The next step up the ladder is for qnet to establish a "session"
to a remote node. This must occur before any application data
transfer occurs. You can tell what nodes qnet has session
connections to, by running
# cat /proc/qnetstats | less
and looking for the word "Connection". Qnet actually has both
tx and rx connections - they are unidirectional. There are
command line options to control tx connection establishment
timeouts and retries: conn_est_timeout and conn_est_retries
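If you would rather not page through less, the Connection lines can be filtered out directly. A small sketch: show_connections is a made-up wrapper, and it assumes nothing about the layout of /proc/qnetstats beyond the word "Connection" appearing on the lines of interest.

```shell
#!/bin/sh
# show_connections [FILE]
# Hypothetical wrapper: prints every line of FILE (default
# /proc/qnetstats) that contains the word "Connection".
show_connections() {
    grep 'Connection' "${1:-/proc/qnetstats}"
}

# Usage:
#   show_connections
```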
When qnet has a tx session aka connection established to a remote
node, it can either be idle, or actively attempting to transfer
user data. If it's idle, after 10 seconds, qnet will probe the
remote node to see if it's still alive. If after 6 of these probes,
the remote node hasn't responded, the tx connection is torn down.
These 10/6 parameters are also controlled by command line options
to qnet: conn_up_idle and conn_up_retries.
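As a back-of-the-envelope figure, those defaults suggest an idle connection to a dead node lingers for roughly a minute. The sketch below assumes the probes are spaced one idle interval apart, which the description above does not spell out, so treat the result as an estimate only. idle_teardown_estimate is a hypothetical helper.

```shell
#!/bin/sh
# idle_teardown_estimate IDLE_SECONDS PROBES
# Hypothetical helper: rough number of seconds before an idle tx
# connection to a dead node is torn down, ASSUMING one probe per
# idle interval (an assumption, not something qnet documents here).
idle_teardown_estimate() {
    echo $(( $1 * $2 ))
}

# Usage, with the 10/6 values from above:
#   idle_teardown_estimate 10 6    # prints 60
```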
Now, if it's actively transferring data, there is another set
of cmd line options which control the number of retries, and
the timeout for the L4: tx_ticks and tx_retries
So, you can configure qnet any way you want it to be, with
respect to timeouts and retries. Do the following:
# use /lib/dll/npm-qnet.so
to learn more about qnet command line options!
--
aboyd