Davide Ancri
Re: RE: Possible causes for TCS_REM_DOWN in qnet link
12/11/2009 6:42 PM
post43737
Thanks a lot Andrew!
Please see my comments inline, between your quoted text:
> When the remote node transmits a TCS_REM_DOWN at
> you, it's because it is explicitly tearing down
> the node-to-node connection with you.
So it's normal for IP traffic to continue as before even while the qnet link is down, isn't it?
> Take a look at the sloginfo output on the remote
> node as to why the connection was torn down.
Great, I'll take a look ASAP.
> 1) with connection idle - ie no user data traffic - it
> didn't hear back after six qos heartbeats, which by
> default are 10 seconds apart
This should not be the case, since qnet traffic is ongoing.
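To make the arithmetic in case 1 concrete: with the defaults quoted above (six unanswered heartbeats, 10 seconds apart), an idle link would be dropped after roughly 60 seconds of silence from the peer. The sketch below is a hypothetical model of that check, not qnet's actual implementation; the names `HEARTBEAT_INTERVAL_S`, `MISSED_LIMIT` and `link_should_drop` are made up for illustration.

```python
# Hypothetical model of an idle-link liveness check, assuming the defaults
# quoted above: 6 unanswered heartbeats, 10 seconds apart. Not qnet's code.
HEARTBEAT_INTERVAL_S = 10  # assumed default spacing between QoS heartbeats
MISSED_LIMIT = 6           # assumed number of unanswered heartbeats tolerated


def teardown_after_s(interval_s=HEARTBEAT_INTERVAL_S, limit=MISSED_LIMIT):
    """Worst-case silence (seconds) before an idle link is declared down."""
    return interval_s * limit


def link_should_drop(last_heard_s, now_s,
                     interval_s=HEARTBEAT_INTERVAL_S, limit=MISSED_LIMIT):
    """True once the peer has been silent for `limit` full heartbeat intervals."""
    missed = int((now_s - last_heard_s) // interval_s)
    return missed >= limit


print(teardown_after_s())       # 60
print(link_should_drop(0, 59))  # False
print(link_should_drop(0, 60))  # True
```

Under this model, a link carrying regular qnet traffic keeps resetting `last_heard_s`, which is why case 1 only applies to an idle connection.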
> 2) during an attempted transfer of user data, the remote
> node repeatedly timed out, and never heard back from this
> node, and eventually tore the connection down
Could a burst of IP traffic be preventing the qnet acks from being transmitted (or handled on receive)?
> 3) an application on the remote node simply did a
> "rmdir /net/thisnode" or the functional equivalent.
That should not be the case... but it may be worth investigating ;)
> Also, you can crank up the diagnostic output with the
> "qos_verbose=X" option.
Do you mean on the qnet command line, when io-net is started?
Thanks again!
Davide
Davide Ancri
Re: RE: Possible causes for TCS_REM_DOWN in qnet link
12/14/2009 11:44 AM
post43809
> Take a look at the sloginfo output on the remote
> node as to why the connection was torn down.
Here's a summary of the sloginfo output. Note that there are 5 nodes in total: Mx10A and ppu2, the two nodes involved in the Wireshark trace, plus ppu1, ppu3 and ppu4, which show similar problems but were left out of the Wireshark trace for simplicity:
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(445) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(375) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(531) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(615) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(548) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(512) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(488) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(493) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(429) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(388) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(411) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): client_pulse(): MsgReply(518) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): nd_change_notify(): Node Down: nd 8 ppu3.Mx10A
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 8, failing TX w/EHOSTDOWN
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): tx_complete(): callback for nd 8 to layer 0 with errno 264
Dec 07 09:39:07 7 15 0 npm-qnet(kif): disconnect(): MsgError(493) failed (No such process)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): disconnect(): MsgError(531) failed (No such process)
Dec 07 09:39:07 7 15 0 npm-qnet(kif): disconnect(): MsgError(548) failed (No such process)
Dec 07 09:39:07 7 15 0 npm-qnet(L4): l4_tx_rx_ack_r_nack(): unkn tx: nd:8 dc:8 seq:109752
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): nd_change_notify(): Node Down: nd 5 ppu1.Mx10A
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264
Dec 07 09:39:07 7 15 0 npm-qnet(kif): kif_client_outbound_failed(): MsgError(518) failed (No such process)
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264
Dec 07 09:39:07 7 15 0 npm-qnet(kif): kif_client_outbound_failed(): MsgError(411) failed (No such process)
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07 7 15 0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264
Dec 07 09:39:07 ...
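With five nodes logging, it may be easier to tally the npm-qnet messages per node than to read them line by line. The following is a hypothetical Python helper (not part of QNX or qnet) whose patterns are copied from the message formats in the excerpt above: `Node Down` notifications, `TX_ERR_NDDN` failures per nd, and the reasons given for failed `MsgReply` calls. It assumes each log message occupies a single line of input; the names `summarize` and `print_report` are made up for illustration.

```python
import re
from collections import Counter

# Patterns copied from the npm-qnet messages in the excerpt above.
NODE_DOWN = re.compile(r"nd_change_notify\(\): Node Down: nd (\d+) (\S+)")
TX_FAIL = re.compile(r"qos_tx_done\(\): TX_ERR_NDDN for nd (\d+)")
REPLY_FAIL = re.compile(r"MsgReply\(\d+\) __KER_MSG_READV failed \(([^)]+)\)")


def summarize(lines):
    """Return (downed nodes by nd, TX failures per nd, MsgReply failure reasons)."""
    down = {}                 # nd number -> node name from "Node Down"
    tx_failures = Counter()   # nd number -> count of TX_ERR_NDDN messages
    reply_errors = Counter()  # error text -> count of failed MsgReply calls
    for line in lines:
        if m := NODE_DOWN.search(line):
            down[int(m.group(1))] = m.group(2)
        elif m := TX_FAIL.search(line):
            tx_failures[int(m.group(1))] += 1
        elif m := REPLY_FAIL.search(line):
            reply_errors[m.group(1)] += 1
    return down, tx_failures, reply_errors


def print_report(lines):
    """Print a per-node and per-reason summary of the given log lines."""
    down, tx_failures, reply_errors = summarize(lines)
    for nd, name in sorted(down.items()):
        print(f"nd {nd} ({name}): node down, {tx_failures[nd]} TX_ERR_NDDN")
    for reason, count in reply_errors.most_common():
        print(f"MsgReply failed ({reason}): {count}")
```

For example, with a node's sloginfo output saved to a file (`slog.txt` is only an example name), `print_report(open("slog.txt"))` prints one line per downed node and one line per distinct `MsgReply` failure reason, which makes it easier to compare the five nodes side by side.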