Forum Topic - Possible causes for TCS_REM_DOWN in qnet link: (4 Items)
   
Possible causes for TCS_REM_DOWN in qnet link  
Hi all

I am investigating a strange situation in which a QNX node suddenly disappears from the qnet network for no apparent reason.

I got a wireshark trace of what's going on over the ethernet interface used by qnet on the disappearing node (qnet runs directly over ethernet, while IP is also used by my custom applications on the same interface), and thanks to Yao Zhao's qnet dissector I found some strange packets at the moment the node drops out.

The first one is not dissected properly, but the following TCS_REM_DOWN seems to mean something like "I think the qnet link between me and the remote node should stop working".

Well, but... WHY ? ;)

I attach the packets expanded in text format; can anyone tell me what the possible reasons are?

It has become a blocker issue for us... thanks in advance!

Davide
Attachment: Text traces.txt 5.48 KB
RE: Possible causes for TCS_REM_DOWN in qnet link  
When the remote node transmits a TCS_REM_DOWN at
you, it's because it is explicitly tearing down
the node-to-node connection with you.

Take a look at the sloginfo output on the remote
node as to why the connection was torn down.  

Reasons include:

1) with connection idle - ie no user data traffic - it
didn't hear back after six qos heartbeats, which by
default are 10 seconds apart

2) during an attempted transfer of user data, the remote
node repeatedly timed out, and never heard back from this
node, and eventually tore the connection down

3) an application on the remote node simply did a
"rmdir /net/thisnode" or the functional equivalent.
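
For reason 1, the arithmetic can be sketched quickly. Assuming the defaults quoted above (six unanswered heartbeats, 10 seconds apart) and assuming the teardown follows right after the last missed heartbeat, an idle link would be declared down after roughly a minute of silence. The constant and function names below are made up for illustration; they are not qnet option names.

```python
# Rough idle-teardown estimate from the figures quoted above.
# Both constant names are illustrative; they are not real qnet option names.
HEARTBEAT_INTERVAL_S = 10   # default spacing between qos heartbeats
MISSED_HEARTBEATS = 6       # unanswered heartbeats before teardown


def idle_teardown_seconds(interval_s=HEARTBEAT_INTERVAL_S,
                          missed=MISSED_HEARTBEATS):
    """Approximate silence, in seconds, before an idle link is torn down."""
    return interval_s * missed


print(idle_teardown_seconds())
```

If the node disappears much faster than that while traffic is flowing, reason 2 or 3 is the more likely candidate.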

Also, you can crank up the diagnostic output with the
"qos_verbose=X" option.

--
aboyd

-----Original Message-----
From: Davide Ancri [mailto:community-noreply@qnx.com] 
Sent: Friday, December 11, 2009 12:13 PM
To: technology-networking
Subject: Possible causes for TCS_REM_DOWN in qnet link

_______________________________________________

Technology
http://community.qnx.com/sf/go/post43673
Re: RE: Possible causes for TCS_REM_DOWN in qnet link  
Thanks a lot Andrew!

Please see my comments inline, below your quoted text:

> When the remote node transmits a TCS_REM_DOWN at
> you, it's because it is explicitly tearing down
> the node-to-node connection with you.

So it's normal that IP traffic continues as before even while the qnet link is down, isn't it?


> Take a look at the sloginfo output on the remote
> node as to why the connection was torn down.  

Great, I'll take a look ASAP

 
> 1) with connection idle - ie no user data traffic - it
> didn't hear back after six qos heartbeats, which by
> default are 10 seconds apart

This should not be the case, since qnet traffic is ongoing.


> 2) during an attempted transfer of user data, the remote
> node repeatedly timed out, and never heard back from this
> node, and eventually tore the connection down

Maybe a burst of IP traffic is stealing the opportunity to transmit (or to handle on the rx side) qnet acks?


> 3) an application on the remote node simply did a
> "rmdir /net/thisnode" or the functional equivalent.

Should not be the case... but it may be worth investigating ;)


> Also, you can crank up the diagnostic output with the
> "qos_verbose=X" option.

Do you mean on the qnet command line, when io-net is started?

Thanks again!
Davide
Re: RE: Possible causes for TCS_REM_DOWN in qnet link  
> Take a look at the sloginfo output on the remote
> node as to why the connection was torn down.  

Here's a summary of the sloginfo output (note that there are 5 nodes in total: Mx10A and ppu2, the 2 nodes involved in the wireshark trace, plus ppu1, ppu3 and ppu4, which show similar problems but were left out of the wireshark trace for simplicity):

Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(445) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(375) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(531) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(615) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(548) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(512) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(488) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(493) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(429) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(388) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(411) __KER_MSG_READV failed (Bad address)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): client_pulse(): MsgReply(518) __KER_MSG_READV failed (Server fault on msg pass)
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): nd_change_notify(): Node Down: nd 8 ppu3.Mx10A
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 8, failing TX w/EHOSTDOWN
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): tx_complete(): callback for nd 8 to layer 0 with errno 264 
Dec 07 09:39:07    7    15     0 npm-qnet(kif): disconnect(): MsgError(493) failed (No such process)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): disconnect(): MsgError(531) failed (No such process)
Dec 07 09:39:07    7    15     0 npm-qnet(kif): disconnect(): MsgError(548) failed (No such process)
Dec 07 09:39:07    7    15     0 npm-qnet(L4): l4_tx_rx_ack_r_nack(): unkn tx: nd:8 dc:8 seq:109752
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): nd_change_notify(): Node Down: nd 5 ppu1.Mx10A
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264 
Dec 07 09:39:07    7    15     0 npm-qnet(kif): kif_client_outbound_failed(): MsgError(518) failed (No such process)
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264 
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264 
Dec 07 09:39:07    7    15     0 npm-qnet(kif): kif_client_outbound_failed(): MsgError(411) failed (No such process)
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264 
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 5, failing TX w/EHOSTDOWN
Dec 07 09:39:07    7    15     0 npm-qnet(QOS): tx_complete(): callback for nd 5 to layer 0 with errno 264 
Dec 07 09:39:07   ...
Attachment: Text traces.txt 66.14 KB
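
With this many near-identical lines, a small script may make the log easier to read. The following is only a sketch: it assumes every entry has the `npm-qnet(LAYER): function(): message` shape seen in the excerpt above, and the function name `summarize` and both regular expressions are my own, not part of any QNX tool. It counts messages per layer and function and collects the nodes reported as down.

```python
import re
from collections import Counter

# Shape assumed from the excerpt above, e.g.
# "... npm-qnet(QOS): nd_change_notify(): Node Down: nd 8 ppu3.Mx10A"
LINE_RE = re.compile(r"npm-qnet\((?P<layer>\w+)\): (?P<func>\w+)\(\): (?P<msg>.*)")
NODE_DOWN_RE = re.compile(r"Node Down: nd (?P<nd>\d+) (?P<name>\S+)")


def summarize(lines):
    """Count messages per (layer, function) and collect nodes reported down."""
    counts = Counter()
    nodes_down = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # wrapped continuation lines and truncated lines are skipped
        counts[(m["layer"], m["func"])] += 1
        down = NODE_DOWN_RE.search(m["msg"])
        if down:
            nodes_down[int(down["nd"])] = down["name"]
    return counts, nodes_down


# A few lines copied from the log above.
sample = [
    "Dec 07 09:39:07    7    15     0 npm-qnet(QOS): nd_change_notify(): Node Down: nd 8 ppu3.Mx10A",
    "Dec 07 09:39:07    7    15     0 npm-qnet(QOS): qos_tx_done(): TX_ERR_NDDN for nd 8, failing TX w/EHOSTDOWN",
    "Dec 07 09:39:07    7    15     0 npm-qnet(QOS): nd_change_notify(): Node Down: nd 5 ppu1.Mx10A",
    "Dec 07 09:39:07    7    15     0 npm-qnet(kif): disconnect(): MsgError(493) failed (No such process)",
]
counts, nodes_down = summarize(sample)
print(nodes_down)
for (layer, func), n in counts.most_common():
    print(layer, func, n)
```

Run over the full attachment, this should show at a glance which nodes went down within the same second and which layer logged most of the failures.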