Forum Topic - TCP Response invalid???: (6 Items)
   
TCP Response invalid???  
...continued from "TCP stream socket send() thread safety" thread

Hello!

So, first, the end of the old story... I have not worked with Windows for quite a long time, which is why my assumption
that one has to read until WSAEWOULDBLOCK is returned was incorrect. For the programmer's convenience, Windows
re-enables socket select event signalling after any call to recv(). However, handle shuffling was still an issue.

Nevertheless, after the corrections to the client, the problem is still there.
I had to investigate further, so I ran Wireshark (formerly Ethereal) on the Windows machine.

Here is a normal request-response exchange for a heartbeat packet (172.25.31.241 is the QNX host, 172.25.31.249 is the Windows client):
======begin======
No.     Time        Source                Destination           Protocol Info
 115272 530.469917  172.25.31.249         172.25.31.241         TCP      2789 > 16015 [PSH, ACK] Seq=71129 Ack=4198257 Win=64959 [TCP CHECKSUM INCORRECT] Len=16

Frame 115272 (70 bytes on wire, 70 bytes captured)
    Arrival Time: Dec 12, 2007 14:51:19.841042000
    [Time delta from previous captured frame: 0.130069000 seconds]
    [Time delta from previous displayed frame: 6.644444000 seconds]
    [Time since reference or first frame: 530.469917000 seconds]
    Frame Number: 115272
    Frame Length: 70 bytes
    Capture Length: 70 bytes
    [Frame is marked: False]
    [Protocols in frame: eth:ip:tcp:data]
    [Coloring Rule Name: Checksum Errors]
    [Coloring Rule String: cdp.checksum_bad==1 || edp.checksum_bad==1 || ip.checksum_bad==1 || tcp.checksum_bad==1 || udp.checksum_bad==1]
Ethernet II, Src: AsustekC_5c:57:23 (00:15:f2:5c:57:23), Dst: Intel_b9:5d:30 (00:0e:0c:b9:5d:30)
    Destination: Intel_b9:5d:30 (00:0e:0c:b9:5d:30)
        Address: Intel_b9:5d:30 (00:0e:0c:b9:5d:30)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: AsustekC_5c:57:23 (00:15:f2:5c:57:23)
        Address: AsustekC_5c:57:23 (00:15:f2:5c:57:23)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 172.25.31.249 (172.25.31.249), Dst: 172.25.31.241 (172.25.31.241)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 56
    Identification: 0x139f (5023)
    Flags: 0x04 (Don't Fragment)
        0... = Reserved bit: Not set
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 128
    Protocol: TCP (0x06)
    Header checksum: 0x4f04 [correct]
        [Good: True]
        [Bad : False]
    Source: 172.25.31.249 (172.25.31.249)
    Destination: 172.25.31.241 (172.25.31.241)
Transmission Control Protocol, Src Port: 2789 (2789), Dst Port: 16015 (16015), Seq: 71129, Ack: 4198257, Len: 16
    Source port: 2789 (2789)
    Destination port: 16015 (16015)
    Sequence number: 71129    (relative sequence number)
    [Next sequence number: 71145    (relative sequence number)]
    Acknowledgement number: 4198257    (relative ack number)
    Header length: 20 bytes
    Flags: 0x18 (PSH, ACK)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...1 .... = Acknowledgment: Set
        .... 1... = Push: Set
        .... .0.. = Reset: Not set
        .... ..0. = Syn: Not set
        .... ...0 = Fin: Not set
    Window size: 64959
    Checksum: 0x9847 [incorrect, should be 0xe476 (maybe caused by "TCP checksum offload"?)]
     ...
Re: TCP Response invalid???  
>  > 2789 [PSH, ACK] Seq=5136073 Ack=95061 Win=17520 Len=28[Malformed Packet]
> 
> Frame 117052 (82 bytes on wire, 82 bytes captured)

Can anybody confirm that frame 117052 is indeed malformed? With knowledge of the IP packet format it should not take
much time. I just want to be sure I've finally located the source of the problem.
Re: TCP Response invalid???  
> >  > 2789 [PSH, ACK] Seq=5136073 Ack=95061 Win=17520 Len=28[Malformed Packet]
> > 
> > Frame 117052 (82 bytes on wire, 82 bytes captured)
> 
> Can anybody confirm that the frame 117052 is malformed indeed? With knowledge 
> of IP packet format it should not take much time. I just want to be sure I've 
> located the source of the problem at last.


Both the IP and TCP headers seem fine. I think the "malformed packet" is an Ethereal thing. For some reason, it decided
the packet contents should be SMPP and then couldn't decode them.

Both the IP header checksum and the TCP checksum are correct, and you confirmed the contents are correct. Are you sure
this packet is not delivered to the application? Unless the TCP sequence number is wrong, I can't see why this packet
would not be delivered.
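
For anyone who wants to repeat the IP header check by hand, here is a minimal sketch of the Internet checksum (RFC 1071). Frame 117052's bytes are not reproduced in this thread, so the example rebuilds the IPv4 header of frame 115272 from the dissected fields in the first message. Python is used only for brevity, and the function name `internet_checksum` is an illustrative choice, not anything from Wireshark or the QNX stack.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then inverted (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# IPv4 header of frame 115272, rebuilt from the dissection above,
# with the checksum field set to zero for the computation.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45,                         # version 4, header length 20 bytes
    0x00,                         # differentiated services field
    56,                           # total length
    0x139F,                       # identification
    0x4000,                       # flags: Don't Fragment, fragment offset 0
    128,                          # time to live
    6,                            # protocol: TCP
    0,                            # checksum placeholder
    bytes([172, 25, 31, 249]),    # source address
    bytes([172, 25, 31, 241]),    # destination address
)

print(hex(internet_checksum(header)))  # 0x4f04, matching the capture
```

The same function applied to a header with its checksum field filled in returns 0 when the header is intact. The TCP checksum works the same way but also covers a pseudo-header and the payload, which is why a capture taken on the sending host can show it as incorrect when checksum offload is enabled.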
Re: TCP Response invalid???  
The sequence number change from the previous packet does match.
This "Malformed packet: SMPP" thing does indeed look like a Wireshark problem.
Thank you for the analysis. I will investigate further.
Re: TCP Response invalid???  
After some debugging we found that envelopes may still intermix in some cases, and this is the cause of the
communication problems. The effective priorities of all sender threads are the same (I put an assert in the code before
the send() invocation to validate it). So it looks like Sean's fix does not always help. However, I doubt it is worth
spending time on correcting it unless atomicity is fully implemented (that is, atomicity for threads with different
priorities). I am going to have threads at varying priorities anyway, and an implementation that guarantees atomicity
only for threads of matching priority would be useless to me.
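
As a point of reference, the usual application-level way to keep envelopes from intermixing, independent of what the stack guarantees, is to serialize whole-envelope sends behind one mutex. The sketch below is only an illustration under stated assumptions: it is written in Python rather than against the QNX C API, the class `LockedSender` and its method are hypothetical names, and thread priorities are not modelled at all.

```python
import socket
import threading

class LockedSender:
    """Hypothetical helper: one lock per socket, held for a whole envelope."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._lock = threading.Lock()

    def send_envelope(self, envelope: bytes) -> None:
        # Holding the lock across the loop keeps another thread from
        # slipping its bytes in between two partial sends of this envelope.
        with self._lock:
            view = memoryview(envelope)
            while view:
                sent = self._sock.send(view)  # may accept fewer bytes than offered
                view = view[sent:]
```

The cost of this approach is that every sender thread blocks while another one is inside send().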
Re: TCP Response invalid???  
Well, I have worked around the problem.
To minimize the impact of serialized access to the socket, I've added an output queue to my application. Now, if the
socket is busy, a thread does not block on the mutex; it just adds its envelope to the queue and returns to the pool.
Only the thread with the highest priority among them goes on to wait on the mutex, to provide correct priority
inheritance. When the previous sender exits from its send() call, that thread acquires the mutex and sends all the
queued envelopes together.
Note that this was implemented without any additional synchronization objects, without additional calls to memory
allocation functions and even without copying envelope data. ;)
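
A rough sketch of the queue-or-send idea described in this post, for readers who want to see its shape. It is not the poster's implementation: it is in Python, it ignores thread priorities and priority inheritance entirely (no thread waits on the mutex here; whoever holds it drains the queue), and it does allocate and copy when joining the batch. The names `CombiningSender`, `submit` and `send_all` are hypothetical.

```python
import threading
from collections import deque
from typing import Callable

class CombiningSender:
    """Hypothetical helper: queue envelopes while the socket is busy."""

    def __init__(self, send_all: Callable[[bytes], object]) -> None:
        self._send_all = send_all      # e.g. a socket's sendall method
        self._queue = deque()          # pending envelopes; append/popleft are thread-safe
        self._lock = threading.Lock()  # held by the thread currently sending

    def submit(self, envelope: bytes) -> None:
        self._queue.append(envelope)
        # Re-check the queue after every release so an envelope queued
        # just before the previous sender unlocked is not left stranded.
        while self._queue:
            if not self._lock.acquire(blocking=False):
                return  # socket busy: the current sender will pick this up
            try:
                batch = []
                while self._queue:
                    batch.append(self._queue.popleft())
                if batch:
                    self._send_all(b"".join(batch))  # whole envelopes, one call
            finally:
                self._lock.release()
```

A caller would construct it as `CombiningSender(sock.sendall)` and call `submit(envelope)` from any pool thread. Because envelopes are only ever joined whole, they cannot intermix in the byte stream, although the order between envelopes from different threads is simply the order in which they were queued.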