Oleh Derevenko(deleted)
11/29/2007 5:58 AM
post3079
|
TCP stream socket send() thread safety
Hello
Sorry if I've chosen the wrong forum and/or if my story is not a priority at this time, but I would like to attract some
more attention to the problem.
So, a few years ago I reported a problem, PR24873, regarding socket send() function usage. If there were several
provider threads sending data blocks over a single socket, those blocks might arrive at the receiver with data intermixed.
That is, the data transfer was not atomic. I worked around the problem by locking a mutex around the send() invocation.
After a year and something I was happy to find my PR in the resolved list of one of the cumulative pre-SP3 patches (patch 234).
Naturally, my support plan had already expired by that time.
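For readers who want to see what such a workaround looks like, here is a minimal sketch in C. It is not the code from the application discussed in this thread: the struct locked_socket type and the locked_send_block() function are hypothetical names introduced only for illustration, and the sketch assumes a blocking socket and plain POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: one lock per shared socket. */
struct locked_socket {
    int fd;
    pthread_mutex_t lock;
};

/* Send one complete block while holding the lock, so that blocks from
 * other threads using the same wrapper cannot be interleaved with it.
 * Returns 0 on success, -1 if send() fails (errno is left set). */
int locked_send_block(struct locked_socket *ls, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    assert(ls != NULL && (buf != NULL || len == 0));

    pthread_mutex_lock(&ls->lock);
    while (len > 0) {
        ssize_t n = send(ls->fd, p, len, 0);
        if (n < 0) {        /* a real caller may want to retry on EINTR */
            rc = -1;
            break;
        }
        p += n;             /* send() may accept fewer bytes than asked */
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&ls->lock);
    return rc;
}
```

Every thread that writes to the shared socket would have to go through the same wrapper; a direct send() on the same descriptor bypasses the lock.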
Well, I removed the mutex and the data did not intermix any more. However, a new problem appeared. When the program was
launched and there were numerous data transfers from multiple threads, the whole networking subsystem stalled. Only a few data
blocks could go through, and the receiver detected a timeout of 10 seconds(!) and disconnected. Then after some delay it
tried to reconnect, initiated startup once again, and the same story was repeated over and over. Only after 10-15 minutes
could the system stabilize and transition to normal functioning. However, if there were load peaks later at runtime, the
scenario with networking denial of service and disconnects-reconnects could repeat.
Obviously, the patch was not that good. Perhaps somebody blocked too much code with a mutex and introduced a bottleneck.
But since I did not have a support plan any more, my further letters were happily ignored. So I had to sigh and
uncomment my mutex around send() again. This is the way I have been running it ever since in a "fast, robust and reliable
operating system".
So, if you are now doing active development in the networking area, maybe somebody would take a look at this issue (if the
code has not been completely rewritten yet, of course)?
|
|
|
Sean Boudreau(deleted)
11/29/2007 9:46 AM
post3086
|
Re: TCP stream socket send() thread safety
On Thu, Nov 29, 2007 at 05:58:51AM -0500, Oleh Derevenko wrote:
> Hello
>
> Sorry if I've chosen a wrong forum and/or if my story is not a priority
> at this time but I would like to attract some more attention to the
> problem.
>
> So, few years ago I had a problem PR24873 reported regarding socket
> send() function usage. If there were several provider threads sending
> data blocks over a single socket, those blocks might arrive to receiver
> with data intermixed. That is, the data transfer was not atomic. I
> worked the problem around by locking a mutex around send() invocation.
> After a year and something I was happy to find my PR in resolved list of
> one of cumulative pre-SP3 patches (patch 234). Naturally, my support
> plan was expired till that time already.
> Well, I removed the mutex and the data did not intermix any more.
> However a new problem appeared. When program was launched and there were
> numerous data transfers from multiple threads whole networking subsystem
> stalled. Only few data blocks could go through and the receiver detected
> a timeout of 10 seconds! and disconnected. Then after some delay it
> tried to reconnect, initiated startup once again and the same story was
> repeated over and over. Only after 10-15 minutes system could stabilize
> and transit to normal functioning. However if there were load peaks
> later at runtime the scenario with networking denial of service and
> disconnects-reconnects could repeat.
>
> Obviously, the patch was not that good. Perhaps somebody blocked too
> much code with a mutex and introduced a bottleneck. But since I did not
> have a support plan any more, my further letters were happily ignored.
> So I had to sigh and uncomment my mutex around send() again. This is the
> way I'm running it till this time in "fast, robust and reliable
> operating system".
>
> So, if you are now doing active development in networking area, maybe
> somebody would take a look at this issue (if the code has not been
> completely rewritten yet, of course)?
I remember this issue. tcp is a stream protocol and
therefore is not atomic. I couldn't find a lot of direction
in any spec in this area at the time so I made it more
'intuitive'. Two scheduling issues come into play when a
send() has to block. First the threads block on a
particular socket and secondly, when they wake up they are
rescheduled according to the client's priority. The issue
you were seeing was a fifo vs lifo issue when waking up
threads blocked on a particular socket. The initial unblock
at the socket level now comes out in the order you expect
but if the requesting threads are of different priority the
higher one can still preempt the lower. That is, it works
as you expect because all your threads are at the same
priority. Note this was fixed without any extra mutexes.
Your new issue sounds like you may be exhausting the number
of 'threads' in the stack. Are threads send blocked on
io-net in this situation? Is there a sloginfo entry to this
effect (must be running the latest patch)? You can increase
the number of stack 'threads' as follows:
# io-net -ptcpip threads_max=400
-seanb
|
|
|
Oleh Derevenko(deleted)
11/29/2007 10:29 AM
post3092
|
Re: TCP stream socket send() thread safety
> I remember this issue. tcp is a stream protocol and
> therefore is not atomic. I couldn't find a lot of direction
> in any spec in this area at the time so I made it more
> 'intuitive'.
Well, but you should agree that if I send data envelopes from multiple threads it is natural to expect them to arrive at the
client intact (even though their order could be unpredictable). At least the socket implementation in Windows acts like this.
And also, you should agree that serializing access to the socket in the client application is quite inefficient. send() is
simply translated into MsgSend() and I do not see any reason why several MsgSend() calls could not be invoked in parallel and
serialized in the server process if necessary.
> higher one can still preempt the lower. That is, it works
> as you expect because all your threads are at the same
> priority.
That's pretty sad to find out. :( Why can't you lock a mutex while data is being put into the output buffer? (I do not know how
it is implemented, of course.) If a higher-priority thread becomes ready while a lower-priority thread holds the mutex, it
will block on the mutex and temporarily raise the priority of the first thread. There would be a small delay for the high-priority
thread, but send() would act atomically.
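The behaviour described here (a blocked high-priority thread temporarily raising the priority of the mutex holder) is what POSIX calls priority inheritance. As a rough illustration only, and not a description of how the stack is implemented, a mutex with that protocol can be requested as follows. The helper name init_prio_inherit_mutex() is made up for this sketch, and whether the protocol is available depends on the platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Initialise *m as a priority-inheritance mutex.
 * Returns 0 on success or an errno-style code from the pthread calls. */
int init_prio_inherit_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;

    /* Ask for the holder to inherit the priority of the highest waiter. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With such a mutex the high-priority thread still waits for the current holder to finish, which is the "small delay" mentioned above.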
> Your new issue sounds like you may be exhausting the number
> of 'threads' in the stack. Are threads send blocked on
> io-net in this situation? Is there a sloginfo entry to this
> effect (must be running the latest patch)? You can increase
> the number of stack 'threads' as follows:
>
> # io-net -ptcpip threads_max=400
It was about 18 months ago and I did not check the state of the threads. The documentation says there is a limit of 200 threads by
default. My inspections of the sender process showed 120-130 threads running. And last but not least, we had never
seen that "denial of service" problem before the patch, even though we had been running without send() serialized for quite a
long time before we discovered it may intermix the data.
I can remove the mutex and run some experiments in the next few days to see what the state of the threads is, if you would like.
|
|
|
Sean Boudreau(deleted)
11/29/2007 11:04 AM
post3101
|
Re: TCP stream socket send() thread safety
On Thu, Nov 29, 2007 at 10:29:30AM -0500, Oleh Derevenko wrote:
> > I remember this issue. tcp is a stream protocol and
> > therefore is not atomic. I couldn't find a lot of direction
> > in any spec in this area at the time so I made it more
> > 'intuitive'.
>
> Well, but you should agree that if I send data envelopes from multiple
> threads it is natural to expect they arrive to client intact (even
> though their order could be unpredictable). At least socket
> implementation in Windows acts like this.
> And also, you should agree that serializing access to socket in client
> application is quite inefficient. send() is simply retranslated in
> MsgSend() and I do not see any reasons why several MsgSend's could not
> be invoked in parallel and serialized in server process if necessary.
Since tcp is a stream, the protocol has no concept of
envelopes or boundaries. Any such concept is built on
top of it at the application level.
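As an illustration of what "built on top at the application level" usually means, here is a minimal sketch of length-prefixed framing in C. The 4-byte big-endian header and the names frame_encode() and frame_decode() are assumptions made for this example; nothing in this thread says which framing the application actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HDR 4u  /* assumed header: 4-byte big-endian payload length */

/* Write header + payload into out.
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t frame_encode(const void *payload, uint32_t len,
                    unsigned char *out, size_t cap)
{
    if (cap < FRAME_HDR || cap - FRAME_HDR < len)
        return 0;
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    if (len > 0)
        memcpy(out + FRAME_HDR, payload, len);
    return FRAME_HDR + (size_t)len;
}

/* Try to extract one frame from the first avail bytes of a receive buffer.
 * Returns the number of bytes consumed (header + payload), or 0 if a
 * complete frame has not arrived yet. */
size_t frame_decode(const unsigned char *in, size_t avail,
                    const unsigned char **payload, uint32_t *len)
{
    uint32_t n;

    if (avail < FRAME_HDR)
        return 0;
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (avail - FRAME_HDR < n)
        return 0;
    *payload = in + FRAME_HDR;
    *len = n;
    return FRAME_HDR + (size_t)n;
}
```

Framing only lets the receiver find boundaries in the byte stream. It does not by itself stop bytes of two frames from being interleaved when several threads write to the same socket, which is the question discussed in this thread.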
>
> > higher one can still preempt the lower. That is, it works
> > as you expect because all your threads are at the same
> > priority.
>
> That's pretty sad to find out. :( Why can't you lock a mutex while data
> is being put in output buffer (I do not know how it is implemented, of
> course). If higher priority thread becomes ready while lower priority
> thread holds a mutex it will block on the mutex and temporarily raise
> priority of first thread. There would be a small delay for high priority
> thread, however send() would act atomically.
The handling of the MsgSends is serialized in the stack.
This situation arises when the send buffer fills up and
the send(), write(), sendto(), sendmsg() in the client has
to block. When the send buffer drains we can continue
processing requests on this particular socket. Since tcp
is a stream which request should be processed first? Since
sockets have no concept like PIPE_BUF (that I could find)
it seems logical that the highest priority request should
win.
>
> > Your new issue sounds like you may be exhausting the number
> > of 'threads' in the stack. Are threads send blocked on
> > io-net in this situation? Is there a sloginfo entry to this
> > effect (must be running the latest patch)? You can increase
> > the number of stack 'threads' as follows:
> >
> > # io-net -ptcpip threads_max=400
>
> It was about 18 months ago and I did not check the state of threads.
> Documentation says there is 200 threads limit by default. My inspections
> of sender process showed 120-130 threads running. And also the last but
> not least, we had never seen that "denial of service" problem before
> patch even though we were running without send() serialized for quite a
> long time before we discovered it may intermix the data.
>
> I can remove mutex and make some experiments in next few days to see
> what is the state of threads if you would like.
I'm pretty confident that the changes for this issue wouldn't
in themselves introduce an issue like this.
-seanb
|
|
|
Oleh Derevenko(deleted)
11/29/2007 11:33 AM
post3105
|
Re: TCP stream socket send() thread safety
> The handling of the MsgSends are serialized in the stack.
> This situation arises when the send buffer fills up and
> the send(), write(), sendto(), sendmsg() in the client has
> to block. When the send buffer drains we can continue
> processing requests on this particular socket. Since tcp
> is a stream which request should be processed first? Since
> sockets have no concept like PIPE_BUF (that I could find)
> it seems logical that the highest priority request should
> win.
It may be logical from the scheduler's point of view but it is completely senseless from the point of view of the socket's
functionality. The client has no possibility of parsing the data successfully after a preemption like that, and this
makes the socket unusable for multithreaded use. Even though there may not be a concept like PIPE_BUF for sockets, their
functionality should be "user friendly". Who wins from strict adherence to priority rules if the data is corrupted as a
result? Can you show me at least one use case where the client would benefit from having unrelated
data inserted inside its data block?
> I'm pretty confident that the changes for this issue wouldn't
> in themselves introduce an issue like this.
Well, I'll try to find out what the state of the threads is when the connection is going down. However, I'm running 6.3.0 SP3.
I can't ruin my development/testing environment by upgrading it to 6.3.2. I can try the September 6.3.2 release on one
node at most. And I can't use the latest M2 build at all because the kernel crashes after my application is started, even
though it is a pure user-mode application without any privileged access to ports or hardware.
|
|
|
Sean Boudreau(deleted)
11/29/2007 12:49 PM
post3116
|
Re: TCP stream socket send() thread safety
On Thu, Nov 29, 2007 at 11:33:51AM -0500, Oleh Derevenko wrote:
> > The handling of the MsgSends are serialized in the stack.
> > This situation arises when the send buffer fills up and
> > the send(), write(), sendto(), sendmsg() in the client has
> > to block. When the send buffer drains we can continue
> > processing requests on this particular socket. Since tcp
> > is a stream which request should be processed first? Since
> > sockets have no concept like PIPE_BUF (that I could find)
> > it seems logical that the highest priority request should
> > win.
>
> It may be logical from scheduler's point of view but it is completely
> senseless from point of view of socket's functionality. Client does not
> have any possibility of parsing data successfully after preemption like
> that and this makes socket inapplicable for multithreaded use. Even
> though there may not be a concept like PIPE_BUF for the socket its
> functionality should be "user friendly". Who wins from strict adherence
> to priority rules if the data is corrupted as a result? Can you show me
> at least one use case when there would be a benefit for the client from
> inserting unrelated data inside its data block?
>
> > I'm pretty confident that the changes for this issue wouldn't
> > in themselves introduce an issue like this.
>
> Well, I'll try to find out what is the state of threads when the
> connection is going down. However I'm running 6.3.0 SP3. I can't ruin my
> development/testing environment by upgrading it to 6.3.2. I can try
> September 6.3.2 release at one node at most. And I can't use the latest
> M2 build at all because kernel crashes after my application is started
> even though it is a pure user-mode application without any privileged
> access to ports or hardware.
>
The fixes for this were local to npm-tcpip.so so you
shouldn't need to do anything except run the latest stack
and get rid of your mutex.
-seanb
|
|
|
Oleh Derevenko(deleted)
11/30/2007 6:00 AM
post3147
|
Re: TCP stream socket send() thread safety
Hi, Sean
>
> The fixes for this were local to npm-tcpip.so so you
> shouldn't need to do anything except run the latest stack
> and get rid of your mutex.
I have downloaded corenet-6.4.0-M0.tar.gz but there is no npm-tcpip.so in it. Should I use the .so library from the September
2007 build of 6.3.2?
|
|
|
Sean Boudreau(deleted)
11/30/2007 11:37 AM
post3167
|
Re: TCP stream socket send() thread safety
On Fri, Nov 30, 2007 at 06:00:57AM -0500, Oleh Derevenko wrote:
> Hi, Sean
> >
> > The fixes for this were local to npm-tcpip.so so you
> > shouldn't need to do anything except run the latest stack
> > and get rid of your mutex.
>
> I have downloaded corenet-6.4.0-M0.tar.gz but there is no npm-tcpip.so
> in it. Should I use .so library from September 2007 build of 6.3.2?
The networking project on foundry 27 doesn't contain io-net
code. Apparently 6.3.2 has the latest npm-tcpip.so that
contains this fix but you can also get it here:
http://www.qnx.com/download/feature.html?programid=13008
-seanb
|
|
|
Oleh Derevenko(deleted)
11/30/2007 1:46 PM
post3190
|
Re: TCP stream socket send() thread safety
> The networking project on foundry 27 doesn't contain io-net
> code. Apparently 6.3.2 has the latest npm-tcpip.so that
> contains this fix but you can also get it here:
>
> http://www.qnx.com/download/feature.html?programid=13008
I'm running 6.3.0 SP3 and this is a pre-SP3 patch which is already included in SP3.
|
|
|
Oleh Derevenko(deleted)
11/29/2007 12:03 PM
post3107
|
Re: TCP stream socket send() thread safety
> The handling of the MsgSends are serialized in the stack.
> This situation arises when the send buffer fills up and
> the send(), write(), sendto(), sendmsg() in the client has
> to block. When the send buffer drains we can continue
> processing requests on this particular socket. Since tcp
> is a stream which request should be processed first? Since
> sockets have no concept like PIPE_BUF (that I could find)
> it seems logical that the highest priority request should
> win.
And also, please consider that send buffer overflow is just an implementation limitation. In theory, the
send buffer should be considered infinite and every new thread should just queue its data at the end. So, if an
operation that should normally be atomic is suspended because of physical limitations, it should have priority to
resume. All the other threads which have not started their operations yet can be judged by their priority (you need not
preserve thread arrival order for data), but you must not discriminate against a thread depending on whether it happened to
arrive at an empty buffer or at a buffer which is nearly full.
|
|
|
Sean Boudreau(deleted)
11/29/2007 12:46 PM
post3114
|
Re: TCP stream socket send() thread safety
On Thu, Nov 29, 2007 at 12:03:18PM -0500, Oleh Derevenko wrote:
> > The handling of the MsgSends are serialized in the stack.
> > This situation arises when the send buffer fills up and
> > the send(), write(), sendto(), sendmsg() in the client has
> > to block. When the send buffer drains we can continue
> > processing requests on this particular socket. Since tcp
> > is a stream which request should be processed first? Since
> > sockets have no concept like PIPE_BUF (that I could find)
> > it seems logical that the highest priority request should
> > win.
>
> And also, please consider that the situation with send buffer overflow
> is just an implementation limitation. In theory, send buffer should be
> considered infinite and every new thread should just queue its data at
> the end. So, if the operation that should normally be atomic, is
> suspended because of physical limitations it should have priority for
> resume. All the rest threads which have not started their operations yet
> can be judged by their priority (you need not preserve thread acceptance
> order for data) but you must not discriminate a thread depending on its
> luck to arrive to empty buffer or to the buffer which is nearly full.
SO_SNDBUF is a documented, real limitation. Currently the
requirement for not having data interleaved when multiple
threads write simultaneously on the same stream socket is to
run them at the same priority. Note even this might not be
portable as it is ascribing undocumented characteristics to
stream sockets. I don't think this is a bug. If you can
point out some spec, prior art or example (that doesn't work
by accident) that I may have missed I'll be happy to
reevaluate.
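For reference, the send buffer limit mentioned here can be read back with the standard getsockopt() call. This is only a small sketch; the helper name query_sndbuf() is invented for the example, and the value reported differs between stacks and configurations.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Return the send buffer size in bytes as reported by the stack for
 * socket fd, or -1 if getsockopt() fails. */
int query_sndbuf(int fd)
{
    int size = 0;
    socklen_t optlen = sizeof size;

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &optlen) != 0)
        return -1;
    return size;
}
```

A blocking send() larger than the free space in this buffer is the case in which the calling thread has to wait, as described earlier in the thread.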
Regards,
-seanb
|
|
|
Oleh Derevenko(deleted)
11/30/2007 8:00 AM
post3149
|
Re: TCP stream socket send() thread safety
> SO_SNDBUF is a documented, real limitation. Currently the
> requirements for not having data interleaved when multiple
> threads write simultaneously on the same stream socket is to
> run them at the same priority. Note even this might not be
> portable as it is ascribing undocumented characteristics to
> stream sockets. I don't think this is a bug. If you can
> point out some spec, prior art or example (that doesn't work
> by accident) that I may have missed I'll be happy to
> reevaluate.
OK. I agree, I was wrong. Not because scheduling the highest-priority thread is more important than preserving send block
atomicity, and not because I found any specification on this topic. Rather, I realized that it would be questionable to
provide such atomicity given the absence of any limit on the maximum length of a send() buffer. If a thread can push many
megabytes of data in a single call, it should not block all the other threads until all that data is transmitted.
Perhaps what I need is SOCK_SEQPACKET mode. But it is not implemented. :(
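To show what SOCK_SEQPACKET semantics look like where they are available, here is a small sketch using a local AF_UNIX socket pair. This is an assumption-laden illustration: it is not TCP, it says nothing about the stack discussed here, and the function name seqpacket_first_record() is made up. On a system that supports this socket type for AF_UNIX, each send() is delivered as one record, so the first recv() returns only the first record even though two were queued.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Queue two records on a SOCK_SEQPACKET pair and return the number of
 * bytes delivered by a single recv(), or -1 on any error (including the
 * socket type not being supported). */
ssize_t seqpacket_first_record(void)
{
    int sv[2];
    char buf[64];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;

    if (send(sv[0], "first", 5, 0) == 5 &&
        send(sv[0], "second", 6, 0) == 6)
        n = recv(sv[1], buf, sizeof buf, 0);  /* one record per call */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With a stream socket the same recv() would be allowed to return bytes from both writes at once, which is why record boundaries have to be added by the application there.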
|
|
|
Oleh Derevenko(deleted)
11/30/2007 8:54 AM
post3150
|
Re: TCP stream socket send() thread safety
By the way, here is an interesting read on the topic, pointed out to me in one of the newsgroups:
http://www.almaden.ibm.com/cs/people/marksmith/sendmsg.html
|
|
|
Oleh Derevenko(deleted)
11/29/2007 12:12 PM
post3110
|
Re: TCP stream socket send() thread safety
> > > Your new issue sounds like you may be exhausting the number
> > > of 'threads' in the stack. Are threads send blocked on
> > > io-net in this situation? Is there a sloginfo entry to this
> > > effect (must be running the latest patch)? You can increase
> > > the number of stack 'threads' as follows:
> > >
> > > # io-net -ptcpip threads_max=400
> >
> > It was about 18 months ago and I did not check the state of threads.
> > Documentation says there is 200 threads limit by default. My inspections
> > of sender process showed 120-130 threads running. And also the last but
> > not least, we had never seen that "denial of service" problem before
> > patch even though we were running without send() serialized for quite a
> > long time before we discovered it may intermix the data.
> >
> > I can remove mutex and make some experiments in next few days to see
> > what is the state of threads if you would like.
>
> I'm pretty confident that the changes for this issue wouldn't
> in themselves introduce an issue like this.
Well, if the thread pool maximum is for the whole system (and it is, since there is only one io-net process and I would not
believe there could be a separate thread pool for every socket :) ), I can assume it is possible to have more than 200
threads across several processes together. But 10 seconds is nearly infinity! A few threads over 200 can't create a
delay like that, considering that each of them sends just one data envelope and blocks afterwards.
|
|
|
Sean Boudreau(deleted)
11/29/2007 12:52 PM
post3117
|
Re: TCP stream socket send() thread safety
On Thu, Nov 29, 2007 at 12:12:14PM -0500, Oleh Derevenko wrote:
> > > > Your new issue sounds like you may be exhausting the number
> > > > of 'threads' in the stack. Are threads send blocked on
> > > > io-net in this situation? Is there a sloginfo entry to this
> > > > effect (must be running the latest patch)? You can increase
> > > > the number of stack 'threads' as follows:
> > > >
> > > > # io-net -ptcpip threads_max=400
> > >
> > > It was about 18 months ago and I did not check the state of threads.
> > > Documentation says there is 200 threads limit by default. My inspections
> > > of sender process showed 120-130 threads running. And also the last but
> > > not least, we had never seen that "denial of service" problem before
> > > patch even though we were running without send() serialized for quite a
> > > long time before we discovered it may intermix the data.
> > >
> > > I can remove mutex and make some experiments in next few days to see
> > > what is the state of threads if you would like.
> >
> > I'm pretty confident that the changes for this issue wouldn't
> > in themselves introduce an issue like this.
>
> Well, if thread pool maximum is for whole system (and it is, since there
> is only one io-net process and I would not believe there could be a
> personal thread pool for every socket :) ), I can assume possibility to
> have more than 200 threads in several processes together. But 10 seconds
> is nearly the infinity! Several threads over 200 can't create a delay
> like that considering the fact that all of them send just one data
> envelope and block afterwards.
>
The 'threads_max' argument to the stack controls how many
blocking operations the stack can service simultaneously.
They may be read(), write(), accept() ...
It's just an educated guess at this point. Look for clients
SEND blocked on io-net and the aforementioned sloginfo
entry.
Thanks,
-seanb
|
|
|
Oleh Derevenko(deleted)
11/30/2007 11:01 AM
post3160
|
Re: TCP stream socket send() thread safety
> The 'threads_max' argument to the stack controls how many
> blocking operations the stack can service simultaneously.
> They may be read(), write(), accept() ...
>
> It's just an educated guess at this point. Look for clients
> SEND blocked on io-net and the aforementioned sloginfo
> entry.
So far, I have tried it with my current system (that is, 6.3.0 SP3).
To be able to reproduce it I had to plug the Ethernet back into a 100Mbit hub, just as we were running last year, because with
the 1Gbit switch it did not reproduce at once.
When connections start flashing offline/online there are no threads in the SEND-blocked state at all.
-----------------------------ONE NODE-----------------------------
# ps -A | grep io-net
77841 ? 00:00:30 io-net
# pidin -p 376859
pid tid name prio STATE Blocked
376859 1 ../bin/qnxip.bin 10o MUTEX 376859-06 #1
376859 2 ../bin/qnxip.bin 10o CONDVAR 8126bb4
376859 3 ../bin/qnxip.bin 10o CONDVAR 812666c
376859 4 ../bin/qnxip.bin 10o RECEIVE 2
376859 5 ../bin/qnxip.bin 10o RECEIVE 5
376859 6 ../bin/qnxip.bin 10o REPLY 77841
376859 7 ../bin/qnxip.bin 10o CONDVAR 824f874
376859 8 ../bin/qnxip.bin 10o REPLY 77841
376859 9 ../bin/qnxip.bin 10o CONDVAR 81eca14
376859 10 ../bin/qnxip.bin 10o CONDVAR 81fe80c
376859 11 ../bin/qnxip.bin 10o CONDVAR 82abef4
376859 12 ../bin/qnxip.bin 10o CONDVAR 81fe73c
376859 13 ../bin/qnxip.bin 10o CONDVAR 81ec604
376859 14 ../bin/qnxip.bin 10o CONDVAR 81ec32c
376859 15 ../bin/qnxip.bin 10o CONDVAR 81391f4
376859 16 ../bin/qnxip.bin 10o CONDVAR 81f1944
376859 17 ../bin/qnxip.bin 10o CONDVAR 81f159c
376859 18 ../bin/qnxip.bin 10o CONDVAR 81f10bc
376859 19 ../bin/qnxip.bin 10o CONDVAR 81f1394
376859 20 ../bin/qnxip.bin 10o CONDVAR 81f1b4c
376859 21 ../bin/qnxip.bin 10o CONDVAR 8201944
376859 22 ../bin/qnxip.bin 10o CONDVAR 81febb4
376859 23 ../bin/qnxip.bin 10o CONDVAR 8201f5c
376859 24 ../bin/qnxip.bin 10o CONDVAR 8201ae4
376859 25 ../bin/qnxip.bin 10o CONDVAR 8201c1c
376859 26 ../bin/qnxip.bin 10o CONDVAR 82161f4
376859 27 ../bin/qnxip.bin 10o CONDVAR 82234cc
376859 28 ../bin/qnxip.bin 10o CONDVAR 8223e8c
376859 29 ../bin/qnxip.bin 10o CONDVAR 8223944
376859 30 ../bin/qnxip.bin 10o CONDVAR 8216ae4
376859 31 ../bin/qnxip.bin 10o CONDVAR 8216ef4
376859 32 ../bin/qnxip.bin 10o CONDVAR 8228124
376859 33 ../bin/qnxip.bin 10o CONDVAR 8228464
376859 34 ../bin/qnxip.bin 10o CONDVAR 8235604
376859 36 ../bin/qnxip.bin 30o CONDVAR 81ec8dc
376859 37 ../bin/qnxip.bin 10o CONDVAR 8228bb4
376859 40 ../bin/qnxip.bin 10o CONDVAR 8239bb4
376859 41 ../bin/qnxip.bin 30o CONDVAR 8139dbc
376859 44 ../bin/qnxip.bin 10o REPLY 376881
376859 45 ../bin/qnxip.bin 30o CONDVAR 81bab4c
376859 46 ../bin/qnxip.bin 10o REPLY 376852
376859 47 ../bin/qnxip.bin 10o REPLY 376887
376859 48 ../bin/qnxip.bin 10o REPLY 376884
376859 50 ../bin/qnxip.bin 10o REPLY 376890
376859 53 ../bin/qnxip.bin 10o CONDVAR 82639ac
376859 54 ../bin/qnxip.bin 10o CONDVAR 82666d4
376859 55 ../bin/qnxip.bin 30o CONDVAR 81ade8c
376859 56 ../bin/qnxip.bin 10o CONDVAR 82661f4
376859 63 ../bin/qnxip.bin 10o CONDVAR 827be24
376859 65 ../bin/qnxip.bin 30o CONDVAR 82b9604
376859 67 ../bin/qnxip.bin 10o CONDVAR 82ab0bc
376859 68 ../bin/qnxip.bin 10o CONDVAR 827fd54
376859 69 ../bin/qnxip.bin 10o CONDVAR 82abb4c
376859 70 ../bin/qnxip.bin 10o CONDVAR 82af18c
376859 71...
|
|
|
Oleh Derevenko(deleted)
12/02/2007 8:30 AM
post3207
|
Re: TCP stream socket send() thread safety
What is strange about all this is that my sshd connection was quite fine and I did not see any delays. This is an argument
against the assumption that the communication problems could be caused by excessive collisions in the hub.
Another thing is that io-net still contains only 10 threads. As far as I understand, it should have allocated more
threads in case of throughput problems.
|
|
|
Sean Boudreau(deleted)
12/04/2007 9:29 AM
post3290
|
Re: TCP stream socket send() thread safety
On Sun, Dec 02, 2007 at 08:30:59AM -0500, Oleh Derevenko wrote:
> What is strange about all this, that my sshd connection was quite fine
> and I did not see any delays. This is an argument against the assumption
> that communication problems could be caused by excessive collisions in
> hub.
If tcp connections seem fine I'd check your qnet access as
that appeared to be in use in the previous pidin trace.
> Another thing is that io-net still contains only 10 threads in it. As
> far as I understand it should have allocated more threads in case of
> throughput problems.
'threads_max' in the stack context is really a misnomer.
What's actually increased is the number of co-routines the
stack allocates to handle message requests. The use of
the term 'threads' here is really a holdover from QNX4.
-seanb
|
|
|
Sean Boudreau(deleted)
12/04/2007 9:25 AM
post3288
|
Re: TCP stream socket send() thread safety
On Fri, Nov 30, 2007 at 11:01:02AM -0500, Oleh Derevenko wrote:
> > The 'threads_max' argument to the stack controls how many
> > blocking operations the stack can service simultaneously.
> > They may be read(), write(), accept() ...
> >
> > It's just an educated guess at this point. Look for clients
> > SEND blocked on io-net and the aforementioned sloginfo
> > entry.
>
> So far, I tried it with my current system (that is, 6.3.0 SP3).
> To be able to reproduce it I had to plug ethernet back into 100Mbit hub
> just as we were running last year because with 1GB switch it did not
> reproduce at once.
>
> When connections start flashing offline/online there are no threads in
> SEND-blocked state at all.
>
> -----------------------------ONE NODE-----------------------------
> # ps -A | grep io-net
> 77841 ? 00:00:30 io-net
> # pidin -p 376859
> pid tid name prio STATE Blocked
> 376859 1 ../bin/qnxip.bin 10o MUTEX 376859-06 #1
> 376859 2 ../bin/qnxip.bin 10o CONDVAR 8126bb4
> 376859 3 ../bin/qnxip.bin 10o CONDVAR 812666c
> 376859 4 ../bin/qnxip.bin 10o RECEIVE 2
> 376859 5 ../bin/qnxip.bin 10o RECEIVE 5
> 376859 6 ../bin/qnxip.bin 10o REPLY 77841
> 376859 7 ../bin/qnxip.bin 10o CONDVAR 824f874
> 376859 8 ../bin/qnxip.bin 10o REPLY 77841
> 376859 9 ../bin/qnxip.bin 10o CONDVAR 81eca14
> 376859 10 ../bin/qnxip.bin 10o CONDVAR 81fe80c
> 376859 11 ../bin/qnxip.bin 10o CONDVAR 82abef4
> 376859 12 ../bin/qnxip.bin 10o CONDVAR 81fe73c
> 376859 13 ../bin/qnxip.bin 10o CONDVAR 81ec604
> 376859 14 ../bin/qnxip.bin 10o CONDVAR 81ec32c
> 376859 15 ../bin/qnxip.bin 10o CONDVAR 81391f4
> 376859 16 ../bin/qnxip.bin 10o CONDVAR 81f1944
> 376859 17 ../bin/qnxip.bin 10o CONDVAR 81f159c
> 376859 18 ../bin/qnxip.bin 10o CONDVAR 81f10bc
> 376859 19 ../bin/qnxip.bin 10o CONDVAR 81f1394
> 376859 20 ../bin/qnxip.bin 10o CONDVAR 81f1b4c
> 376859 21 ../bin/qnxip.bin 10o CONDVAR 8201944
> 376859 22 ../bin/qnxip.bin 10o CONDVAR 81febb4
> 376859 23 ../bin/qnxip.bin 10o CONDVAR 8201f5c
> 376859 24 ../bin/qnxip.bin 10o CONDVAR 8201ae4
> 376859 25 ../bin/qnxip.bin 10o CONDVAR 8201c1c
> 376859 26 ../bin/qnxip.bin 10o CONDVAR 82161f4
> 376859 27 ../bin/qnxip.bin 10o CONDVAR 82234cc
> 376859 28 ../bin/qnxip.bin 10o CONDVAR 8223e8c
> 376859 29 ../bin/qnxip.bin 10o CONDVAR 8223944
> 376859 30 ../bin/qnxip.bin 10o CONDVAR 8216ae4
> 376859 31 ../bin/qnxip.bin 10o CONDVAR 8216ef4
> 376859 32 ../bin/qnxip.bin 10o CONDVAR 8228124
> 376859 33 ../bin/qnxip.bin 10o CONDVAR 8228464
> 376859 34 ../bin/qnxip.bin 10o CONDVAR 8235604
> 376859 36 ../bin/qnxip.bin 30o CONDVAR 81ec8dc
> 376859 37 ../bin/qnxip.bin 10o CONDVAR 8228bb4
> 376859 40 ../bin/qnxip.bin 10o CONDVAR 8239bb4
> 376859 41 ../bin/qnxip.bin 30o CONDVAR 8139dbc
> 376859 44 ../bin/qnxip.bin 10o REPLY 376881
> 376859 45 ../bin/qnxip.bin 30o CONDVAR 81bab4c
> 376859 46 ../bin/qnxip.bin 10o REPLY 376852
> 376859 47 ../bin/qnxip.bin 10o REPLY 376887
> 376859 48 ../bin/qnxip.bin 10o REPLY 376884
> 376859 50 ../bin/qnxip.bin 10o REPLY 376890
> 376859 53 ../bin/qnxip.bin 10o CONDVAR 82639ac
> 376859 54 ../bin/qnxip.bin 10o CONDVAR 82666d4
> 376859 55 ../bin/qnxip.bin 30o CONDVAR 81ade8c
> 376859 56...
|
|
|
Oleh Derevenko(deleted)
|
Re: TCP stream socket send() thread safety
|
Oleh Derevenko(deleted)
12/05/2007 10:02 AM
post3352
|
Re: TCP stream socket send() thread safety
Hi
> You're close to the default 'threads_max' value on one node
> but they don't seem to all be doing socket operations. You
> can check the sloginfo output to be sure but it doesn't look
> like you're hitting this limit.
>
> You can try exercising subsystems: does localhost work, can
> you ping offnode, does qnet offnode work?
Well, I can't tell the exact moment when the problem appears. I only see the consequences: the client software
disconnects because of a communication timeout, and once it has disconnected it is already too late to check anything.
Anyway, as I already said, my SSH terminal is fine, and it looks like I can access network nodes (at least a shell
script run from a network node does not terminate).
> Anything
> different in the 'netstat -s' or 'cat /proc/qnetstats'
> output from when it works vs when you're in failure mode?
So I ran an experiment with the following script:
=== begin ===
#!/bin/sh
# Usage: pass the pid of the process to watch as the first argument.
# Appends thread states plus IP and qnet statistics to stats.out once a second.
out="stats.out"
while true
do
    {
        echo "="
        date
        echo "="
        echo "1111111111111111111111111111111111111111111111"
        echo "="
        pidin -p "$1"               # section 1: thread states
        echo "="
        echo "2222222222222222222222222222222222222222222222"
        echo "="
        netstat -s                  # section 2: IP statistics
        echo "="
        echo "3333333333333333333333333333333333333333333333"
        echo "="
        cat /proc/qnetstats         # section 3: qnet statistics
    } >> "$out"
    sleep 1
done
=== end ===
It dumps thread states and IP+qnet statistics every second.
I'll send you an excerpt from the output that contains a single connect-disconnect period with next letter to your
personal e-mail.
You can tell that the client is connected when there are threads blocked over QNet on other nodes. When the client
disconnects, all the threads are blocked locally (the first pass and the last few passes).
It may also be important that the client is a Windows application and that both sides call setsockopt(...,
IPPROTO_TCP, TCP_NODELAY, ...).
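For readers unfamiliar with the call mentioned above, here is a minimal sketch in C of what setting TCP_NODELAY usually looks like with the BSD socket API. The helper name `enable_tcp_nodelay` is made up for illustration and is not taken from the program discussed in this thread.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the original program): disable Nagle's
 * algorithm on a TCP socket so small writes are not held back waiting
 * to be coalesced. Returns 0 on success, -1 on error with errno set. */
int enable_tcp_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```

With Nagle's algorithm off, every small response is sent as soon as it is written, so many threads writing small blocks can produce many small segments on the wire.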
|
|
|
Sean Boudreau(deleted)
|
Re: TCP stream socket send() thread safety
|
Sean Boudreau(deleted)
12/05/2007 10:22 AM
post3362
|
Re: TCP stream socket send() thread safety
>
> It dumps thread states and IP+qnet statistics every second.
> I'll send you an excerpt from the output that contains a single
> connect-disconnect period with next letter to your personal e-mail.
Please use the group so more eyes can see it.
-seanb
|
|
|
Oleh Derevenko(deleted)
|
Re: TCP stream socket send() thread safety
|
Oleh Derevenko(deleted)
12/05/2007 10:31 AM
post3364
|
Re: TCP stream socket send() thread safety
> >
> > It dumps thread states and IP+qnet statistics every second.
> > I'll send you an excerpt from the output that contains a single
> > connect-disconnect period with next letter to your personal e-mail.
>
> Please use the group so more eyes can see it.
Could you forward the file to the people you think could help solve the problem? I'm concerned that publishing network
data in a publicly accessible newsgroup could be a security risk.
|
|
|
Sean Boudreau(deleted)
|
Re: TCP stream socket send() thread safety
|
Sean Boudreau(deleted)
12/08/2007 8:32 AM
post3461
|
Re: TCP stream socket send() thread safety
After reviewing your logs, there doesn't seem
to be anything all that abnormal in them. You might want to check whether the following count correlates with
your issue:
4 connections dropped by rexmit timeout
At this point I'd try to instrument the app to see why connections are being dropped. Is it normal
termination? You seem to have multiple threads
sending on a socket; could you have multiple threads
closing a socket? Could socket creation be happening
between two closes?
-seanb
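One possible way to instrument the app along these lines is to record why each connection ended at the point where recv() stops returning data. The sketch below is in C; the enum and function names are hypothetical, and it assumes the standard POSIX meaning of the return value and errno.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of how a recv() call on a stream socket ended. */
typedef enum {
    CONN_ALIVE,        /* n > 0: data was received */
    CONN_PEER_CLOSED,  /* n == 0: orderly shutdown by the peer */
    CONN_RETRY,        /* interrupted or would block: not a drop */
    CONN_TIMED_OUT,    /* ETIMEDOUT: the connection timed out */
    CONN_RESET,        /* ECONNRESET or EPIPE */
    CONN_OTHER_ERROR   /* anything else: log the errno value itself */
} conn_end_t;

/* n is the value returned by recv() for a nonzero-length request;
 * err is errno captured immediately after the call. */
conn_end_t classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return CONN_ALIVE;
    if (n == 0)
        return CONN_PEER_CLOSED;
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return CONN_RETRY;
    if (err == ETIMEDOUT)
        return CONN_TIMED_OUT;
    if (err == ECONNRESET || err == EPIPE)
        return CONN_RESET;
    return CONN_OTHER_ERROR;
}
```

Logging this value together with a timestamp for every closed connection would show whether the disconnects are normal terminations by the client or timeouts seen by the server side.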
|
|
|
Oleh Derevenko(deleted)
|
Re: TCP stream socket send() thread safety
|
Oleh Derevenko(deleted)
12/08/2007 9:16 AM
post3462
|
Re: TCP stream socket send() thread safety
> 4 connections dropped by rexmit timeout
>
> At this point I'd try to instrument the app to see why connections are being dropped. Is it normal termination?
Well, I can't answer this question because I do not have any idea when these connections were dropped and even if they
have been dropped by my process.
> You seem to have multiple threads sending on a socket; could you have multiple threads closing a socket? Could socket
> creation be happening between two closes?
So, the program works as follows:
1) There is a thread that listens for incoming connections and accepts them (Thread A).
2) For every accepted socket, a new worker thread is created to serve that socket (Thread B).
3) The worker thread reads command envelopes from the socket, and for most types of commands a command processing
thread is created (Thread C). The remaining commands are executed synchronously in Thread B before it reads the next
command envelope (in particular, heartbeats are answered synchronously).
4) The command processing thread receives as parameters the command input data (already read by Thread B) and the
socket (so that it can send the command status/response back to the client).
So Thread C just gets input data and a socket handle. It does its job, writes the result back to the socket and
terminates. The socket is closed by Thread B after the client shuts the connection down or a communication error is
detected.
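Since several Thread C instances can write to the same socket, this is where the mutex around send() from the start of the thread comes in. A minimal sketch in C of that kind of workaround follows; `conn_t`, `conn_init` and `conn_send_block` are hypothetical names, not the actual code of the program.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-connection state shared by Thread B and its Thread C's. */
typedef struct {
    int fd;                     /* connected stream socket */
    pthread_mutex_t send_lock;  /* serializes writers on this socket */
} conn_t;

/* Returns 0 on success, or an error number from pthread_mutex_init(). */
int conn_init(conn_t *c, int fd)
{
    c->fd = fd;
    return pthread_mutex_init(&c->send_lock, NULL);
}

/* Send the whole block while holding the lock, so that blocks written by
 * different threads cannot interleave. Returns 0 on success, -1 on error. */
int conn_send_block(conn_t *c, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    pthread_mutex_lock(&c->send_lock);
    while (len > 0) {
        ssize_t n = send(c->fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            rc = -1;            /* send() failed */
            break;
        }
        p += n;                 /* send() may accept only part of the block */
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&c->send_lock);
    return rc;
}
```

The loop matters as much as the lock: a single send() on a stream socket may transfer fewer bytes than requested, so the lock has to be held until the whole block is out.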
The answers to the questions are:
> Could you have multiple threads closing a socket?
No. Only Thread B closes the socket it was created with, and it does so only after the client terminates the
connection or a communication error occurs, and after all the related command processing threads (Thread C's) have
finished sending their responses and terminated.
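The rule described here (close only after every Thread C has finished) can be expressed as a counter guarded by a mutex and a condition variable. This is a sketch in C under that assumption; the type and function names are invented for illustration, and the real program may do it differently, for example by joining the threads.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical bookkeeping owned by Thread B for one accepted socket. */
typedef struct {
    int fd;
    int workers;            /* Thread C's that may still use fd */
    pthread_mutex_t lock;
    pthread_cond_t idle;    /* signalled when workers drops to 0 */
} conn_life_t;

void conn_life_init(conn_life_t *c, int fd)
{
    c->fd = fd;
    c->workers = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->idle, NULL);
}

/* Thread B calls this before it starts a Thread C. */
void conn_worker_begin(conn_life_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->workers++;
    pthread_mutex_unlock(&c->lock);
}

/* Thread C calls this after sending its response, just before exiting. */
void conn_worker_end(conn_life_t *c)
{
    pthread_mutex_lock(&c->lock);
    if (--c->workers == 0)
        pthread_cond_signal(&c->idle);
    pthread_mutex_unlock(&c->lock);
}

/* Thread B calls this once: wait for all workers, then close the socket.
 * Returns the result of close(). */
int conn_close_when_idle(conn_life_t *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->workers > 0)
        pthread_cond_wait(&c->idle, &c->lock);
    pthread_mutex_unlock(&c->lock);
    return close(c->fd);
}
```

Because the descriptor is closed exactly once and only after the counter reaches zero, a descriptor number reused by a later accept() cannot be written to by a leftover Thread C.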
> Could socket creation be happening between two closes?
Well, if we consider accepting a new connection to be socket creation, then anything can happen: several Thread B's
can be closing their sockets while Thread A accepts new connections at the same time.
|
|
|
Oleh Derevenko(deleted)
|
Re: TCP stream socket send() thread safety
|
Oleh Derevenko(deleted)
12/08/2007 9:29 AM
post3463
|
Re: TCP stream socket send() thread safety
> > 4 connections dropped by rexmit timeout
> >
> > At this point I'd try to instrument the app to see why connections are being
> dropped. Is it normal termination?
>
> Well, I can't answer this question because I do not have any idea when these
> connections were dropped and even if they have been dropped by my process.
I have checked the full dump output, and those "4 connections dropped" stayed the same during all 11 minutes I was
monitoring the network.
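A check like this can be automated by pulling the counter out of each `netstat -s` snapshot in the dump. Below is a small sketch in C; the function name is hypothetical, and it assumes the line looks like the one quoted above, with the number first.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the counter from a `netstat -s` line such as
 *   "4 connections dropped by rexmit timeout"
 * or -1 if the line is some other statistic or has no leading number. */
long rexmit_drops_from_line(const char *line)
{
    char *end;
    long value;

    if (strstr(line, "dropped by rexmit timeout") == NULL)
        return -1;
    value = strtol(line, &end, 10);   /* skips leading whitespace */
    if (end == line)
        return -1;                    /* no number at the start */
    return value;
}
```

Feeding every line of the dump through this function and printing the value whenever it changes shows at a glance whether the counter moves during a disconnect.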