Oleh Derevenko | 02/05/2009 12:22 PM | post21493
A bug with MsgReadv in 6.3.0 SP2 kernel???
Hi All,
I have a process locked up in a call to MsgReadv() while the manual states that this function never blocks.
(gdb) she pidin -p 12898399
pid tid name prio STATE Blocked
12898399 1 ../bin/drm.bin 10o MUTEX 12898399-04 #1
12898399 2 ../bin/drm.bin 10o CONDVAR 816b89c
12898399 3 ../bin/drm.bin 10o RECEIVE 2
12898399 4 ../bin/drm.bin 10o RECEIVE 30
12898399 5 ../bin/drm.bin 10o RECEIVE 18
12898399 6 ../bin/drm.bin 10o RECEIVE 18
12898399 7 ../bin/drm.bin 10o RECEIVE 22
12898399 8 ../bin/drm.bin 10o RECEIVE 22
12898399 9 ../bin/drm.bin 10o RECEIVE 26
12898399 10 ../bin/drm.bin 10o RECEIVE 26
12898399 11 ../bin/drm.bin 10o RECEIVE 30
12898399 12 ../bin/drm.bin 10o RECEIVE 87
12898399 13 ../bin/drm.bin 10o RECEIVE 87
12898399 14 ../bin/drm.bin 10o RECEIVE 99
12898399 15 ../bin/drm.bin 10o RECEIVE 99
12898399 16 ../bin/drm.bin 10o RECEIVE 112
12898399 17 ../bin/drm.bin 10o RECEIVE 112
12898399 18 ../bin/drm.bin 10o RECEIVE 121
12898399 19 ../bin/drm.bin 10o RECEIVE 121
12898399 20 ../bin/drm.bin 10o RECEIVE 30
12898399 21 ../bin/drm.bin 10o RECEIVE 128
12898399 22 ../bin/drm.bin 10o RECEIVE 38
12898399 23 ../bin/drm.bin 10o RECEIVE 128
12898399 24 ../bin/drm.bin 10o RECEIVE 51
12898399 25 ../bin/drm.bin 10o RECEIVE 136
12898399 26 ../bin/drm.bin 16o RECEIVE 136
12898399 27 ../bin/drm.bin 10o RECEIVE 143
12898399 28 ../bin/drm.bin 10o RECEIVE 30
12898399 29 ../bin/drm.bin 10o RECEIVE 143
12898399 30 ../bin/drm.bin 10o RECEIVE 149
12898399 31 ../bin/drm.bin 10o RECEIVE 18
12898399 32 ../bin/drm.bin 10o RECEIVE 63
12898399 33 ../bin/drm.bin 10o RECEIVE 149
12898399 34 ../bin/drm.bin 10o RECEIVE 154
12898399 35 ../bin/drm.bin 10o RECEIVE 63
12898399 36 ../bin/drm.bin 10o RECEIVE 76
12898399 37 ../bin/drm.bin 10o RECEIVE 154
12898399 38 ../bin/drm.bin 10o RECEIVE 160
12898399 39 ../bin/drm.bin 10o RECEIVE 26
12898399 40 ../bin/drm.bin 10o RECEIVE 76
12898399 41 ../bin/drm.bin 10o RECEIVE 160
12898399 42 ../bin/drm.bin 10o RECEIVE 38
12898399 43 ../bin/drm.bin 10o RECEIVE 165
12898399 44 ../bin/drm.bin 10o RECEIVE 165
12898399 45 ../bin/drm.bin 10o RECEIVE 51
12898399 46 ../bin/drm.bin 10o RECEIVE 170
12898399 47 ../bin/drm.bin 10o RECEIVE 170
12898399 48 ../bin/drm.bin 10o RECEIVE 175
12898399 49 ../bin/drm.bin 10o RECEIVE 175
12898399 50 ../bin/drm.bin 10o RECEIVE 180
12898399 51 ../bin/drm.bin 10o RECEIVE 180
12898399 52 ../bin/drm.bin 10o RECEIVE 30
12898399 53 ../bin/drm.bin 10o RECEIVE 30
12898399 54 ../bin/drm.bin 12o MUTEX 12898399-64 #1
12898399 55 ../bin/drm.bin 12o MUTEX 12898399-64 #1
12898399 56 ../bin/drm.bin 10o RECEIVE 190
12898399 57 ../bin/drm.bin 10o RECEIVE 190
12898399 58 ../bin/drm.bin 10o RECEIVE 195
12898399 59 ../bin/drm.bin 10o RECEIVE 195
12898399 60 ../bin/drm.bin 10o RECEIVE 200
12898399 61 ../bin/drm.bin 10o RECEIVE 200
12898399 62 ../bin/drm.bin 12o MUTEX 12898399-64 #1
12898399 63 ../bin/drm.bin 10o RECEIVE 195
12898399 64 ../bin/drm.bin 16o REPLY 12898399
12898399 65 ../bin/drm.bin 16o MUTEX 12898399-64 #1
12898399 66 ../bin/drm.bin 10o RECEIVE 18
12898399 67 ../bin/drm.bin 12o MUTEX 12898399-64 #1
12898399 68 ../bin/drm.bin 10o RECEIVE 18
12898399 69 ../bin/drm.bin 16o RECEIVE 136
12898399 70...
Colin Burgess | 02/05/2009 1:00 PM | post21496
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
I think you are confusing the GDB thread id with the kernel thread id, perhaps?
Oleh Derevenko | 02/05/2009 1:53 PM | post21504
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
No, GDB thread 64 is also kernel thread 64:
(gdb) p {CUNIXThreadDescriptor}m_pHostThread
$5 = {m_pThreadID = 0x40, m_lRefCount = 2, m_ai_RunningMutexStorage = {65538, 64, -2147483647, 1348403264}}
Here m_pThreadID = 0x40 is the thread ID returned by pthread_self() when a pool thread starts executing.
Xiaodan Tang | 02/05/2009 2:48 PM | post21523
RE: A bug with MsgReadv in 6.3.0 SP2 kernel???
A REPLY block "on itself" is usually a sign that a server thread did a
MsgRead() on a client message from a remote node.
Is this the case?
-xtang
Colin Burgess | 02/05/2009 3:55 PM | post21536
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
Looking at the code, that seems to be the case (ker_msg_readv -> lookup_rcvid -> net_send2),
but why is pidin not reporting the pid@<node> information?
--
cburgess@qnx.com
Xiaodan Tang | 02/05/2009 4:03 PM | post21537
RE: A bug with MsgReadv in 6.3.0 SP2 kernel???
On the client side, if you MsgSend() across QNET, pidin shows you blocked on
server_pid@server.
On the server side, it is qnet (a vthread) sending to the real server (NETCON);
when the server decides to MsgRead(), it is blocked on the same
connection, hence pidin reports it as blocked on itself (cop->chn->proc).
-xtang
Colin Burgess | 02/05/2009 4:32 PM | post21543
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
(qnet makes my brain squishy)
Can pidin detect this and print something more meaningful?
Oleh Derevenko | 02/05/2009 4:17 PM | post21541
Re: RE: A bug with MsgReadv in 6.3.0 SP2 kernel???
Yes, the request has come from another node.
(gdb) fr 6
#6 0x080ad8b1 in CResourceManager::ProcessRequest (this=0x83dca28, pdcRequest=0x8425288) at /home/masha/rel6776/Src/
shared/common/rm.cpp:774
774 dispatch_handler(pdcRequest);
(gdb) p pdcRequest[0]
$6 = {resmgr_context = {rcvid = 8650979, info = {nd = 7, srcnd = 155, pid = 14934133, tid = 24, chid = 185, scoid =
1073742051, coid = 12, msglen = 16,
srcmsglen = 16, dstmsglen = 2147483647, priority = 12, flags = 256, reserved = 0}, msg = 0x84252e4, dpp =
0x8347888, id = -1, tid = 0, msg_max_size = 4104,
status = 0, offset = 0, size = 4, iov = {{iov_base = 0x84252e4, iov_len = 60}}}, message_context = {rcvid = 8650979,
info = {nd = 7, srcnd = 155,
pid = 14934133, tid = 24, chid = 185, scoid = 1073742051, coid = 12, msglen = 16, srcmsglen = 16, dstmsglen =
2147483647, priority = 12, flags = 256,
reserved = 0}, msg = 0x84252e4, dpp = 0x8347888, id = -1, tid = 0, msg_max_size = 4104, status = 0, offset = 0,
size = 4, iov = {{iov_base = 0x84252e4,
iov_len = 60}}}, select_context = {rcvid = 8650979, info = {msginfo = {nd = 7, srcnd = 155, pid = 14934133, tid
= 24, chid = 185, scoid = 1073742051,
coid = 12, msglen = 16, srcmsglen = 16, dstmsglen = 2147483647, priority = 12, flags = 256, reserved = 0},
siginfo = {si_signo = 7, si_code = 155,
si_errno = 14934133, __data = {__pad = {24, 185, 1073742051, 12, 16, 16, 2147483647}, __proc = {__pid = 24,
__pdata = {__kill = {__uid = 185, __value = {
sival_int = 1073742051, sival_ptr = 0x400000e3}}, __chld = {__utime = 185, __status = 1073742051,
__stime = 12}}}, __fault = {__fltno = 24,
__fltip = 0xb9, __addr = 0x400000e3}}}}, msg = 0x84252e4, dpp = 0x8347888, fd = -1, tid = 0, reserved = 4104
, flags = 0, reserved2 = {0, 4}, iov = {{
iov_base = 0x84252e4, iov_len = 60}}}, sigwait_context = {signo = 8650979, info = {msginfo = {nd = 7, srcnd =
155, pid = 14934133, tid = 24, chid = 185,
scoid = 1073742051, coid = 12, msglen = 16, srcmsglen = 16, dstmsglen = 2147483647, priority = 12, flags = 256,
reserved = 0}, siginfo = {si_signo = 7,
si_code = 155, si_errno = 14934133, __data = {__pad = {24, 185, 1073742051, 12, 16, 16, 2147483647}, __proc =
{__pid = 24, __pdata = {__kill = {
__uid = 185, __value = {sival_int = 1073742051, sival_ptr = 0x400000e3}}, __chld = {__utime = 185,
__status = 1073742051, __stime = 12}}},
__fault = {__fltno = 24, __fltip = 0xb9, __addr = 0x400000e3}}}}, msg = 0x84252e4, dpp = 0x8347888, status = -
1, tid = 0, set = {bits = {4104, 0}},
reserved2 = {0, 4}, iov = {{iov_base = 0x84252e4, iov_len = 60}}}}
(gdb) x/16b pdcRequest[0].resmgr_context.msg
0x84252e4: 0x01 0x01 0x10 0x00 0x05 0x00 0x00 0x00
0x84252ec: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb)
Here is also the state of process 14934133 on the remote node:
# pidin -p 14934133
pid tid name prio STATE Blocked
14934133 1 ../bin/axis.bin 10o MUTEX 14934133-06 #1
14934133 2 ../bin/axis.bin 10o CONDVAR 83446b4
14934133 3 ../bin/axis.bin 10o CONDVAR 8326e98
14934133 4 ../bin/axis.bin 10o RECEIVE 2
14934133 5 ../bin/axis.bin 10o RECEIVE 5
14934133 6 ../bin/axis.bin 10o CONDVAR 8366d74
14934133 7 ../bin/axis.bin 10o CONDVAR 837df2c
14934133 8 ../bin/axis.bin 10o CONDVAR 837d0bc
14934133 9 ../bin/axis.bin 14o CONDVAR 837cdcc
14934133 10 ../bin/axis.bin 21o RECEIVE 21
14934133 11 ../bin/axis.bin 10o CONDVAR 837c66c
14934133 12 ../bin/axis.bin 10o CONDVAR 83612c4
14934133 13 ../bin/axis.bin 15o NANOSLEEP
14934133 14 ../bin/axis.bin 17o RECEIVE 25
14934133 15 ../bin/axis.bin 17o RECEIVE 25
14934133 16 ../bin/axis.bin 17o RECEIVE 25
14934133 17 ../bin/axis.bin...
Xiaodan Tang | 02/05/2009 4:32 PM | post21544
RE: RE: A bug with MsgReadv in 6.3.0 SP2 kernel???
The message looks like an _IO_READ (0x0101) with nbytes 5;
I am not sure why this would cause a MsgRead().
That being said, blocking in MsgRead() forever is also not the right thing to
happen...
-xtang
Colin Burgess | 02/05/2009 4:35 PM | post21546
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
The MsgReadv is in _resmgr_unblock_handler.
Thread 24 on the client should still be REPLY-blocked on the server at this point... I would think!
Oleh Derevenko | 02/05/2009 4:52 PM | post21547
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
I'm not sure if this has anything to do with the situation, and I don't know if the event I'm going to describe actually
took place, but I have a mechanism for unblocking threads from requests to the server by sending a SIGINT signal with
pthread_kill() to the thread internally in the client. I know that this should send an unblock pulse to the server, and
the server must unblock the request explicitly, but maybe the pulse was lost somehow, or it was not processed, or it was
incorrectly processed while the request was entering dispatch or was just waiting on a mutex before
dispatch_handler() for another thread to finish using the server objects.
Oleh Derevenko | 02/05/2009 5:21 PM | post21550
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
Also, the client node is 6.3.2 and does not have patch #0284 integrated into the kernel. Sorry, I forgot to
mention it.
Oleh Derevenko | 02/05/2009 6:52 PM | post21557
Re: A bug with MsgReadv in 6.3.0 SP2 kernel???
I confirm that thread 24 on the client was indeed calling read() with a buffer of 5 bytes. I've analyzed the program logs
in memory and found the code the thread was executing. Unfortunately, the log level is not high enough, so I can't see
the result of the operation directly. Most likely the read() was aborted by closing the file handle from another
thread.