BERGMANN Yannick
11/17/2010 3:42 AM
post74903
|
Periodically (once every 2-3 months), at many sites, we have problems with io-net under QNX 6.3.0 SP3 (x86). When this
happens, the system freezes completely. We can find core dumps, but they seem corrupted (coreinfo gives results, but in
gdb the stack trace is very long). We also tried to generate .kev files, but the system has no time to write them to
disk.
After some research in our labs, we found that killing io-net with SIGKILL (slay -f -9 io-net) causes the system to
freeze; this is certainly because the cleanup process is bypassed. In the coreinfo output, we see that one thread of
io-net received SIGSEGV, so now we would like to know how to prevent this fault.
Using the High Availability Manager, we developed a method for restarting io-net when it dies and, surprisingly, it
prevents the system from freezing when a SIGKILL is sent to io-net. On site, however, it didn't work: we waited 3 months
for the crash to happen, and when it happened, the system froze.
At this point, we really don't know what to do to avoid the freeze. The problem is not reproducible (we have never
experienced it in our labs), and we have little information apart from the corrupted io-net core dump that is placed
in /var/dumps/.
Any suggestions?
Here is the coreinfo output, and some information about io-net's configuration:
>coreinfo io-net.core
io-net.core:
processor=X86 num_cpus=1
cpu 1 cpu=686 name=Intel 686 F6M13S8 speed=1798
flags=0xc0007fff FPU MMU CPUID RDTSC INVLPG WP BSWAP MMX CMOV PSE PGE MTRR SEP SIMD FXSR
cyc/sec=1800291400 tod_adj=1280846434000000000 nsec=7958021569124295 inc=99733
boot=1280846434 epoch=1970 intr=0
rate=838095345 scale=-15 load=119
MACHINE="x86pc" HOSTNAME="localhost"
pid=77841 parent=1 child=0 pgrp=77841 sid=1 flags=0x403210 umask=0 base_addr=0x8048000 init_stack=0x8047f00 ruid=0
euid=0 suid=0 rgid=0 egid=0 sgid=0 ign=0000000006800000 queue=ff00000000008000 pending=0000000000000000
fds=6 threads=8 timers=5 chans=38
thread 1
ip=0xb032e985 sp=0x8047dc8 stkbase=0x7fc7000 stksize=528384
state=SIGWAITINFO flags=80000000 last_cpu=1 timeout=00000000
pri=10 realpri=10 policy=OTHER
thread 2 SIGNALLED-SIGSEGV code=1 MAPERR refaddr=4 fltno=11
ip=0xb031f3d4 sp=0x7fc6e40 stkbase=0x7f73000 stksize=135168
state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
pri=10 realpri=10 policy=OTHER
thread 3
ip=0xb032dd29 sp=0x7fb5f00 stkbase=0x7fa5000 stksize=69632
state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
pri=10 realpri=10 policy=OTHER
blocked_chid=1
thread 4
ip=0xb032dd29 sp=0x7fa4f00 stkbase=0x7f94000 stksize=69632
state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
pri=10 realpri=10 policy=OTHER
blocked_chid=1
thread 5
ip=0xb032dd29 sp=0x7f72f00 stkbase=0x7f62000 stksize=69632
state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
pri=10 realpri=10 policy=OTHER
blocked_chid=1
thread 8
ip=0xb032db45 sp=0x7f1ff70 stkbase=0x7eff000 stksize=135168
state=RECEIVE flags=84000000 last_cpu=1 timeout=00000000
pri=21 realpri=21 policy=OTHER
blocked_chid=24
thread 9
ip=0xb032db45 sp=0x7efef70 stkbase=0x7ede000 stksize=135168
state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
pri=21 realpri=21 policy=OTHER
thread 10
ip=0xb032db45 sp=0x7eddf70 stkbase=0x7ebd000 stksize=135168
state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
pri=21 realpri=21 policy=OTHER
>pidin -p io-net mem irq
pid tid name prio STATE code data stack
77841 1 sbin/io-net 10o SIGWAITINFO 64K 5172K 8192(516K)*
77841 2 sbin/io-net 10o RECEIVE 64K 5172K 4096(132K)
77841 3 sbin/io-net 10o RECEIVE 64K 5172K 24K(68K)
77841 4 sbin/io-net 10o RECEIVE 64K 5172K 4096(68K)
77841 5 sbin/io-net 10o RECEIVE ...
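For readers triaging similar dumps: the line to look for in the coreinfo output above is the thread marked SIGNALLED-SIGSEGV. MAPERR with refaddr=4 says the thread touched unmapped address 4, which is often the signature of reading a field at a small offset through a NULL pointer (that is an interpretation, not something coreinfo states). Below is a minimal Python sketch, meant to run on a host machine against saved coreinfo text. It assumes the line layout shown in this post and assumes refaddr is hexadecimal; the function name and the one-page threshold are illustrative, not part of any QNX tool.

```python
import re

# Matches thread header lines such as:
#   thread 2 SIGNALLED-SIGSEGV code=1 MAPERR refaddr=4 fltno=11
# The layout is assumed from the coreinfo output quoted in this thread.
FAULT_RE = re.compile(
    r"^thread\s+(?P<tid>\d+)\s+SIGNALLED-(?P<signal>\w+)\s+code=(?P<code>\d+)"
    r"\s+(?P<reason>\w+)\s+refaddr=(?P<refaddr>(?:0x)?[0-9a-fA-F]+)"
)

# Assumption: a faulting address below one 4 KiB page suggests a NULL base pointer.
NULL_PAGE_LIMIT = 0x1000


def find_faulting_threads(coreinfo_text):
    """Return one dict per thread that coreinfo reports as signalled."""
    faults = []
    for line in coreinfo_text.splitlines():
        match = FAULT_RE.match(line.strip())
        if match is None:
            continue
        refaddr = int(match.group("refaddr"), 16)  # assumed hexadecimal
        faults.append({
            "tid": int(match.group("tid")),
            "signal": match.group("signal"),
            "reason": match.group("reason"),
            "refaddr": refaddr,
            "near_null": refaddr < NULL_PAGE_LIMIT,
        })
    return faults


# Excerpt copied from the coreinfo output in this post.
SAMPLE = """\
thread 1
ip=0xb032e985 sp=0x8047dc8 stkbase=0x7fc7000 stksize=528384
thread 2 SIGNALLED-SIGSEGV code=1 MAPERR refaddr=4 fltno=11
ip=0xb031f3d4 sp=0x7fc6e40 stkbase=0x7f73000 stksize=135168
"""

if __name__ == "__main__":
    for fault in find_faulting_threads(SAMPLE):
        print(fault)
```

Running the same scan over dumps collected from several sites would show whether the fault is always in the same thread with the same refaddr, which may help narrow down the faulting code path even when gdb cannot unwind the stack.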
|
|
|
Hugh Brown
|
Re: Periodic io-net crash
|
Hugh Brown
11/22/2010 10:44 AM
post75515
|
Re: Periodic io-net crash
Unfortunately there is no useful information in the core file.
|
|
|
Jason Johnson
|
Re: Periodic io-net crash
|
Jason Johnson
01/05/2011 9:42 AM
post81184
|
Re: Periodic io-net crash
I'm curious, did you ever solve this? We are seeing something similar... It happens too infrequently to glean much
useful data.
|
|
|
BERGMANN Yannick
|
Re: Periodic io-net crash
|
BERGMANN Yannick
01/07/2011 11:05 AM
post81518
|
Re: Periodic io-net crash
Hi Jason,
Yes, this is still a "big" issue for us at the moment.
Can you please give more details about your problem?
- What kind of hardware do you use?
- What network driver do you use?
- What is the exact QNX version you are using (we use QNX 6.3.0 SP3)?
- How frequently does the problem occur?
We have had this problem for more than 2 years now and QNX support is not able to help us ... :-(
Thanks for your feedback.
Yannick
|
|
|
Jason Johnson
|
Re: Periodic io-net crash
|
Jason Johnson
01/07/2011 11:47 AM
post81526
|
Re: Periodic io-net crash
Hi Yannick,
I gather from QNX that you and I are the only ones seeing this, and I imagine you have the same issue we have with not
being able to reproduce it. Perhaps we should join forces on this!
We are also using QNX 6.3.0 SP3, no small coincidence there I suspect.
We are on an x86 platform; I can't recall the exact processor at this second but can dig it up.
The network card is an Intel Pro 100M and we are using devn_speedo.so with that. I have another thread,
http://community.qnx.com/sf/discussion/do/listPosts/projects.networking/discussion.drivers.topc19046, discussing
whether we need to (or should) apply a patch to that. I have asked QNX if that version could cause what we are seeing
and am waiting for an answer.
Frequency of the problem is still being determined: we have >12 systems with this configuration for one customer and it
has occurred about 8-9 times over the last few months. At some sites (at least 1) it has happened twice, and obviously
at some sites not at all (to the best of our knowledge). There is speculation that it has happened once to a different
customer (with roughly the same configuration), but that is being looked into. So when it happens, it seems to be about
once a month or so…
I’d be curious to learn the same information from you about your configuration.
Thanks!
Jason
|
|
|
BERGMANN Yannick
|
Re: Periodic io-net crash
|
BERGMANN Yannick
01/10/2011 10:53 AM
post81703
|
Re: Periodic io-net crash
Jason,
We first used Ethernet cards that used the speedo driver (82801DB Ethernet Controller, vendor ID: 8086h, device ID:
103ah).
We were convinced that it was the speedo driver that caused the PC to freeze, so we changed all our network cards to
ones that use the i82544 driver (82541EI Gigabit Ethernet Controller, vendor ID: 8086h, device ID: 1078h). Nothing
changed, so it's probably not caused by speedo.
In our labs, we could reproduce a freeze by doing a brutal 'slay -9 io-net'. Does this work for you?
The patch you mentioned (patch ID 685) adds a TX_FLUSH command to the speedo driver.
After reading that, we recalled that we use the TCP_NODELAY flag with the setsockopt() function in some of our projects,
and whether or not it is used seems to correlate with the crash of io-net. Do you use this flag too?
Support told us that it is possible to use io-pkt on QNX 6.3.0 SP3 instead of io-net. It could be a solution, but
we suspect that it's a difficult thing to do (we don't even know where to get the binaries).
Hope this helps...
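For context, the TCP_NODELAY option mentioned above disables Nagle's algorithm on a TCP socket, so small writes are sent immediately instead of being coalesced into larger segments. Below is a minimal Python sketch of setting the flag and reading it back. The projects in this thread presumably call setsockopt() from C; the helper names here are hypothetical, and the snippet only illustrates the option, it does not reproduce the io-net crash.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Equivalent to the C call:
    #   int on = 1;
    #   setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_nodelay_socket()
    print("TCP_NODELAY enabled:", nodelay_enabled(s))
    s.close()
```

Since the flag changes how many small packets reach the driver, running otherwise identical traffic with and without it could be one way to check the suspected correlation.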
|
|
|
Jef Hu
|
Re: Periodic io-net crash
|
Jef Hu
01/10/2011 7:48 PM
post81812
|
Re: Periodic io-net crash
Hi Jason/Yannick,
We've had a similar issue for almost a year now;
however, our deployments were on QNX 6.3.2.
We were advised to move on to QNX 6.4.1 and then 6.5.0 to see if it eliminated the issue.
Fortunately/unfortunately, we have not been seeing the io-net issue since,
but we are now fighting a different kind of fire: it seems to cause a different kind of network behaviour, where we
lose network connectivity but at least the system is still responsive.
The worst thing about this is trying to reproduce it... :(
Anyway, you are not alone...
Cheers,
Jef
|
|
|
Armin Steinhoff
|
Re: Periodic io-net crash
|
Armin Steinhoff
01/08/2011 4:45 AM
post81599
|
Re: Periodic io-net crash
Hi Yannick,
BERGMANN Yannick wrote:
> Hi Jason,
>
> yes this is still a "big" issue for us at the moment.
>
> Can you please give more details about your problem?
> - What kind of hardware do you use ?
> - What network driver do you use ?
> - What is the exact QNX version you are using (we use QNX630 SP3) ?
> - Frequency of the problem ?
>
> We have had this problem for more than 2 years now and QNX support is not able to help us ... :-(
As far as I know from the German QSS branch office, a patch exists that
solves that problem.
But ... you have to buy that patch!
--Armin
|
|
|
BERGMANN Yannick
|
Re: Periodic io-net crash
|
BERGMANN Yannick
01/10/2011 10:54 AM
post81704
|
Re: Periodic io-net crash
> As far as I know from the German QSS branch office, a patch exists that
> solves that problem.
> But ... you have to buy that patch!
Armin,
Can you tell me more about this patch???
|
|
|
Armin Steinhoff
|
Re: Periodic io-net crash
|
Armin Steinhoff
01/11/2011 6:54 AM
post81837
|
Re: Periodic io-net crash
BERGMANN Yannick wrote:
>> As far as I know from the German QSS branch office, a patch exists that
>> solves that problem.
>> But ... you have to buy that patch!
> Armin,
>
> Can you tell me more about this patch???
No ... our customer didn't buy that patch. He switched to QNX 6.4.1 ...
--Armin
PS: I ran into the same problem at an industrial fair after some hours of
demonstrations ...
|
|
|
Hardik Baldaniya(deleted)
|
Re: Periodic io-net crash
|
Hardik Baldaniya(deleted)
03/13/2023 9:24 AM
post122164
|
Re: Periodic io-net crash
We are facing the same issue with QNX 6.3.2 and the devn-speedo driver for the Intel 82551T Ethernet Controller.
Sometimes it causes the system to freeze, and sometimes only the network becomes unresponsive while the CPU is still
running other processes.
Were you able to solve this?
|
|
|
|