Forum Topic - Periodic io-net crash: (11 Items)
   
Periodic io-net crash  
Periodically (once every 2-3 months), at many sites, we have problems with io-net under QNX 6.3.0 SP3 (x86). When this 
happens, the system freezes completely. We can find core dumps, but they seem corrupted (coreinfo gives results, but in 
gdb the stack trace is very long). We also tried to generate .kev files, but the system has no time to write them to 
disk.

After some research in our labs, we found that killing io-net with SIGKILL (slay -f -9 io-net) causes the system to 
freeze; this is certainly because SIGKILL bypasses io-net's cleanup process. The coreinfo output shows that one 
thread of io-net receives SIGSEGV, so we would now like to know how to prevent this fault.

Using the High Availability Manager, we developed a method for restarting io-net when it dies and, surprisingly, it 
prevents the system from freezing when SIGKILL is sent to io-net. On site, however, it didn't work: we waited 3 months 
for the crash to happen, and when it did, the system froze.

At this point, we really don't know how to prevent the system from freezing. The problem is not reproducible (we have 
never experienced it in our labs), and we don't have much information apart from the corrupted io-net core dump that 
is placed in /var/dumps/.

Any suggestions?

Here is the coreinfo output, and some information about io-net's configuration:

>coreinfo io-net.core
io-net.core:
 processor=X86 num_cpus=1
  cpu 1 cpu=686 name=Intel 686 F6M13S8 speed=1798
   flags=0xc0007fff FPU MMU CPUID RDTSC INVLPG WP BSWAP MMX CMOV PSE PGE MTRR SEP SIMD FXSR  cyc/sec=1800291400 tod_adj=
1280846434000000000 nsec=7958021569124295 inc=99733
 boot=1280846434 epoch=1970 intr=0
 rate=838095345 scale=-15 load=119
   MACHINE="x86pc" HOSTNAME="localhost"
 pid=77841 parent=1 child=0 pgrp=77841 sid=1  flags=0x403210 umask=0 base_addr=0x8048000 init_stack=0x8047f00  ruid=0 
euid=0 suid=0  rgid=0 egid=0 sgid=0  ign=0000000006800000 queue=ff00000000008000 pending=0000000000000000
 fds=6 threads=8 timers=5 chans=38
 thread 1
  ip=0xb032e985 sp=0x8047dc8 stkbase=0x7fc7000 stksize=528384
  state=SIGWAITINFO flags=80000000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
 thread 2 SIGNALLED-SIGSEGV code=1 MAPERR refaddr=4 fltno=11
  ip=0xb031f3d4 sp=0x7fc6e40 stkbase=0x7f73000 stksize=135168
  state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
 thread 3
  ip=0xb032dd29 sp=0x7fb5f00 stkbase=0x7fa5000 stksize=69632
  state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
  blocked_chid=1
 thread 4
  ip=0xb032dd29 sp=0x7fa4f00 stkbase=0x7f94000 stksize=69632
  state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
  blocked_chid=1
 thread 5
  ip=0xb032dd29 sp=0x7f72f00 stkbase=0x7f62000 stksize=69632
  state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
  blocked_chid=1
 thread 8
  ip=0xb032db45 sp=0x7f1ff70 stkbase=0x7eff000 stksize=135168
  state=RECEIVE flags=84000000 last_cpu=1 timeout=00000000
  pri=21 realpri=21 policy=OTHER
  blocked_chid=24
 thread 9
  ip=0xb032db45 sp=0x7efef70 stkbase=0x7ede000 stksize=135168
  state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
  pri=21 realpri=21 policy=OTHER
 thread 10
  ip=0xb032db45 sp=0x7eddf70 stkbase=0x7ebd000 stksize=135168
  state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
  pri=21 realpri=21 policy=OTHER
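
As a reading aid for the thread 2 line in the coreinfo output above: code=1 MAPERR refaddr=4 means the faulting access hit the unmapped address 0x4. One common way to end up at that address (not confirmed for io-net here) is reading a field at offset 4 through a NULL pointer. The sketch below only illustrates that arithmetic; struct pkt_node and fault_addr_for_length() are hypothetical names and are not taken from io-net or any QNX source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, for illustration only: the second
 * 32-bit field sits at offset 4 from the start of the struct. */
struct pkt_node {
    uint32_t flags;   /* offset 0 */
    uint32_t length;  /* offset 4 */
};

/* Compute the address that a read of p->length would touch,
 * without actually dereferencing p. */
uintptr_t fault_addr_for_length(const struct pkt_node *p)
{
    return (uintptr_t)p + offsetof(struct pkt_node, length);
}
```

With p == NULL, the computed address is 4, which matches the refaddr value reported for thread 2. This suggests looking for an unchecked pointer in whatever code runs at ip=0xb031f3d4, but it does not by itself identify the culprit.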


>pidin -p io-net mem irq
     pid tid name               prio STATE           code  data         stack
   77841   1 sbin/io-net         10o SIGWAITINFO       64K 5172K  8192(516K)*
   77841   2 sbin/io-net         10o RECEIVE           64K 5172K  4096(132K) 
   77841   3 sbin/io-net         10o RECEIVE           64K 5172K    24K(68K) 
   77841   4 sbin/io-net         10o RECEIVE           64K 5172K   4096(68K) 
   77841   5 sbin/io-net         10o RECEIVE  ...
Attachment: Text io-net.core.gz 562.82 KB
Re: Periodic io-net crash  
Unfortunately there is no useful information in the core file.


On 10-11-17 3:42 AM, "BERGMANN Yannick" <community-noreply@qnx.com> wrote:

> Periodically (once every 2-3 months), at many sites, we have problems with
> io-net under QNX 6.3.0 SP3 (x86). When this happens, the system freezes
> completely. We can find core dumps, but they seem corrupted (coreinfo gives
> results, but in gdb the stack trace is very long). We also tried to generate
> .kev files, but the system has no time to write them to disk.
Re: Periodic io-net crash  
I'm curious, did you ever solve this? We are seeing something similar. It happens too infrequently to glean much useful 
data.
Re: Periodic io-net crash  
Hi Jason,

Yes, this is still a "big" issue for us at the moment.

Can you please give more details about your problem?
- What kind of hardware do you use?
- What network driver do you use?
- What is the exact QNX version you are using (we use QNX 6.3.0 SP3)?
- Frequency of the problem?

We have had this problem for more than 2 years now and QNX support is not able to help us ... :-(

Thanks for your feedback.

Yannick
Re: Periodic io-net crash  
Hi Yannick, 

I gather from QNX that you and I are the only ones seeing this, and I imagine you have the same issue we have with not 
being able to reproduce it.   Perhaps we should join forces on this!

We are also using QNX 6.3.0 SP3, no small coincidence there I suspect.

We are on an x86 platform; I can't recall the exact processor at this second but can dig it up.

The network card is an Intel Pro 100M and we are using devn-speedo.so with that. I have another thread, 
http://community.qnx.com/sf/discussion/do/listPosts/projects.networking/discussion.drivers.topc19046 , discussing 
whether we need to (or should) apply a patch to that driver. I have asked QNX whether that version could cause what we 
are seeing and am waiting for an answer.

Frequency of the problem is still being determined: we have >12 systems with this configuration for one customer and it 
has occurred about 8-9 times over the last few months. At some sites (at least one) it has happened twice, and at some 
sites not at all (to the best of our knowledge). There is speculation that it has happened once to a different customer 
(with roughly the same configuration), but that is being looked into. So where it happens, it seems to be about once 
a month or so…

I’d be curious to learn the same information from you about your configuration.

Thanks!

Jason
Re: Periodic io-net crash  
Jason,

We first used Ethernet cards handled by the speedo driver (82801DB Ethernet Controller, vendor ID: 8086h, device ID: 
103ah).
We were convinced that the speedo driver was causing the PC to freeze, so we changed all our network cards to ones that 
use the i82544 driver (82541EI Gigabit Ethernet Controller, vendor ID: 8086h, device ID: 1078h). Nothing changed, so 
the problem is probably not caused by speedo.

In our labs, we could reproduce a freeze with a brutal 'slay -9 io-net'. Does this work for you?

The patch you mentioned (patch ID 685) adds a TX_FLUSH command to the speedo driver. 
After reading that, we recalled that we set the TCP_NODELAY flag with the setsockopt() function in some of our projects, 
and whether we use it or not seems to correlate with the crash of io-net. Do you use this flag too?
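
For anyone checking their own code for this flag, here is a minimal sketch of how TCP_NODELAY is typically set and read back through the standard sockets API. The helper names set_nodelay() and get_nodelay() are made up for this example; only setsockopt(), getsockopt(), IPPROTO_TCP and TCP_NODELAY are standard. It is not taken from our projects and says nothing about whether the flag is actually involved in the crash.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: enable (on = 1) or disable (on = 0) TCP_NODELAY
 * on a TCP socket. Returns 0 on success, -1 on error (errno is set). */
int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical helper: report the current TCP_NODELAY setting.
 * Returns 1 if enabled, 0 if disabled, -1 on error (errno is set). */
int get_nodelay(int fd)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) == -1)
        return -1;
    return on != 0;
}
```

Calling get_nodelay() on each socket an application opens is one way to confirm which builds really run with the flag enabled before comparing crash rates between them.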

QNX support told us that it is possible to use io-pkt on QNX 6.3.0 SP3 instead of io-net. It could be a solution, but 
we suspect it would be difficult to do (we don't even know where to get the binaries).

Hope this will help...

Re: Periodic io-net crash  
Hi Jason/Yannick,

We've had a similar issue for almost a year now,
although our deployments were on QNX 6.3.2.

We were advised to move to QNX 6.4.1 and then 6.5.0 to see if that eliminated the issue.
Fortunately, we have not seen the io-net issue since;
unfortunately, we are now fighting a different kind of fire, as the upgrade seems to cause different network behaviour 
where we lose network connectivity, but at least the system is still responsive.

The worst thing about this is trying to reproduce it... :(

Anyway, you are not alone...

Cheers,

Jef
Re: Periodic io-net crash  
Hi Yannick,

BERGMANN Yannick wrote:
> Hi Jason,
>
> Yes, this is still a "big" issue for us at the moment.
>
> Can you please give more details about your problem?
> - What kind of hardware do you use?
> - What network driver do you use?
> - What is the exact QNX version you are using (we use QNX 6.3.0 SP3)?
> - Frequency of the problem?
>
> We have had this problem for more than 2 years now and QNX support is not able to help us ... :-(

AFAIK from the German QSS branch office... there exists a patch that 
solves that problem.
But... you have to buy that patch!

--Armin



> Thanks for your feedback.
>
> Yannick
>
> _______________________________________________
>
> Networking Drivers
> http://community.qnx.com/sf/go/post81518
Re: Periodic io-net crash  
> 
> 
> AFAIK from the German QSS branch office... there exists a patch that 
> solves that problem.
> But... you have to buy that patch!
> 

Armin,

Can you tell me more about this patch???

Re: Periodic io-net crash  
BERGMANN Yannick wrote:
>>
>> AFAIK from the German QSS branch office... there exists a patch that
>> solves that problem.
>> But... you have to buy that patch!
>>
> Armin,
>
> Can you tell me more about this patch???

No... our customer didn't buy that patch. They switched to QNX 6.4.1...

--Armin

PS: I ran into the same problem at an industrial fair after some hours of 
demonstrations...


Re: Periodic io-net crash  
We are facing the same issue with QNX 6.3.2 and the devn-speedo driver for the Intel 82551T Ethernet Controller.
Sometimes it causes the system to freeze, and sometimes only the network becomes unresponsive while the CPU keeps 
running other processes.

Were you able to solve this?