Periodic io-net crash  
Periodically (once every 2-3 months), at many sites, we have problems with io-net under QNX 6.3.0 SP3 (x86). When this happens, the system freezes completely. We can find core dumps, but they seem corrupted (coreinfo gives results, but in gdb the stack trace is very long). We also tried to generate .kev files, but the system has no time to write them to disk.

After some research in our lab, we found that killing io-net with SIGKILL (slay -f -9 io-net) causes the system to freeze. This is most likely because SIGKILL bypasses io-net's cleanup process. If we look at the coreinfo output, we see that one thread of io-net received SIGSEGV, so now we would like to know how to prevent this fault.

Using the High Availability Manager, we developed a method for restarting io-net when it dies and, surprisingly, it prevents the system from freezing when a SIGKILL is sent to io-net. On site, however, it didn't work: we waited 3 months for the crash to happen, and when it did, the system froze.
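
For readers unfamiliar with the idea, the restart-on-death approach can be sketched as a plain supervisor loop. This is not the High Availability Manager API and not our actual setup; `supervise` and its parameters are hypothetical names, and the sketch only shows the general pattern of relaunching a command whenever it exits abnormally.

```python
import subprocess
import sys
import time

def supervise(cmd, max_restarts=3, delay=0.0):
    """Run `cmd`, relaunching it after every abnormal exit.

    Stops after a clean exit (code 0) or after `max_restarts` relaunches.
    Returns the list of exit codes seen (negative values mean the child
    was killed by a signal on POSIX systems).
    """
    exit_codes = []
    while len(exit_codes) <= max_restarts:
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            break
        time.sleep(delay)  # optional back-off before relaunching
    return exit_codes

if __name__ == "__main__":
    # A child that always fails, standing in for a crashing process.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(failing, max_restarts=2))
```

A loop like this only helps if the supervisor itself keeps running, which is exactly what does not happen when the whole system freezes.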

At this point, we really don't know what to do to avoid the system freeze. The problem is not reproducible (we have never experienced it in our lab), and we don't have much information, except the corrupted io-net core dump that is placed in /var/dumps/.

Any suggestions?

Here is the coreinfo output, and some information about io-net's configuration:

>coreinfo io-net.core
io-net.core:
 processor=X86 num_cpus=1
  cpu 1 cpu=686 name=Intel 686 F6M13S8 speed=1798
   flags=0xc0007fff FPU MMU CPUID RDTSC INVLPG WP BSWAP MMX CMOV PSE PGE MTRR SEP SIMD FXSR  cyc/sec=1800291400 tod_adj=1280846434000000000 nsec=7958021569124295 inc=99733
 boot=1280846434 epoch=1970 intr=0
 rate=838095345 scale=-15 load=119
   MACHINE="x86pc" HOSTNAME="localhost"
 pid=77841 parent=1 child=0 pgrp=77841 sid=1  flags=0x403210 umask=0 base_addr=0x8048000 init_stack=0x8047f00  ruid=0 euid=0 suid=0  rgid=0 egid=0 sgid=0  ign=0000000006800000 queue=ff00000000008000 pending=0000000000000000
 fds=6 threads=8 timers=5 chans=38
 thread 1
  ip=0xb032e985 sp=0x8047dc8 stkbase=0x7fc7000 stksize=528384
  state=SIGWAITINFO flags=80000000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
 thread 2 SIGNALLED-SIGSEGV code=1 MAPERR refaddr=4 fltno=11
  ip=0xb031f3d4 sp=0x7fc6e40 stkbase=0x7f73000 stksize=135168
  state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
 thread 3
  ip=0xb032dd29 sp=0x7fb5f00 stkbase=0x7fa5000 stksize=69632
  state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
  blocked_chid=1
 thread 4
  ip=0xb032dd29 sp=0x7fa4f00 stkbase=0x7f94000 stksize=69632
  state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
  blocked_chid=1
 thread 5
  ip=0xb032dd29 sp=0x7f72f00 stkbase=0x7f62000 stksize=69632
  state=RECEIVE flags=84020000 last_cpu=1 timeout=00000000
  pri=10 realpri=10 policy=OTHER
  blocked_chid=1
 thread 8
  ip=0xb032db45 sp=0x7f1ff70 stkbase=0x7eff000 stksize=135168
  state=RECEIVE flags=84000000 last_cpu=1 timeout=00000000
  pri=21 realpri=21 policy=OTHER
  blocked_chid=24
 thread 9
  ip=0xb032db45 sp=0x7efef70 stkbase=0x7ede000 stksize=135168
  state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
  pri=21 realpri=21 policy=OTHER
 thread 10
  ip=0xb032db45 sp=0x7eddf70 stkbase=0x7ebd000 stksize=135168
  state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
  pri=21 realpri=21 policy=OTHER
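
When comparing dumps from several sites, it may help to pull the signalled thread out of the coreinfo text automatically. The sketch below parses the thread section as it appears above; the regular expressions are assumptions based only on this one dump, and `parse_threads` and `faulting_threads` are hypothetical helper names. In this dump, thread 2 stopped on SIGSEGV with MAPERR and refaddr=4. If refaddr is the faulting address, as the name suggests, then such a low address is commonly a NULL pointer plus a small field offset, though that is an interpretation and not something the dump proves.

```python
import re

# Excerpt of the thread section from the coreinfo output above.
SAMPLE = """\
 thread 1
  ip=0xb032e985 sp=0x8047dc8 stkbase=0x7fc7000 stksize=528384
  state=SIGWAITINFO flags=80000000 last_cpu=1 timeout=00000000
 thread 2 SIGNALLED-SIGSEGV code=1 MAPERR refaddr=4 fltno=11
  ip=0xb031f3d4 sp=0x7fc6e40 stkbase=0x7f73000 stksize=135168
  state=STOPPED flags=84000000 last_cpu=1 timeout=00000000
"""

# Assumed layout: "thread N", optionally followed by "SIGNALLED-<signal>"
# and key=value details; continuation lines hold further key=value pairs.
THREAD_RE = re.compile(r"^\s*thread (\d+)(?: SIGNALLED-(\w+))?(.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")

def parse_threads(text):
    """Return one dict per thread with its id, signal (or None) and fields."""
    threads = []
    for line in text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            entry = {"tid": int(header.group(1)), "signal": header.group(2)}
            entry.update(FIELD_RE.findall(header.group(3)))
            threads.append(entry)
        elif threads:
            # key=value pairs on following lines belong to the last thread
            threads[-1].update(FIELD_RE.findall(line))
    return threads

def faulting_threads(text):
    """Return only the threads that were stopped by a signal."""
    return [t for t in parse_threads(text) if t["signal"]]

if __name__ == "__main__":
    for t in faulting_threads(SAMPLE):
        print(t["tid"], t["signal"], "refaddr=" + t.get("refaddr", "?"),
              "ip=" + t.get("ip", "?"))
```

The ip value of the faulting thread is the address to look up in gdb or in the io-net module map, since the full backtrace seems unreliable.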


>pidin -p io-net mem irq
     pid tid name               prio STATE           code  data         stack
   77841   1 sbin/io-net         10o SIGWAITINFO       64K 5172K  8192(516K)*
   77841   2 sbin/io-net         10o RECEIVE           64K 5172K  4096(132K) 
   77841   3 sbin/io-net         10o RECEIVE           64K 5172K    24K(68K) 
   77841   4 sbin/io-net         10o RECEIVE           64K 5172K   4096(68K) 
   77841   5 sbin/io-net         10o RECEIVE  ...
Attachment: Text io-net.core.gz 562.82 KB