Forum Topic - Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]: (9 Items)
   
Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
hello there

I'm running QNX 6.5.0/x86 and I often get a kernel crash whose dump always looks similar, with:

- S/C/F=5/4/3
- instruction[f0058068] (kerext_process@376)

Since my only debug device is the screen, I have attached 3 pictures of the kernel dumps.

I just finished reading the "reading a kernel dump" doc page, and found that the S/C/F codes mean: SIGTRAP + TRAP_CRASH + FLTBPT
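
To make the decoding concrete, here is a minimal sketch that pulls the S/C/F triple out of a dump line and checks it against the three values named above. The numeric values are taken from this post; the `DEMO_` constants and the `scf_` helpers are hypothetical names defined locally so the snippet builds without QNX headers.

```c
#include <stdio.h>
#include <string.h>

/* Values as decoded in the post: S=5 (SIGTRAP), C=4 (TRAP_CRASH), F=3 (FLTBPT).
 * Defined locally with a DEMO_ prefix; these are not the QNX header definitions. */
enum { DEMO_SIGTRAP = 5, DEMO_TRAP_CRASH = 4, DEMO_FLTBPT = 3 };

struct scf {
    int sig;    /* S: signal number */
    int code;   /* C: signal code   */
    int fault;  /* F: fault number  */
};

/* Find "S/C/F=a/b/c" anywhere in a dump line. Returns 1 on success, 0 otherwise. */
static int scf_parse(const char *line, struct scf *out)
{
    const char *p = strstr(line, "S/C/F=");
    if (p == NULL)
        return 0;
    return sscanf(p, "S/C/F=%d/%d/%d", &out->sig, &out->code, &out->fault) == 3;
}

/* True when the triple matches SIGTRAP + TRAP_CRASH + FLTBPT. */
static int scf_is_crash_trap(const struct scf *v)
{
    return v->sig == DEMO_SIGTRAP &&
           v->code == DEMO_TRAP_CRASH &&
           v->fault == DEMO_FLTBPT;
}
```

Feeding it the header line typed in from a screen photo is enough to confirm that two dumps report the same triple before comparing instruction addresses.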

I took a quick look at an old QNX 6.4.0 kernel source checkout and found the file trunk/services/system/ker/kerext_process.c. Around line 376, I see a lot of consistency checks performed while destroying a process:

....
362     if(prp->limits  &&  prp->limits->links == ~0U) crash();
363     if(prp->pid)                crash();
364     if(prp->cred)               crash();
365     if(prp->alarm)              crash();
366     if(pril_first(&prp->sig_pending))       crash();
367     if(prp->sig_table)          crash();
368     if(prp->nfds)               crash();
369     if(prp->chancons.vector)    crash();
370     if(prp->fdcons.vector)      crash();
371     if(prp->threads.vector)     crash();
372     if(prp->timers.vector)      crash();
373     if(prp->memory)             crash();
374     if(prp->join_queue)         crash();
375 //  if(prp->session)            crash();
376     if(prp->debugger)           crash();
377     if(prp->lock)               crash();
378     if(prp->num_active_threads) crash();
379     if(prp->vfork_info)         crash();
380 // FIX ME - this is not NULL now ... why?   if(prp->rsrc_list)          crash();
381     if(prp->conf_table)         crash();
....

Of course the sources may have changed a lot since 6.4.0, but I guess I'm hitting some kind of kernel consistency assertion.
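
As an illustration of what such a check does, here is a minimal sketch that reports which field is still set instead of crashing. The struct is a hypothetical, heavily reduced stand-in for the kernel's process entry; only the field names are borrowed from the 6.4.0 listing above. Whether line 376 in 6.5.0 still corresponds to the `prp->debugger` check is an assumption that has not been verified.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's process entry.
 * Only a few of the fields from the 6.4.0 listing are modelled. */
struct demo_process {
    int   pid;
    void *cred;
    void *debugger;
    void *lock;
    int   num_active_threads;
};

/* Returns the name of the first field that was not torn down before the
 * process entry is destroyed, or NULL when the entry is clean.
 * The kernel code calls crash() at this point; the sketch only reports. */
static const char *demo_first_leftover(const struct demo_process *prp)
{
    if (prp->pid)                return "pid";
    if (prp->cred)               return "cred";
    if (prp->debugger)           return "debugger";
    if (prp->lock)               return "lock";
    if (prp->num_active_threads) return "num_active_threads";
    return NULL;
}
```

Read this way, each `crash()` line in the listing asserts that one particular resource was released before the process entry is freed, so the line number in the dump identifies which resource was still attached.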

Here follows a brief explanation of my system architecture and the actions that often cause the kernel dump.

There are several QNX 6.5.0/x86 hosts (let's say 32, but the dump happens even with only 6), all running io-pkt-v4, divided symmetrically into two groups.

Each host has two custom interfaces; we wrote the io-pkt device driver for both:
- the "mc0" interface connects hosts lying in the same group (like 2 Ethernet segments, each private to its group of hosts)
- the "cl0" interface connects each host to the hosts of the remote group: broadcast packets produced by one host are not forwarded to other hosts in the same group, so qnet discovers only "remote" hosts via the cl0 interface

qnet is bound to both mc0 and cl0 interfaces, on every host.

The kernel dump happens when a script on the first host of the first group (acting as "master" host) spawns in the background ("on -f <host> <script> &") a ksh script on each host in the system. That script collects a lot of information about the host itself: many "pidin" commands with almost all available options, many io-pkt query utilities (ifconfig, netstat, nicinfo, etc.), and some custom utilities for general system monitoring.
The controlling script then blocks in the "wait" command until all the spawned scripts terminate.

The kernel dump happens randomly on the hosts (hosts from both groups).
We never experienced this kernel dump when the system had only one host group (no cl0 interface, qnet bound to mc0 only).

Since both drivers are custom, of course the root cause may lie in our own code: can anyone give me a hint about what kind of driver error can lead to such a kernel dump?

One last piece of information: it seems that the presence of the "pidin rc" command in the script executed in parallel on each host dramatically increases the chance of getting a kernel dump.
I'm running a long-term test without "pidin rc" to confirm this.
Attachment: Image IMG398.jpg 387.67 KB Image photo_node1_blocked.JPG 817.37 KB Image IMG397.jpg 387.99 KB
Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
I confirm that after removing the "pidin rc" command from the parallel-spawned script I'm no longer able to reproduce the kernel dump.

Not in approx 1000 iterations of the parallel scripts, at least ;)

Anyway, any information about the problem I hit would be much appreciated.

Davide
Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
Any idea what causes such a kernel dump?

thanks
Davide
Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
I am stuck with a similar issue on QNX7 on Beagleboard-x15

Shutdown[0,0] S/C/F=5/4/3 C/D=fe0384c0/fe0d201c state(c01)= now lock 1
instruction[fe08d494] (entry.S@452):

I have an application that uses a mailbox to communicate with the coprocessors. The kernel panics once the application starts receiving mailbox interrupts.

Abilash 
Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
Are you making any system calls from an ISR?

On Tue, 2018-06-05 at 12:48 -0400, Abilash J Ram wrote:
> I am stuck with a similar issue on QNX7 on Beagleboard-x15
> 
> Shutdown[0,0] S/C/F=5/4/3 C/D=fe0384c0/fe0d201c state(c01)= now lock
> 1
> instruction[fe08d494] (entry.S@452):
> 
> I have an application that uses a mailbox to communicate with the
> coprocessors. The kernel panics once the application starts
> receiving mailbox interrupts.
> 
> Abilash 
Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
The following steps are in my ISR:

- clear the IRQMapped in the crossbar
- place a new address in the mailbox
- clear the mailbox interrupt
- return a sigevent
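
The four steps above can be sketched roughly as follows. This is not the actual ISR: the register block, the field names and the `demo_` types are hypothetical stand-ins that do not match the real AM5728 mailbox or crossbar layout, and "clear" is modelled as a plain write of zero, whereas real hardware may need a different clearing sequence. The handler shape (an `area` pointer and an interrupt `id` in, an event pointer out) mirrors the one used with `InterruptAttach()`.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register block; the real mailbox/crossbar registers differ. */
struct demo_mbox_regs {
    volatile uint32_t xbar_irq_mapped;  /* crossbar mapping (step 1)   */
    volatile uint32_t message;          /* mailbox message slot (step 2) */
    volatile uint32_t irq_status;       /* pending mailbox IRQ (step 3) */
};

/* Stand-in for a struct sigevent so the sketch builds without QNX headers. */
struct demo_event {
    int code;
};

/* Context handed to the handler through its 'area' argument. */
struct demo_isr_ctx {
    struct demo_mbox_regs *regs;
    uint32_t               next_addr;  /* address to place in the mailbox */
    struct demo_event      event;      /* event returned to wake a thread */
};

/* No kernel calls in here: only register accesses and the event return. */
static const struct demo_event *demo_mbox_isr(void *area, int id)
{
    struct demo_isr_ctx *ctx = area;
    (void)id;

    ctx->regs->xbar_irq_mapped = 0;       /* 1. clear the crossbar mapping   */
    ctx->regs->message = ctx->next_addr;  /* 2. place the new address        */
    ctx->regs->irq_status = 0;            /* 3. clear the mailbox interrupt  */
    return &ctx->event;                   /* 4. return the event             */
}
```

If step 3 does not actually deassert the interrupt at its source, the handler is presumably re-entered as soon as it returns, which is why the exact clearing sequence for the hardware matters.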

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
No, I'm not making any system calls from the ISR.
Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
The crash you see is due to reentering SVC mode from SVC mode. Can you
post your ISR code?

Also, the subject of this topic is really misleading.

--Elad

On Tue, 2018-06-05 at 13:14 -0400, Abilash J Ram wrote:
> No, not doing any system calls from ISR
Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]  
Hey,

Thanks for the input. The issue was that I was not clearing the interrupt as expected in the ISR. I am reading the ARM documentation (AM5728) to update the ISR so that it clears the interrupts properly.

Abilash