Forum Topic - Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]: (9 Items)

Davide Ancri

04/29/2015 5:03 AM

post113752

Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

hello there

I'm running QNX 6.5.0/x86 and I frequently get a kernel crash whose dump is always similar, with:

- S/C/F=5/4/3
- instruction[f0058068] (kerext_process@376)

Since my only debug device is the screen, I have attached 3 pictures of the kernel dumps.

I just finished reading the "reading a kernel dump" doc page, from which I worked out that the S/C/F codes mean 
SIGTRAP + TRAP_CRASH + FLTBPT.

I took a quick look at an old QNX 6.4.0 kernel source checkout and found the file 
trunk/services/system/ker/kerext_process.c. Around line 376 there are a lot of consistency checks that run while a 
process is being destroyed:

....
362     if(prp->limits  &&  prp->limits->links == ~0U) crash();
363     if(prp->pid)                crash();
364     if(prp->cred)               crash();
365     if(prp->alarm)              crash();
366     if(pril_first(&prp->sig_pending))       crash();
367     if(prp->sig_table)          crash();
368     if(prp->nfds)               crash();
369     if(prp->chancons.vector)    crash();
370     if(prp->fdcons.vector)      crash();
371     if(prp->threads.vector)     crash();
372     if(prp->timers.vector)      crash();
373     if(prp->memory)             crash();
374     if(prp->join_queue)         crash();
375 //  if(prp->session)            crash();
376     if(prp->debugger)           crash();
377     if(prp->lock)               crash();
378     if(prp->num_active_threads) crash();
379     if(prp->vfork_info)         crash();
380 // FIX ME - this is not NULL now ... why?   if(prp->rsrc_list)          crash();
381     if(prp->conf_table)         crash();
....

Of course the sources may have changed a lot since 6.4.0, but I guess I'm hitting some kind of kernel consistency 
assertion.

Here follows a brief explanation of my system architecture and the actions that often cause the kernel dump.

There are several QNX 6.5.0/x86 hosts (let's say 32, but the dump also happens with only 6), all running io-pkt-v4 and 
divided symmetrically into two groups.

Each host mounts two custom interfaces; we wrote the io-pkt device driver for both:
- the "mc0" interface connects hosts lying in the same group (like 2 Ethernet segments, each private to its group of 
hosts)
- the "cl0" interface connects each host to the hosts of the remote group: broadcast packets produced by one host are 
not forwarded to other hosts in the same group, so qnet discovers only "remote" hosts via the cl0 interface

qnet is bound to both mc0 and cl0 interfaces, on every host.

The kernel dump happens when a script on the first host of the first group (acting as the "master" host) spawns in the 
background ("on -f <host> <script> &") a ksh script on each host in the system. That script collects a lot of 
information about the host itself: many "pidin" commands with almost all available options, many io-pkt query 
utilities (ifconfig, netstat, nicinfo, etc.), and some custom utilities for general system monitoring.
The controlling script then blocks in the "wait" command until all the spawned scripts terminate.

The kernel dump happens on random hosts (hosts from both groups).
We have never seen this kernel dump when the system has only one host group (no cl0 interface, qnet bound to mc0 
only).

Since both drivers are custom software, the root cause may of course be in our own code: can anyone give me a hint 
about which kind of driver error can lead to a kernel dump like this?

One last piece of information: it seems that including the "pidin rc" command in the script executed in parallel on 
each host dramatically increases the chance of getting a kernel dump.
I'm running a long-term test without "pidin rc" to confirm this.

Attachment:

IMG398.jpg 387.67 KB

photo_node1_blocked.JPG 817.37 KB

IMG397.jpg 387.99 KB

Davide Ancri

04/29/2015 9:58 AM

post113754

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

I confirm that after removing the "pidin rc" command from the parallel-spawned script I can no longer reproduce the 
kernel dump.

Not in approx. 1000 iterations of the parallel scripts, at least ;)

Anyway, any information about the problem I hit would be much appreciated.

Davide

Davide Ancri

05/07/2015 11:48 AM

post113786

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

Any idea what could cause a kernel dump like this?

thanks
Davide

Abilash J Ram(deleted)

06/05/2018 12:48 PM

post118863

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

I am stuck with a similar issue on QNX7 on a Beagleboard-x15:

Shutdown[0,0] S/C/F=5/4/3 C/D=fe0384c0/fe0d201c state(c01)= now lock 1
instruction[fe08d494] (entry.S@452):

I have an application that uses a mailbox to communicate with the coprocessors. The kernel panics once the application 
starts receiving mailbox interrupts.

Abilash

Elad Lahav

06/05/2018 1:08 PM

post118864

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

Are you making any system calls from an ISR?

Abilash J Ram(deleted)

06/05/2018 1:12 PM

post118865

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

The following steps are in my ISR:

- clear the IRQMapped in the cross bar
- place a new address in the mailbox
- clear the mailbox interrupt
- return a sigevent

Abilash J Ram(deleted)

06/05/2018 1:14 PM

post118866

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

No, not doing any system calls from ISR

Elad Lahav

06/05/2018 1:21 PM

post118867

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

The crash you see is due to reentering SVC mode from SVC mode. Can you
post your ISR code?

Also, the subject of this topic is really misleading.

--Elad

Abilash J Ram(deleted)

06/12/2018 5:15 PM

post118872

Re: Kernel dump: S/C/F=5/4/3 (kerext_process@376) [QNX 6.5.0/x86]

Hey,

Thanks for the input. The issue was that I was not clearing the interrupt as expected in the ISR. I am reading the ARM 
documentation (AM5728) to update the ISR so that it clears the interrupts properly.

Abilash
