Robert D'Attilio
04/30/2009 4:16 PM
post28478
Looking for docs on decoding kernel panic dump
Hi:
I have a panic dump from procnto and would like to decode some of the information. Is there a howto or doc that
describes some of the fields?
Also are the memory addresses being referenced physical or virtual memory addresses?
Thanks
robert
BTW here is the dump
Shutdown[0,0] S/C/F=11/1/11 C/D=0004b954/000a474c state(d0)= now lock exit
QNX Version 6.3.2 Release 2006/03/16-14:18:11EST
[0]PID-TID=1-4? P/T FL=00019001/04020000 "proc/boot/procnto-400"
[0]ASPACE PID=278562 PF=00000000 "proc/boot/ps"
ppcbe context[03ff5d3c]:
0000: 00000040 03fbdcd0 000a9ba8 03fe57dc 7c005a14 03fe57dc 00000000 00000002
0020: 00000000 00000040 0006d000 00000004 24000000 000ab610 00000000 00000000
0040: 00000000 00000000 00000000 00000000 00000000 00000000 03fffc60 00000000
0060: 03fffa98 00000000 00000000 fe36d000 03cc7bd8 03fe57d4 03fe57dc 03e9b760
0080: 00000000 00066728 00029030 0009840c 44000000 00000000 00000000 00000000
00a0: 00000000
instruction[0009840c]:
89 24 00 00 38 84 00 01 7d 20 07 74 2c 00 00 00 99 23 00 00 39 63 00 01 4d 82
stack[03fbdcd0]:
0000: 03fbdda0 00066700 00000000 fe36d000 00000000 00008000 01080732 00000000
0020: 00000000 00000000 00000000 00000000 00000000 00000000 03fffc60 03e9b760
0040: 03e9b760 03cff090 00000002 03cc7bd8 03fbdd40 0005eeac 00000000 fe456000
0060: 039c8000 00700000 00004000 03fe5ce4 03fbdd50 0006698c 03fffc60 03e9b260

Ryan Mansfield
04/30/2009 4:17 PM
post28479
Re: Looking for docs on decoding kernel panic dump
Robert D'Attilio wrote:
> Hi:
>
> I have a panic dump from procnto and would like to decode some of the information. Is there a howto or doc that describes some of the fields?
http://www.qnx.com/developers/docs/6.4.0/neutrino/technotes/proc_dump.html
Regards,
Ryan Mansfield

Joel Pilon
04/30/2009 4:19 PM
post28480
Re: Looking for docs on decoding kernel panic dump
This is a good place to start:
http://www.qnx.com/developers/docs/6.4.0/neutrino/technotes/proc_dump.html

Robert D'Attilio
04/30/2009 4:26 PM
post28481
RE: Looking for docs on decoding kernel panic dump
Thanks guys.
Any idea if the memory addresses being referred to are physical
addresses or virtual?
-----Original Message-----
From: Joel Pilon [mailto:community-noreply@qnx.com]
Sent: Thursday, April 30, 2009 4:19 PM
To: ostech-core_os
Subject: Re: Looking for docs on decoding kernel panic dump
This is a good place to start:
http://www.qnx.com/developers/docs/6.4.0/neutrino/technotes/proc_dump.html
_______________________________________________
OSTech
http://community.qnx.com/sf/go/post28480

Colin Burgess
04/30/2009 4:35 PM
post28482
Re: Looking for docs on decoding kernel panic dump
The addresses are virtual, but the ppc kernel has 1-1 mappings set up,
so they mirror the physical.
When you get a dump, add a [+keeplinked] attribute to your procnto-400. mkifs will then
leave a procnto-400.sym binary, which you can then load into ntoppc-gdb to examine the memory
addresses.
Colin
--
cburgess@qnx.com

Douglas Bailey
04/30/2009 4:35 PM
post28483
RE: Looking for docs on decoding kernel panic dump
They are virtual addresses.
Addresses in the kernel space will match symbols in your procnto.sym
file (generated if you [+keeplinked] in your build file) so you can look
around with gdb. Looks like there's lots of those in the information
you've got -- on PPC, kernel addresses are in the first 256M.
Addresses outside the kernel space are more difficult -- you need to
know what address space they belong to and use gdb on that program.
Probably not an issue here.
Addresses in kernel interrupt code are most difficult, as you don't have
any symbolic information to figure them out...
In this case, it looks like a procnto thread was running, servicing a
request made by proc/boot/ps (from the PID and ASPACE PID lines). The
S/C/F codes indicate a segfault. The state (d0 = now lock exit) is an
artifact of a fault in a procnto thread.
Aside from that I can't contribute much without digging into the .sym
file.
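For readers following along, here is a minimal Python sketch of pulling the fields out of the first line of the dump. It assumes the S/C/F numbers are printed in decimal and the C/D and state values in hex, which fits this dump but is not confirmed anywhere in the thread. The function name `parse_shutdown` and the dictionary keys are made up for illustration, and the signal names cover only the two signals mentioned in this thread.

```python
import re

# Signal names limited to the two mentioned in this thread; any other value
# is reported as "unknown" rather than guessed.
SIGNAL_NAMES = {4: "illegal instruction", 11: "segmentation fault"}

# Assumption: S/C/F values are decimal, C/D and state values are hex.
SHUTDOWN_RE = re.compile(
    r"Shutdown\[\d+,\d+\]\s+"
    r"S/C/F=(?P<sig>\d+)/(?P<code>\d+)/(?P<fault>\d+)\s+"
    r"C/D=(?P<c>[0-9a-fA-F]+)/(?P<d>[0-9a-fA-F]+)\s+"
    r"state\((?P<state>[0-9a-fA-F]+)\)=\s*(?P<flags>.*)"
)


def parse_shutdown(line):
    """Split the 'Shutdown[...]' line of a procnto dump into named fields."""
    m = SHUTDOWN_RE.search(line)
    if m is None:
        raise ValueError("not a Shutdown line: %r" % line)
    sig = int(m.group("sig"))
    return {
        "signal": sig,
        "signal_name": SIGNAL_NAMES.get(sig, "unknown"),
        "code": int(m.group("code")),
        "fault": int(m.group("fault")),
        "c": int(m.group("c"), 16),   # first C/D value
        "d": int(m.group("d"), 16),   # second C/D value
        "state": int(m.group("state"), 16),
        "state_flags": m.group("flags").split(),
    }


if __name__ == "__main__":
    line = ("Shutdown[0,0] S/C/F=11/1/11 C/D=0004b954/000a474c "
            "state(d0)= now lock exit")
    for key, value in parse_shutdown(line).items():
        print(key, value)
```

The numeric code and fault values are left undecoded on purpose; the technote linked earlier in the thread is the place to look them up.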

Colin Burgess
04/30/2009 5:02 PM
post28486
Re: Looking for docs on decoding kernel panic dump
BTW - if you have a procnto-400, but not procnto-400.sym, all is not lost...
The crash dump tells you the relocated address of main
Shutdown[0,0] S/C/F=11/1/11 C/D=0004b954/000a474c state(d0)= now lock exit
                                ^        ^
                                &main    &actives
and the original binary gives the unrelocated address...
(632) cburgess@titirangi100:~$ ntoppc-nm $QNX_TARGET/ppcbe/boot/sys/procnto-400 | grep main
00038b24 T _main
00045afc T kernel_main
00005954 T main
00001064 G main_attrp
00000aa8 g mapped_remaining.88
000373d4 T timer_remaining
so...
0x4b954 - 0x5954 = 0x46000
and finally
ntoppc-ld -T $QNX_TARGET/ppcbe/lib/nto.link -Ttext 0x46000 -o procnto-400.sym $QNX_TARGET/ppcbe/boot/sys/procnto-400
ntoppc-gdb procnto-400.sym
GNU gdb 6.7 qnx-nto update 6 (rev. 109)
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=powerpc-unknown-nto-qnx6.3.2"...
(gdb) x/i 0x9840c
0x9840c <strcpy>: lbz r9,0(r4)
(gdb) x/w 0x9840c
0x9840c <strcpy>: 0x89240000
(gdb)
and we can confirm the instruction stream is the same..
instruction[0009840c]:
89 24 00 00 38 84 00 01 7d 20 07 74 2c 00 00 00 99 23 00 00 39 63 00 01 4d 82
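The offset arithmetic above can also be scripted when there are several dumps to go through. This is only a sketch: the helper names `find_symbol` and `load_offset` are invented for illustration, and the nm text is copied from the session above.

```python
# nm output as shown earlier in this post (address, type, symbol name).
NM_OUTPUT = """\
00038b24 T _main
00045afc T kernel_main
00005954 T main
00001064 G main_attrp
00000aa8 g mapped_remaining.88
000373d4 T timer_remaining
"""


def find_symbol(nm_output, name):
    """Return the address of the symbol whose name matches exactly."""
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == name:
            return int(parts[0], 16)
    raise KeyError(name)


def load_offset(runtime_addr, link_addr):
    """Difference between where a symbol ended up and where nm reports it."""
    return runtime_addr - link_addr


if __name__ == "__main__":
    runtime_main = 0x4B954                       # first C/D value in the dump
    link_main = find_symbol(NM_OUTPUT, "main")   # 0x5954 in the shipped binary
    offset = load_offset(runtime_main, link_main)
    print(hex(offset))
    # The faulting address from the dump, mapped back into the shipped binary.
    print(hex(0x9840C - offset))
```

Matching on the exact symbol name matters here, since `grep main` also returns `_main`, `kernel_main` and others.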
Cheers!
Colin

Colin Burgess
04/30/2009 4:55 PM
post28485
Re: Looking for docs on decoding kernel panic dump
It appears to be crashing trying to strcpy a process' debug_name variable.
If you look at the instruction at 0x9840c, it's in strcpy, and it's doing
lbz r9,0(r4)
Now, r4 is 0x7c005a14, which is just plain wrong...
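If gdb is not at hand, the first word of the instruction[] line can be decoded by hand. The sketch below handles only the PowerPC D-form layout (6-bit opcode, two 5-bit register fields, signed 16-bit displacement) and knows just three opcodes, so it is nowhere near a full disassembler. The function names are made up for illustration.

```python
# Primary opcodes for three D-form instructions that appear in the dump.
LOAD_STORE = {34: "lbz", 38: "stb"}
ADDI = 14


def decode_dform(word):
    """Split a 32-bit big-endian PowerPC D-form word into its fields."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    d = word & 0xFFFF
    if d & 0x8000:          # sign-extend the 16-bit displacement
        d -= 0x10000
    return opcode, rt, ra, d


def format_dform(word):
    """Render the few opcodes this sketch knows; others come back as hex."""
    opcode, rt, ra, d = decode_dform(word)
    if opcode in LOAD_STORE:
        return "%s r%d,%d(r%d)" % (LOAD_STORE[opcode], rt, d, ra)
    if opcode == ADDI:
        return "addi r%d,r%d,%d" % (rt, ra, d)
    return "0x%08x" % word


if __name__ == "__main__":
    raw = bytes.fromhex("89 24 00 00 38 84 00 01 7d 20 07 74 2c 00 00 00")
    for i in range(0, len(raw), 4):
        print(format_dform(int.from_bytes(raw[i:i + 4], "big")))
```

The first word comes out as a byte load through r4, which is why the value of r4 in the register context is the thing to look at.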
Now the 64 million dollar question: can you reproduce this? :-)

Robert D'Attilio
05/01/2009 9:47 AM
post28505
RE: Looking for docs on decoding kernel panic dump
Colin:
Many thanks for your analysis!
Some background: I am investigating what I think is a hardware problem
(bad RAM?) on a new product in development. This particular card will
periodically have different processes core with either a signal 4,
illegal instruction error or a signal 11, segmentation fault along with
a corrupted stack. I've also been able to capture some of these kernel
panics now that I've re-enabled the serial port and disabled the
watchdog. The kernel panics are not always in the same place either. So
far the problem is isolated to this one card out of about 12 that have
been undergoing extensive verification testing for a couple of months.
I am hoping to use the kernel dumps to point to a region of RAM that
should be probed with some sort of RAM test. So far, the basic data bus
walking ones/zeros test and address lines tests haven't turned up
anything.
A couple of questions though about your analysis as working at this
level (and with QNX) is still pretty new for me and interesting:
1) How do you know that 0x7c005a14 is an invalid address?
2) The stack dump, is that the kernel's stack or the application's?
3) In a subsequent posting, you did the following:
ntoppc-ld -T $QNX_TARGET/ppcbe/lib/nto.link -Ttext 0x46000 -o
procnto-400.sym $QNX_TARGET/ppcbe/boot/sys/procnto-400
What exactly were you trying to do here? Was this how YOU were able to
determine that the failure was in strcpy() - the above relinks the
kernel to my relocation address and then you used the .sym to reverse
engineer the instruction address?
Sorry for all the questions...but I thought it was pretty cool that you
could pull out that kind of info from what looks like nonsense :)
robert

Colin Burgess
05/01/2009 10:06 AM
post28510
Re: Looking for docs on decoding kernel panic dump
Robert D'Attilio wrote:
> Colin:
>
> Many thanks for your analysis!
>
> Some background: I am investigating what I think is a hardware problem
> (bad RAM?) on a new product in development. This particular card will
> periodically have different processes core with either a signal 4,
> illegal instruction error or a signal 11, segmentation fault along with
> a corrupted stack. I've also been able to capture some of these kernel
> panics now that I've re-enabled the serial port and disabled the
> watchdog. The kernel panics are not always in the same place either. So
> far the problem is isolated to this one card out of about 12 that have
> been undergoing extensive verification testing for a couple of months.
>
> I am hoping to use the kernel dumps to point to a region of RAM that
> should be probed with some sort of RAM test. So far, the basic data bus
> walking ones/zeros test and address lines tests haven't turned up
> anything.
Well, the first thing I would check is the power. I've seen plenty of boards
where the power was just barely within the spec'd margins, and fluctuations in the
supply could lead to memory corruption. We spent a long time chasing one particular
corruption only to find out that it was as simple as that. And yes, because
it was marginal, it varied board by board. It often would only fail under heavy load
too (with peripherals sucking the power, I guess).
> A couple of questions though about your analysis as working at this
> level (and with QNX) is still pretty new for me and interesting:
>
> 1) How do you know that 0x7c005a14 is an invalid address?
The virtual address space is divided into kernel and user spaces. On PPC the kernel lives
from 0 to 1G, and usermode addresses are higher. Hence a usermode address for a procnto variable
is not normal.
Note that it is normal for the kernel to access usermode vaddrs, but they must be verified
first, and in this case it was a kernel-allocated variable.
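As a rough illustration of that range check, the Python sketch below classifies an address against the 0 to 1G kernel range described above. The limit is a parameter because an earlier reply in the thread gives the first 256M as the kernel range; 0x7c005a14 falls outside both. The names `KERNEL_LIMIT` and `is_kernel_vaddr` are invented for this example.

```python
# Upper bound of the PPC kernel range as described in this reply (1 GB).
KERNEL_LIMIT = 0x40000000


def is_kernel_vaddr(addr, limit=KERNEL_LIMIT):
    """True if addr falls in the kernel part of the virtual address space."""
    return 0 <= addr < limit


if __name__ == "__main__":
    # r3 and r4 from the register context in the original post.
    for name, value in (("r3", 0x03FE57DC), ("r4", 0x7C005A14)):
        where = "kernel" if is_kernel_vaddr(value) else "outside kernel range"
        print("%s = 0x%08x: %s" % (name, value, where))
```

An address outside the kernel range is not automatically wrong, as noted above; it is only suspicious when the variable in question was allocated by the kernel.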
> 2) The stack dump, is that the kernel's stack or the application's?
It's always the kernel stack.
> 3) In a subsequent posting, you did the following:
>
> ntoppc-ld -T $QNX_TARGET/ppcbe/lib/nto.link -Ttext 0x46000 -o
> procnto-400.sym $QNX_TARGET/ppcbe/boot/sys/procnto-400
>
> What exactly were you trying to do here? Was this how YOU were able to
> determine that the failure was in strcpy() - the above relinks the
> kernel to my relocation address and then you used the .sym to reverse
> engineer the instruction address?
The procnto-400 binary we ship is actually an object file. When you run mkifs (try it with -vv) it
actually runs a linker to relocate the startup and the procnto to the appropriate addresses for your
image.
The [+keeplinked] attribute tells mkifs to leave the relocated symbol file around. It's very useful for
this sort of debugging.
Once I had the relocated symbols I could simply load the .sym file into gdb and check the instruction where
the kernel faulted.
The register context (described in usr/include/ppc/context.h) gave me the registers.
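A small sketch of reading the registers out of the context[] block. It assumes the first 32 words are r0 to r31 and that the word at offset 0x8c is the faulting instruction address. The dump itself backs this up (word 1 equals the stack[] address and the word at 0x8c equals the instruction[] address), but ppc/context.h is the authoritative layout and the remaining fields are not interpreted here. `parse_words` is a hypothetical helper name.

```python
# The ppcbe context[] block from the original post.
CONTEXT_DUMP = """\
0000: 00000040 03fbdcd0 000a9ba8 03fe57dc 7c005a14 03fe57dc 00000000 00000002
0020: 00000000 00000040 0006d000 00000004 24000000 000ab610 00000000 00000000
0040: 00000000 00000000 00000000 00000000 00000000 00000000 03fffc60 00000000
0060: 03fffa98 00000000 00000000 fe36d000 03cc7bd8 03fe57d4 03fe57dc 03e9b760
0080: 00000000 00066728 00029030 0009840c 44000000 00000000 00000000 00000000
00a0: 00000000
"""


def parse_words(dump):
    """Flatten 'offset: word word ...' lines into a list of 32-bit values."""
    words = []
    for line in dump.splitlines():
        _, _, rest = line.partition(":")
        words.extend(int(w, 16) for w in rest.split())
    return words


if __name__ == "__main__":
    words = parse_words(CONTEXT_DUMP)
    gpr = words[:32]                 # assumed: r0..r31 come first
    fault_pc = words[0x8C // 4]      # assumed: instruction address at 0x8c
    print("r1 (stack pointer) = 0x%08x" % gpr[1])
    print("r3 = 0x%08x" % gpr[3])
    print("r4 = 0x%08x" % gpr[4])
    print("faulting instruction = 0x%08x" % fault_pc)
```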
> Sorry for all the questions...but I thought it was pretty cool that you
> could pull out that kind of info from what looks like nonsense :)
It does appear to be a bit of black magic from the outside, doesn't it? And I assure you, it IS. Now, I've
got to go drain the blood of a mouse.. :-)
Colin