Forum Topic - locating the point of a crash: (14 Items)
   
locating the point of a crash  
I have a process that crashes and leaves a dump file. When I debug the dump file, the location of the crash is shown as <Symbol is not available>.

Using the information from the log file, the crash occurred at 0x080f3252 (if I'm understanding correctly what that address is).

My link map file tells me that this address is beyond the end of the process, so I must be misunderstanding something.

How can I locate the point of this crash? What am I misunderstanding?

I am Windows-hosted, running 6.3.2 on an x86 platform.

Thanx
Tim
Re: locating the point of a crash  
Chances are that the crash is in a shared library that your application
loads. Since library loading is fairly deterministic, you can just run
 % pidin mem
once your process starts (assuming it runs long enough for you to do
that) and then look at the addresses and sizes of the shared libraries.
With any luck, you aren't dynamically loading a DLL that is the source
of the crash.
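Checking the faulting address against what `pidin mem` shows boils down to a range check: an address belongs to a mapped object if it falls within [base, base + size). Below is a minimal Python sketch, assuming you have copied base addresses and sizes by hand. The region names and sizes are hypothetical placeholders; only the 0x08048000 program base is taken from the coreinfo output posted later in this thread.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    """One mapped object: a name, its load address, and its size in bytes."""
    name: str
    base: int
    size: int


def find_region(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the region whose [base, base + size) range contains addr, or None."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            return region
    return None


# Hypothetical layout: only the 0x08048000 program base comes from coreinfo's
# base_addr field. Names and sizes are placeholders for the values you would
# read off `pidin mem` or the map file on your own target.
regions = [
    Region("program image (hypothetical size)", 0x08048000, 0x000A0000),
    Region("libc.so (hypothetical)", 0xB0300000, 0x00080000),
]

for addr in (0x080F3252, 0xB0331191):  # faulting ip, and thread 1's ip
    hit = find_region(regions, addr)
    if hit is None:
        print(f"{addr:#010x}: outside every listed region")
    else:
        print(f"{addr:#010x}: {hit.name} + {addr - hit.base:#x}")
```

If the address falls outside every listed object, as the map-file lookup in the original question suggested, the region list alone won't name a culprit, and you need the register and stack evidence discussed further down.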

If I had to bet, I'd bet that the crash is in libc and your application 
is passing bad data to one of the library functions and
it is dying there.

Also, you might want to run the process in the debugger if that works in 
your environment ... faster than doing all this
by hand =;-)

Hope this helps,
 Thomas
Re: locating the point of a crash  
> -----Original Message-----
> From: Thomas Fletcher [mailto:community-noreply@qnx.com]
> Sent: 18 October 2008 02:21
> To: ostech-core_os
> Subject: Re: locating the point of a crash
> 
> 
> Chances are that the crash is in a shared library that your 
> application loads.

Are you sure? In all the x86 apps I've seen, a 0x08... address always
refers to 'static' program text or data, never to a shared lib
or DLL; those get virtual addresses in the 0xb... range and above.

What kind of crash is this, anyway? It sounds a bit like a stack
frame overwrite, so you end up with a bad return address...

Try loading the core file in the debugger and see if you can
get a call stack backtrace; sometimes it's easier to start out
one level up...

> Since this is a pretty deterministic process, you can just run
>  % pidin mem
> once your process starts (assuming it runs long enough for you to do 
> that) and then look at the addresses of the shared libraries
> and their sizes.  With any luck you aren't loading a DLL dynamically 
> that is the source of the crash.
> 
> If I had to bet, I'd bet that the crash is in libc and your 
> application is passing bad data to one of the library functions 
> and it is dying there.
> 
> Also, you might want to run the process in the debugger if 
> that works in your environment ... faster than doing all this
> by hand =;-)

And even better to do so with debug info included 8-).
...then again, that might make the crash go away.

If you don't find anything, post the output of 
   coreinfo <your-app>.core
and possibly attach the map file.

Thanks,
- Thomas

> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post15188
> 
> 
Re: Re: locating the point of a crash  
The libraries do appear to be in 0xb... and my app code, etc. in the range 0x0804...

There is no stack trace in the core dump for the crash. There is the thread's starting point and then the indicator
<Symbol is not available>. That is why I was trying to locate the crash address.

Running under the debugger prevents the crash so that doesn't help me track it down.

Here is the output of coreinfo and attached is the map file.

cadred.core:
 processor=X86 num_cpus=2
  cpu 1 cpu=686 name=Intel 686 F6M15S6 speed=1506
   flags=0xc0007fff FPU MMU CPUID RDTSC INVLPG WP BSWAP MMX CMOV PSE PGE MTRR SEP SIMD FXSR
  cpu 2 cpu=686 name=Intel 686 F6M15S6 speed=1506
   flags=0xc0007fff FPU MMU CPUID RDTSC INVLPG WP BSWAP MMX CMOV PSE PGE MTRR SEP SIMD FXSR
 cyc/sec=1500348600 tod_adj=1222764829000000000 nsec=1491952564327196 inc=999847
 boot=1222764829 epoch=1970 intr=0
 rate=838095345 scale=-15 load=1193
   MACHINE="x86pc" HOSTNAME="localhost"
 pid=12967963 parent=176149 child=0 pgrp=12967963 sid=1
 flags=0x402210 umask=0 base_addr=0x8048000 init_stack=0x8047df4
 ruid=0 euid=0 suid=0  rgid=0 egid=0 sgid=0
 ign=0000000000000001 queue=ff00000000000000 pending=0000000000000000
 fds=15 threads=9 timers=5 chans=18
 thread 1
  ip=0xb0331191 sp=0x804773c stkbase=0x7fc7000 stksize=528384
  state=NANOSLEEP flags=0 last_cpu=1 timeout=0x1001000
  pri=10 realpri=10 policy=OTHER
 thread 2
  ip=0xb032f931 sp=0x7fc6e30 stkbase=0x7fa6000 stksize=135168
  state=RECEIVE flags=4000000 last_cpu=2 timeout=00000000
  pri=12 realpri=12 policy=FIFO
  blocked_chid=1
 thread 3
  ip=0xb032fae1 sp=0x7fa5e20 stkbase=0x7f85000 stksize=135168
  state=RECEIVE flags=4000000 last_cpu=1 timeout=00000000
  pri=11 realpri=11 policy=OTHER
  blocked_chid=4
 thread 4 SIGNALLED-SIGSEGV code=4 SPERR refaddr=80f3252 fltno=2
  ip=0x80f3252 sp=0x7f84ea0 stkbase=0x7f64000 stksize=135168
  state=STOPPED flags=4000000 last_cpu=1 timeout=00000000
  pri=11 realpri=11 policy=OTHER
 thread 5
  ip=0xb032f931 sp=0x7f63e30 stkbase=0x7f43000 stksize=135168
  state=STOPPED flags=4000000 last_cpu=1 timeout=00000000
  pri=255 realpri=255 policy=OTHER
 thread 6
  ip=0xb032f931 sp=0x7f42e20 stkbase=0x7f22000 stksize=135168
  state=RECEIVE flags=4000000 last_cpu=1 timeout=00000000
  pri=12 realpri=12 policy=FIFO
  blocked_chid=11
 thread 7
  ip=0xb032f931 sp=0x7f21e20 stkbase=0x7f01000 stksize=135168
  state=RECEIVE flags=4000000 last_cpu=2 timeout=00000000
  pri=12 realpri=12 policy=FIFO
  blocked_chid=14
 thread 8
  ip=0xb0330901 sp=0x7f00ef0 stkbase=0x7ee0000 stksize=135168
  state=CONDVAR flags=4000000 last_cpu=2 timeout=00000000
  pri=9 realpri=9 policy=OTHER
 thread 9
  ip=0xb0330901 sp=0x7edff00 stkbase=0x7ebf000 stksize=135168
  state=CONDVAR flags=4000000 last_cpu=2 timeout=00000000
  pri=9 realpri=9 policy=OTHER


Thanx
Tim
Attachment: Text cadred.map 1.16 MB
Re: Re: locating the point of a crash  
Which version of GDB are you using?

If Thomas's assumption is right and the stack is being overwritten, gdb will not be able to figure it out.
Gdb needs a valid stack in order to "calculate" the stack trace.

Re: Re: locating the point of a crash  
Hi Tim,

it's good that you posted the coreinfo output. Looking at it, you can see that thread
#4 was the one that faulted:

 thread 4 SIGNALLED-SIGSEGV code=4 SPERR refaddr=80f3252 fltno=2
  ip=0x80f3252 sp=0x7f84ea0 stkbase=0x7f64000 stksize=135168
  state=STOPPED flags=4000000 last_cpu=1 timeout=00000000
  pri=11 realpri=11 policy=OTHER

And you can also see that the ip (instruction pointer) at the time 
of the fault was identical to the offending address (refaddr).

That would usually happen for one of two reasons:

Most likely: stack contents were overwritten in a function and the
"return" kicked you over the edge.

Less likely, but quite possible: a variable holding a function pointer
has become stale and now points to never-never land.

The former case can easily be diagnosed, because a "return" would just 
have popped the offending return address off the stack:
Open the core file in gdb, then look at the contents of the four bytes
right below the current stack pointer (sp):
  (gdb) thread 4
  (gdb) x/1xw $esp-4

If that is equal to the core's ip (in your case, 0x80f3252), a bad
return is what you've got. As to where it came from: it can't be
too deep in your function call hierarchy, since you seem to have used only
352 bytes (88 words) of stack so far. We might do some educated
guessing, though. Could you, in gdb, do
  (gdb) thread 4
  (gdb) i r
  (gdb) x/88xw $esp
...and post the output?
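The 352-byte figure can be reproduced from thread 4's coreinfo fields. Assuming the thread's stack occupies [stkbase, stkbase + stksize) and grows downward from the top, as x86 stacks do, the space in use is roughly stkbase + stksize - sp. A small Python check:

```python
def stack_bytes_used(stkbase: int, stksize: int, sp: int) -> int:
    """Approximate bytes in use on a downward-growing stack spanning [stkbase, stkbase + stksize)."""
    top = stkbase + stksize  # the stack starts here and grows toward stkbase
    if not stkbase <= sp <= top:
        raise ValueError("sp is outside this thread's stack")
    return top - sp


# Thread 4's values, copied from the coreinfo output.
used = stack_bytes_used(stkbase=0x7F64000, stksize=135168, sp=0x7F84EA0)
print(f"{used} bytes = {used // 4} 32-bit words")  # 352 bytes = 88 32-bit words
```

A value this small makes a deep call chain unlikely, which is also why dumping 88 words from $esp covers the thread's whole stack.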

Thanks,
- Thomas

> -----Original Message-----
> From: Tim Gessner [mailto:community-noreply@qnx.com]
> Sent: 20 October 2008 16:29
> To: ostech-core_os
> Subject: Re: Re: locating the point of a crash
Re: Re: Re: locating the point of a crash  
Sorry for not getting back sooner; I got pulled away. I'm not familiar with using gdb to debug a core without the IDE.
I'm Windows-hosted, so I assume I would use ntox86-gdb?

I have used gdb to debug outside of QNX, but never on a core dump. What command-line arguments do I use? Do I start it with
the executable and then load the core dump?

Thanx
Tim
Re: Re: Re: locating the point of a crash  
Open a cmd prompt, then type:

C:\>ntox86-gdb <exefile> --core <corefile>

(The --core switch is optional; gdb treats a second file argument as the core file.)

You can also start gdb first and then specify the executable and the core; the result is the same:

C:\>ntox86-gdb
(gdb) file <myexe>
...
(gdb) core <mycore>

Re: Re: Re: locating the point of a crash  
Thanx - that was easy!
Re: Re: Re: locating the point of a crash  
Ok, here is the output of gdb - Thanx

(gdb) thread 4
[Switching to thread 4 (process 4)]#0  0x080f3252 in ?? ()
(gdb) x/1xw $esp-4
0x7f84e9c:      0x07f84f9c
(gdb) i r
eax            0x80f3250        135213648
ecx            0x7f84e54        133713492
edx            0x7f84e54        133713492
ebx            0x0      0
esp            0x7f84ea0        0x7f84ea0
ebp            0x7f84f9c        0x7f84f9c
esi            0x0      0
edi            0x0      0
eip            0x80f3252        0x80f3252
eflags         0x11216  70166
cs             0xf3     243
ss             0xfb     251
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x0      0
fstat          0x0      0
ftag           0x0      0
fiseg          0x0      0
fioff          0x0      0
foseg          0x0      0
fooff          0x0      0
fop            0x0      0
(gdb) x/88xw $esp
0x7f84ea0:      0x080690b7      0x080f3200      0x080c378d      0x00000006
0x7f84eb0:      0x08068eb0      0x00000000      0x00000000      0x00000000
0x7f84ec0:      0x00000000      0x00000000      0x00000000      0x00000000
0x7f84ed0:      0x412757a6      0x48fdae5f      0x00057625      0x080f8dfc
0x7f84ee0:      0x080f3200      0xc1a05000      0x00004d44      0x00000000
0x7f84ef0:      0x40000007      0x00000000      0x00000000      0x00000000
0x7f84f00:      0x00000000      0x00000000      0x00000000      0x00000000
0x7f84f10:      0x00000000      0x00000000      0x00000000      0x00000000
0x7f84f20:      0x00000000      0x00000000      0x00000000      0x00000000
0x7f84f30:      0x00000000      0x00000000      0x00000000      0x00000000
0x7f84f40:      0x00000000      0x00000000      0x00000000      0x00000000
0x7f84f50:      0x00000000      0x00000000      0x00000000      0x00000000
0x7f84f60:      0x00000000      0x080f9234      0x080f8dfc      0x00000000
0x7f84f70:      0x412757a6      0x48fdae5f      0x00057625      0x00000000
0x7f84f80:      0x000493e0      0x00000000      0x0806afa0      0x08047888
0x7f84f90:      0x00000000      0x00000000      0x00000000      0x07f84fbc
0x7f84fa0:      0x0806788f      0x08047888      0x00000000      0x00000000
0x7f84fb0:      0x00000000      0x00000000      0x08047888      0x00000000
0x7f84fc0:      0xb031b8b4      0x08047888      0x07f84fcc      0x00000104
0x7f84fd0:      0x00000000      0x00e1901b      0x00000004      0x101b0004
0x7f84fe0:      0x07f65000      0x00000000      0x00000000      0x00000000
0x7f84ff0:      0x07f84f84      0x00000000      0x00000000      0x00000000
(gdb)
Re: Re: Re: locating the point of a crash  
> Ok, here is the output of gdb - Thanx
> 
> (gdb) thread 4
> [Switching to thread 4 (process 4)]#0  0x080f3252 in ?? ()
> (gdb) x/1xw $esp-4
> 0x7f84e9c:      0x07f84f9c
> (gdb) i r
> eax            0x80f3250        135213648
> ecx            0x7f84e54        133713492
> edx            0x7f84e54        133713492
> ebx            0x0      0
> esp            0x7f84ea0        0x7f84ea0
> ebp            0x7f84f9c        0x7f84f9c
> esi            0x0      0
> edi            0x0      0
> eip            0x80f3252        0x80f3252
^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is what gdb sees as the current frame. It looks like execution jumped into nowhere.


> (gdb) x/88xw $esp
> 0x7f84ea0:      0x080690b7      0x080f3200      0x080c378d      0x00000006
^^^^^^^^^^^^^^^^^^^
If you use
(gdb) info symbol 0x080690b7
you will probably find that it falls inside
0x08067872                CDeviceMgr::DMCPThread(void*)

If that is correct, gdb wouldn't be able to backtrace past that frame (provided my guess that this is your thread
function is correct). Gdb cannot currently detect the thread start frame, so what you would see below the DMCPThread frame
would be question marks (which is OK in this case).
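When gdb has no symbols to work with, a similar lookup can be done by hand against the link map: find the last symbol whose start address is at or below the address in question. A minimal Python sketch, assuming you have extracted (address, name) pairs from the map file and sorted them; only the CDeviceMgr::DMCPThread entry comes from this thread, and the second entry is a hypothetical placeholder.

```python
import bisect
from typing import List, Optional, Tuple


def resolve(symbols: List[Tuple[int, str]], addr: int) -> Optional[Tuple[str, int]]:
    """Return (name, offset) for the closest symbol starting at or below addr.

    symbols must be sorted by start address. Without symbol sizes, an address
    past the end of the last function still maps to it, so sanity-check the offset.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # addr lies below every known symbol
    start, name = symbols[i]
    return name, addr - start


symbols = [
    (0x08067872, "CDeviceMgr::DMCPThread(void*)"),  # from the map file in this thread
    (0x08069200, "next_function (hypothetical)"),   # placeholder upper neighbour
]

name, offset = resolve(symbols, 0x080690B7)  # the word at $esp in thread 4
print(f"{name} + {offset:#x}")  # CDeviceMgr::DMCPThread(void*) + 0x1845
```

If 0x080690b7 is indeed a return address, it points just past the call instruction, so the call that jumped to 0x080f3252 should sit immediately before DMCPThread + 0x1845 in the disassembly.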

In any case, if this makes sense so far, you can determine exactly where the call came from by disassembling
your executable, which can be done from gdb:

(gdb) disassemble 0x080690b7


Hope this helps.

Re: Re: Re: locating the point of a crash  
Thanx, I found it: a bad function pointer.
Re: Re: Re: locating the point of a crash  
Hi Tim,

> (gdb) x/1xw $esp-4
> 0x7f84e9c:      0x07f84f9c

Looking at this, it's obviously not a bad return (otherwise, 
the content of this address would've been 0x080f3252).
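The triage applied here (and in the earlier post) can be summarised as a small decision procedure over values taken from coreinfo and gdb. This Python sketch encodes it as a rule of thumb for x86, not a definitive test: if ip equals refaddr, the CPU faulted fetching the instruction itself, so control was transferred to a bad address; a `ret` leaves the popped return address in the word just below the new $esp, so that word matching ip indicates a bad return, and anything else points to a bad indirect call or jump such as a corrupted function pointer.

```python
def classify_fault(ip: int, refaddr: int, word_below_sp: int) -> str:
    """Rough x86 SIGSEGV triage.

    ip            -- instruction pointer at the fault (coreinfo "ip", gdb $eip)
    refaddr       -- faulting address reported by coreinfo ("refaddr")
    word_below_sp -- 32-bit word at $esp - 4 (gdb: x/1xw $esp-4)
    """
    if ip != refaddr:
        # The instruction at ip was fetched; it faulted touching some other address.
        return "bad data access"
    # ip == refaddr: fetching the instruction itself failed, so control
    # was transferred to a bad address.
    if word_below_sp == ip:
        # 'ret' popped this value, so the return address on the stack was bad.
        return "bad return (stack overwrite likely)"
    return "bad indirect call/jump (e.g. corrupted function pointer)"


# Thread 4's values: ip/refaddr from coreinfo, $esp-4 from gdb.
print(classify_fault(ip=0x080F3252, refaddr=0x080F3252, word_below_sp=0x07F84F9C))
```

For Tim's values this yields the indirect-call case, which matches the bad function pointer he eventually found.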

So my next guess would be "bad function pointer". As you seem 
to be doing C++, it might easily be a damaged virtual method 
address - which leads to the next-most-likely explanation:
  Did you change some class definition and then not do a 
  "make clean" over all the modules in your app?

If you didn't, that would explain things...

- Thomas


> -----Original Message-----
> From: Tim Gessner [mailto:community-noreply@qnx.com]
> Sent: 21 October 2008 19:28
> To: ostech-core_os
> Subject: Re: Re: Re: locating the point of a crash
Re: Re: Re: Re: locating the point of a crash  
Yes, it was a bad function pointer.  I was able to find it in my code.

Thanx for all your help!