Forum Topic - segfault when using devn-asix.so with io-pkt: Page 1 of 2 (12 Items)
   
segfault when using devn-asix.so with io-pkt  
Hi,

I'm trying to get an existing (binary) driver I have, asix, working with io-pkt. This is a USB ethernet device which was working fine with io-net. However, when using it with io-pkt, I get a segfault. I managed to re-run it with a debug version of io-pkt, and it appears that there is a thread-local data structure which is not getting initialised correctly.

GDB backtrace:

$ ntox86-gdb io-pkt-v4_g --core=io-pkt.core 
GNU gdb 5.2.1qnx-nto
<snip>
Program terminated with signal 11, Segmentation fault.
<snip>
#0  realloc_bsd (addr=0x0, size=3072, ksp=0x80c2820, flags=3) at /export/davidr/core_networking/trunk/sys/kern/kern_malloc.c:1120
1120			NW_SIGHOLD_P(wtp);
(gdb) bt
#0  realloc_bsd (addr=0x0, size=3072, ksp=0x80c2820, flags=3) at /export/davidr/core_networking/trunk/sys/kern/kern_malloc.c:1120
#1  0x0807d992 in malloc_bsd (size=3072, ksp=0x80c2820, flags=3) at /export/davidr/core_networking/trunk/sys/kern/kern_malloc.c:1089
#2  0x0805ca57 in dev_attach (drvr=0xb820615b "en", options=0x80a0c5c "", ca=0xb8207620, cfat_arg=0x7fc6ea0, single=0x7f97d94, devp=0x7f97d98, print=0)
    at /export/davidr/core_networking/trunk/sys/device_qnx.c:98
#3  0xb82028c8 in shim_create_instance () from /export/davidr/stage/x86/lib/dll/devnp-shim.so
#4  0xb8203fbc in ex_reg () from /export/davidr/stage/x86/lib/dll/devnp-shim.so
#5  0xb8214709 in ax_register_device () from /export/davidr/stage/x86/lib/dll/devn-asix.so
#6  0xb8214f9a in ax_attach () from /export/davidr/stage/x86/lib/dll/devn-asix.so
#7  0xb82155fe in ax_insertion () from /export/davidr/stage/x86/lib/dll/devn-asix.so
#8  0xb8218a1c in eventloop () from /export/davidr/stage/x86/lib/dll/devn-asix.so
 
The NW_SIGHOLD_P macro causes the wtp argument to be dereferenced, and this is null at this point:

(gdb) info locals
addr = (void *) 0x0
va = (void *) 0x0
wtp = (struct nw_work_thread *) 0x0
size_rounded = 3072
mcp = (struct malcheck *) 0x0

(gdb) info thread
  4 process 4  0xb8214438 in ax_event_handler () from /export/davidr/stage/x86/lib/dll/devn-asix.so
* 3 process 3  realloc_bsd (addr=0x0, size=3072, ksp=0x80c2820, flags=3) at /export/davidr/core_networking/trunk/sys/kern/kern_malloc.c:1120
  2 process 2  0xb0330c21 in MsgReceivev () from /export/davidr/stage/x86/lib/libc.so.2
  1 process 1  0xb0331959 in SignalWaitinfo () from /export/davidr/Oquirrh/CRS_core_os/stage/x86/lib/libc.so.2

From a brief look at the source of kern_malloc.c, the wtp variable is initialised using the WTP macro, which selects a worker thread from the stk_ctl.work_threads[] array. The tid at this point is 3, but this is offset by -2 to get the index into stk_ctl.work_threads[], giving index 1:

(gdb) p *stk_ctl.work_threads@stk_ctl.nwork_threads
$32 = {0x80df840, 0x0, 0x0, 0x0}

Looks like only one worker thread (index 0) has been initialised, so the entry for this thread is null.
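
To make the failing lookup concrete, here is a minimal C sketch of a tid-indexed table like the one described above. It is not the io-pkt source: `struct work_thread`, `struct stack_ctl`, `lookup_work_thread` and `TID_OFFSET` are hypothetical names, and the only detail taken from the post is that the index is the tid minus 2. With one slot filled, as in the gdb output, tid 3 maps to an empty slot and the lookup yields a null pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins for io-pkt's internals; the real definitions differ. */
struct work_thread {
    int tid;
};

#define MAX_WORK_THREADS 4
#define TID_OFFSET 2 /* per the post: index = tid - 2 */

struct stack_ctl {
    struct work_thread *work_threads[MAX_WORK_THREADS];
    int nwork_threads;
};

/*
 * Return the slot belonging to a thread id, or NULL when the id is out of
 * range or the slot was never initialised. A caller that dereferences the
 * result without checking it will fault for an unregistered thread.
 */
static struct work_thread *lookup_work_thread(const struct stack_ctl *ctl, int tid)
{
    int idx = tid - TID_OFFSET;

    if (idx < 0 || idx >= ctl->nwork_threads)
        return NULL;
    return ctl->work_threads[idx];
}
```

Calling `lookup_work_thread(&ctl, 3)` on a table that holds only slot 0 returns NULL, which matches the value `wtp` had when `NW_SIGHOLD_P` dereferenced it.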


Could anyone shed any light on this?

Thanks

Dave Rigby
Re: segfault when using devn-asix.so with io-pkt  
On Thu, Jan 17, 2008 at 10:54:32PM -0500, Dave Rigby wrote:
> <snip>

You've run into a driver that does something atypical.  Most
drivers register with io-net (shim here) from their initial
entry point which is called by a thread io-pkt knows about.
This driver is registering as a result of an insertion event
from a thread that it created (one that io-pkt doesn't have
any knowledge of).  I don't see an easy proper fix at first
glance.  Some smarts could be added to
shim_create_instance() to fail in this case but that doesn't
really help you out.
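
The idea of having shim_create_instance() fail rather than crash could look roughly like the sketch below. This is only an illustration under assumptions: `struct known_threads`, `known_threads_add`, `create_instance_checked` and the choice of EPERM are all hypothetical and not taken from the shim's source. The caller's thread id is checked against the ids the stack knows about before any per-thread state is touched.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_KNOWN_THREADS 8

/* Hypothetical registry of the thread ids that the stack created itself. */
struct known_threads {
    int tids[MAX_KNOWN_THREADS];
    size_t count;
};

/* Record a thread id; returns 0 on success or ENOMEM when the table is full. */
static int known_threads_add(struct known_threads *kt, int tid)
{
    if (kt->count >= MAX_KNOWN_THREADS)
        return ENOMEM;
    kt->tids[kt->count++] = tid;
    return 0;
}

/* Return 1 if the thread id was registered, 0 otherwise. */
static int known_threads_contains(const struct known_threads *kt, int tid)
{
    for (size_t i = 0; i < kt->count; i++) {
        if (kt->tids[i] == tid)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical stand-in for an instance-creation entry point. It refuses
 * to run on a thread the stack does not know about instead of going on to
 * use per-thread state that was never set up.
 */
static int create_instance_checked(const struct known_threads *kt, int caller_tid)
{
    if (!known_threads_contains(kt, caller_tid))
        return EPERM;
    /* ... the real attach work would go here ... */
    return 0;
}
```

As noted above, a check like this only turns the crash into a clean failure; the driver would still not attach when it registers from its own thread.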

-seanb
RE: segfault when using devn-asix.so with io-pkt  
This ties back to one of the questions that was posed during the webinar
about threads and io-pkt.  Going to require some thought....  The
USB-dongles interact with a separate thread to control insertion / removal
events.  We'll have to see how we can re-align that threading model (either
by modifying the driver or the stack...  I'm sure that Sean will say "modify
the driver" :>).

	Thanks for such a detailed debug session and problem report though.
It's extremely useful to have someone go to the effort of providing a
traceback like that.


BTW, I've filed a problem report (Ref. 54863) to track this.

	Robert.

	

-----Original Message-----
From: Sean Boudreau [mailto:seanb@qnx.com] 
Sent: Friday, January 18, 2008 10:01 AM
To: drivers-networking
Subject: Re: segfault when using devn-asix.so with io-pkt

<snip>
Re: RE: segfault when using devn-asix.so with io-pkt  
Robert / Sean,

Thanks for the quick analysis. Is it possible for me to track that problem report (54863), or is that purely a QNX internal thing?

Thanks

Dave Rigby


RE: RE: segfault when using devn-asix.so with io-pkt  
Hi Dave:
	Unfortunately, that's all internal at this point.  At some point in
the near future we will set up the trackers on the project so that people
also have live access to the PR data base... 

	Robert.

-----Original Message-----
From: Dave Rigby [mailto:davidr@transitive.com] 
Sent: Friday, January 18, 2008 10:43 AM
To: drivers-networking
Subject: Re: RE: segfault when using devn-asix.so with io-pkt

<snip>

_______________________________________________
Networking Drivers
http://community.qnx.com/sf/go/post4432
Re: segfault when using devn-asix.so with io-pkt  
Hi Dave:
  After some pondering, we thought that the best way to solve the asix driver problem was to say "Don't use the io-net version of the driver".  We were left with the dilemma of either modifying the stack in some way to handle two specific drivers (asix and pegasus) OR somehow re-working the io-net drivers for io-pkt.  Neither of these options really makes sense...  So we went with option 3 instead.

If you update your code base to the latest and re-build, you will now find a devnp-axe.so driver which is the ported NetBSD version.  It works on x86 but hasn't been tried on any other platforms yet. In the interests of saving other people pain, we also added a new porting doc which covers what had to be done to get the axe driver ported over to Neutrino.  Hopefully this will be pretty much cookie cutter now that it's been done once.

http://community.qnx.com/integration/viewcvs/viewcvs.cgi/trunk/sys/dev/doc/?root=core_networking&system=exsy1001

Let me know how it works out,

    Robert.
Re: segfault when using devn-asix.so with io-pkt  
The problem with devn-asix.so seems to be associated only with device insertion.  If the device is already inserted before starting the stack, then everything runs as expected.  If the device is subsequently removed and then re-inserted, this causes the stack to crash.   Can you confirm that this is the same behaviour that you're seeing?  We might be able to work around this in the stack.

    Thanks!
Re: segfault when using devn-asix.so with io-pkt  
Oops.  Meant to say "shim" rather than "stack"...
Re: segfault when using devn-asix.so with io-pkt  
On 4 Feb 2008, at 16:26, Robert Craig wrote:

> The problem with devn-asix.so seems to be associated only with  
> device insertion.  If the device is already inserted before starting  
> the stack, then everything runs as expected.  If the device is  
> subsequently removed and then re-inserted, this causes the stack to  
> crash.   Can you confirm that this is the same behaviour that you're  
> seeing?  We might be able to work around this in the stack.
>

Nope - this wasn't the behaviour I was seeing. The device was always  
attached; I was starting both the usb stack and io-pkt stack in a  
startup script, something like:

#Crank up the USB 2.0 stack
io-usb -dehci pindex=1
io-pkt-v4 -dasix -ptcpip

(note that's not the exact syntax, I've reverted to io-net for the  
time being)

Dave





RE: segfault when using devn-asix.so with io-pkt  
Hmmm... That's an interesting one.  I wonder if there's a race condition on
startup where the usb stack isn't quite up and running before the IP stack
is ready with the result being that the insertion event comes after the IP
stack has tried the driver.  One thing that could be worthwhile trying is to
make sure that usb is fully up and running before io-pkt starts.  Adding
something like "waitfor /dev/io-usb" before starting io-pkt might help...

Out of curiosity, could you also try executing the io-pkt command line from
the shell after the startup has finished? This would help to confirm if it's
a race condition.
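
For anyone who wants the same wait from inside a program rather than a startup script, a rough C approximation is sketched below. `wait_for_path` is a hypothetical helper, not the implementation of the `waitfor` utility; it simply polls `stat()` once a second until the path appears or the timeout expires. The path `/dev/io-usb` is the one suggested above.

```c
#include <sys/stat.h>
#include <unistd.h>

/*
 * Poll until `path` exists or roughly `timeout_s` seconds have passed.
 * Returns 0 once the path exists, -1 if it is still missing at timeout.
 * The path is always checked at least once, even with a zero timeout.
 */
static int wait_for_path(const char *path, unsigned timeout_s)
{
    struct stat st;
    unsigned waited = 0;

    for (;;) {
        if (stat(path, &st) == 0)
            return 0;
        if (waited >= timeout_s)
            return -1;
        sleep(1);
        waited++;
    }
}
```

For example, `wait_for_path("/dev/io-usb", 10)` would return 0 as soon as the path exists, or -1 if it still does not exist after roughly ten seconds, at which point the caller could decide not to start io-pkt.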

	Robert.

-----Original Message-----
From: Dave Rigby [mailto:davidr@transitive.com] 
Sent: Monday, February 04, 2008 11:38 AM
To: drivers-networking
Subject: Re: segfault when using devn-asix.so with io-pkt


<snip>

_______________________________________________
Networking Drivers
http://community.qnx.com/sf/go/post4772