Gary Faulkner
|
possible bug in kern_sync.c
|
Gary Faulkner
03/06/2008 9:26 AM
post5579
|
possible bug in kern_sync.c
Hello:
First, I'm not 100% sure this is a bug; but it's definitely a problem for me. I'm also fairly convinced that my "fix"
is incorrect; but it works for me (for now).
At any rate, here's the situation. I "ported" a driver for the MOSCHIP 7830 USB ethernet part to "NetBSD". What I
really did was to take the example if_axe driver and modify it according to the OpenBSD driver that MOSCHIP provides.
At any rate, if I forced autonegotiation (by not connecting an ethernet cable), the stack would fail. By looking at the
dump, I found it was failing in endtsleep() (in kern_sync.c).
Specifically, it was failing here:
/* take out of fp->f_slpq */
if ((*l->l_fslback = l->l_fslforw) != NULL)
l->l_fslforw->l_fslback = l->l_fslback;
because l->l_fslback was null.
The reason it was null was caused by the following code in ltsleep():
if (l->l_fp) {
/*
* Put it on this list to make the handling of
* unblock pulses faster. The unblock callout
* gives us a reference to p_fp. Otherwise, we
* have to search all the slpqs.
*/
l->l_fslforw = l->l_fp->f_slpq;
if (l->l_fslforw != NULL)
l->l_fslforw->l_fslback = &l->l_fslforw;
l->l_fp->f_slpq = l;
l->l_fslback = &l->l_fp->f_slpq;
}
else {
/*
* Sleeping during connect message handling.
* This can happen if a dlopen() results while
* servicing a mount request: dlopen() gets mapped
* to a blockop.
*
* Note: The resmgr layer will never call out to our
* tcpip_unblock() func prior to the resmgr_open_bind()
* so we better make sure this blocking state doesn't
* linger.
*/
l->l_fslback = NULL;
l->l_fslforw = NULL;
}
Specifically the last two lines. In my case, at this time, timo was non-zero, so the following was called:
callout_reset(&l->l_tsleep_ch, timo, endtsleep, l);
Now, my temporary fix for this is to set timo = in the case where we are setting l->l_fslback to NULL.
I don't necessarily believe this is a good thing, but it works (at least for me), and the stack seems stable (granted, I
have not had the opportunity yet to transfer very much data across it).
My questions are:
1) Is this the right fix for this bug? or is there something else that should be going on here?
2) If it's not the right fix, what is the right fix?
Thanks!!
-garyf
|
|
|
Robert Craig
|
RE: possible bug in kern_sync.c
|
Robert Craig
03/07/2008 2:08 PM
post5628
|
RE: possible bug in kern_sync.c
Hi Gary:
I've reproduced that here using the axe driver, so we'll dig in and
see what's going on.
Just to check this seems to require the following:
Plug in dongle (no cable)
io-pkt-v4 -daxe
ifconfig axe0 up
Now plug in cable (crash).
Seems to be somewhat intermittent, but it is reproducible.
Thanks very much for the report,
Robert.
-----Original Message-----
From: Gary Faulkner [mailto:gary.faulkner@ucontrol.com]
Sent: Thursday, March 06, 2008 9:27 AM
To: builds-networking
Subject: possible bug in kern_sync.c
Hello:
First, I'm not 100% sure this is a bug; but it's definitely a problem for
me. I'm also fairly convinced that my "fix" is incorrect; but it works for
me (for now).
At any rate, here's the situation. I "ported" a driver for the MOSCHIP 7830
USB ethernet part to "NetBSD". What I really did was to take the example
if_axe driver and modify it according to the OpenBSD driver that MOSCHIP
provides.
At any rate, if I forced autonegotiation (by not connecting an ethernet
cable), the stack would fail. By looking at the dump, I found it was
failing in endtsleep() (in kern_sync.c).
Specifically, it was failing here:
/* take out of fp->f_slpq */
if ((*l->l_fslback = l->l_fslforw) != NULL)
l->l_fslforw->l_fslback = l->l_fslback;
because l->l_fslback was null.
The reason it was null was caused by the following code in ltsleep():
if (l->l_fp) {
/*
* Put it on this list to make the handling of
* unblock pulses faster. The unblock callout
* gives us a reference to p_fp. Otherwise, we
* have to search all the slpqs.
*/
l->l_fslforw = l->l_fp->f_slpq;
if (l->l_fslforw != NULL)
l->l_fslforw->l_fslback = &l->l_fslforw;
l->l_fp->f_slpq = l;
l->l_fslback = &l->l_fp->f_slpq;
}
else {
/*
* Sleeping during connect message handling.
* This can happen if a dlopen() results while
* servicing a mount request: dlopen() gets mapped
* to a blockop.
*
* Note: The resmgr layer will never call out to our
* tcpip_unblock() func prior to the resmgr_open_bind()
* so we better make sure this blocking state doesn't
* linger.
*/
l->l_fslback = NULL;
l->l_fslforw = NULL;
}
Specifically the last two lines. In my case, at this time, timo was
non-zero, so the following was called:
callout_reset(&l->l_tsleep_ch, timo, endtsleep, l);
Now, my temporary fix for this is to set timo = in the case where we are
setting l->l_fslback to NULL.
I don't necessarily believe this is a good thing, but it works (at least for
me), and the stack seems stable (granted, I have not had the opportunity yet
to transfer very much data across it).
My questions are:
1) Is this the right fix for this bug? or is there something else that
should be going on here?
2) If it's not the right fix, what is the right fix?
Thanks!!
-garyf
_______________________________________________
Builds
http://community.qnx.com/sf/go/post5579
|
|
|
Robert Craig
|
Re: RE: possible bug in kern_sync.c
|
Robert Craig
03/07/2008 2:11 PM
post5629
|
Re: RE: possible bug in kern_sync.c
Just as a side note, I've create a PR to track this (56077) if you're looking for an update at any point in time.
Robert.
|
|
|
Sean Boudreau(deleted)
|
Re: possible bug in kern_sync.c
|
Sean Boudreau(deleted)
03/07/2008 5:17 PM
post5635
|
Re: possible bug in kern_sync.c
This was probably introduced in rev 210 of
sys/dev/lib/usb/usb_conv_qnx.c when
static struct file usb_file; was removed.
I think the proper fix is to check for NULL
l_fslback and friends members on wakeup but
I'll look into it more thoroughly next week
when I have hardware.
-seanb
On Fri, Mar 07, 2008 at 02:08:27PM -0500, Robert Craig wrote:
> Hi Gary:
>
> I've reproduced that here using the axe driver, so we'll dig in and
> see what's going on.
>
> Just to check this seems to require the following:
>
> Plug in dongle (no cable)
>
> io-pkt-v4 -daxe
> ifconfig axe0 up
>
> Now plug in cable (crash).
>
> Seems to be somewhat intermittent, but it is reproducible.
>
> Thanks very much for the report,
> Robert.
>
>
> -----Original Message-----
> From: Gary Faulkner [mailto:gary.faulkner@ucontrol.com]
> Sent: Thursday, March 06, 2008 9:27 AM
> To: builds-networking
> Subject: possible bug in kern_sync.c
>
> Hello:
>
> First, I'm not 100% sure this is a bug; but it's definitely a problem for
> me. I'm also fairly convinced that my "fix" is incorrect; but it works for
> me (for now).
>
> At any rate, here's the situation. I "ported" a driver for the MOSCHIP 7830
> USB ethernet part to "NetBSD". What I really did was to take the example
> if_axe driver and modify it according to the OpenBSD driver that MOSCHIP
> provides.
>
> At any rate, if I forced autonegotiation (by not connecting an ethernet
> cable), the stack would fail. By looking at the dump, I found it was
> failing in endtsleep() (in kern_sync.c).
>
> Specifically, it was failing here:
>
> /* take out of fp->f_slpq */
> if ((*l->l_fslback = l->l_fslforw) != NULL)
> l->l_fslforw->l_fslback = l->l_fslback;
>
> because l->l_fslback was null.
>
> The reason it was null was caused by the following code in ltsleep():
>
> if (l->l_fp) {
> /*
> * Put it on this list to make the handling of
> * unblock pulses faster. The unblock callout
> * gives us a reference to p_fp. Otherwise, we
> * have to search all the slpqs.
> */
> l->l_fslforw = l->l_fp->f_slpq;
> if (l->l_fslforw != NULL)
> l->l_fslforw->l_fslback = &l->l_fslforw;
> l->l_fp->f_slpq = l;
> l->l_fslback = &l->l_fp->f_slpq;
> }
> else {
> /*
> * Sleeping during connect message handling.
> * This can happen if a dlopen() results while
> * servicing a mount request: dlopen() gets mapped
> * to a blockop.
> *
> * Note: The resmgr layer will never call out to our
> * tcpip_unblock() func prior to the resmgr_open_bind()
> * so we better make sure this blocking state doesn't
> * linger.
> */
> l->l_fslback = NULL;
> l->l_fslforw = NULL;
> }
>
> Specifically the last two lines. In my case, at this time, timo was
> non-zero, so the following was called:
>
> callout_reset(&l->l_tsleep_ch, timo, endtsleep, l);
>
> Now, my temporary fix for this is to set timo = in the case where we are
> setting l->l_fslback to NULL.
>
> I don't necessarily believe this is a good thing, but it works (at least for
> me), and the stack seems stable (granted, I have not had the opportunity yet
> to transfer very much data across it).
>
> My questions are:
>
> 1) Is this the right fix for this bug? or is there something else that
> should be going on here?
>
> 2) If it's not the right fix, what is the...
|
|
|
Gary Faulkner
|
Re: possible bug in kern_sync.c
|
Gary Faulkner
03/10/2008 2:15 PM
post5658
|
Re: possible bug in kern_sync.c
Thanks Sean! I had thought about that solution; but wasn't sure of all of the ramifications (I did not thoroughly
understand all of the intricacies of the thread sleeping models being used at this point in the code).
FWIW, I am willing to try other fixes if it will help (as I'm still relatively early in product development at this
point).
Thanks!!
-garyf
>
> This was probably introduced in rev 210 of
> sys/dev/lib/usb/usb_conv_qnx.c when
> static struct file usb_file; was removed.
> I think the proper fix is to check for NULL
> l_fslback and friends members on wakeup but
> I'll look into it more thoroughly next week
> when I have hardware.
>
> -seanb
>
|
|
|
Sean Boudreau(deleted)
|
Re: possible bug in kern_sync.c
|
Sean Boudreau(deleted)
04/07/2008 11:18 AM
post6491
|
Re: possible bug in kern_sync.c
On Mon, Mar 10, 2008 at 02:15:25PM -0400, Gary Faulkner wrote:
> Thanks Sean! I had thought about that solution; but wasn't sure of all of the ramifications (I did not thoroughly
understand all of the intricacies of the thread sleeping models being used at this point in the code).
>
> FWIW, I am willing to try other fixes if it will help (as I'm still relatively early in product development at this
point).
>
> Thanks!!
>
> -garyf
Should be fixed now.
-seanb
|
|
|
Gary Faulkner
|
Re: RE: possible bug in kern_sync.c
|
Gary Faulkner
03/10/2008 2:13 PM
post5656
|
Re: RE: possible bug in kern_sync.c
Hi Robert:
In my case, I did not have to plug the cable in for it to fail; it would just eventually crash (after a bunch of auto-
negotiation tried to happen). This *could* be due to differences between my actual driver (mos) and the axe driver; but
sounds like you've got the jist of the idea from what you've done.
Thanks for looking into this!!
-garyf
> Hi Gary:
>
> I've reproduced that here using the axe driver, so we'll dig in and
> see what's going on.
>
> Just to check this seems to require the following:
>
> Plug in dongle (no cable)
>
> io-pkt-v4 -daxe
> ifconfig axe0 up
>
> Now plug in cable (crash).
>
> Seems to be somewhat intermittent, but it is reproducible.
>
> Thanks very much for the report,
> Robert.
|
|
|
|