Jongpil Won
10/13/2008 5:12 AM
post14905
|
I have an MPC5200 reference board from Freescale.
I configured the board as a WAP (Wireless Access Point).
I followed the configuration description on the wiki page on this site.
The board has two network interfaces: the Ethernet port built into the reference board and a wireless LAN adapter
from 3Com.
The wireless adapter uses an Atheros chipset and has a PCI form factor.
This is the script that configures the WAP:
io-pkt-v4-hc -dath -dmpc5200 mac=00049F005B27 verbose -ptcpip
waitfor /dev/io-net/en0 10
ifconfig en0 150.150.74.87 netmask 255.255.255.0 up
ifconfig ath0 192.168.0.111 netmask 255.255.255.0 up
ifconfig ath0 media OFDM54 mode 11a mediaopt hostap
ifconfig ath0 ssid AGW_A
sysctl -w net.inet.ip.forwarding=1
mount -Ttcpip lsm-pf-v4.so
pfctl -e
pfctl -N -f /etc/pf.conf
route add default 150.150.74.254
We got the Atheros driver source from QNX.
I confirmed that no other software or hardware components have been added to the reference board.
The Ethernet port is used for the external network and the WiFi interface is used for the internal network.
There is one notebook, and its only path to the external network is its WiFi interface. It gets its IP address from the
WAP, which acts as a DHCP server.
The reference board performs NAT, so the WiFi interface has a private network address, 192.168.0.111.
Now to my problem: the io-pkt stack goes down.
I tried to load some web pages from the notebook through the reference board over the WiFi interface.
After a few seconds or a few minutes, the io-pkt stack hung.
So the web connection was closed.
At that point I checked the process list with the "pidin" command and got a result like the one below. There are two
types of errors. In the first, the state of io-pkt-v4-hc is MUTEX. In the second, io-pkt is missing from the list entirely.
1. # pidin
 pid tid name               prio STATE       Blocked
   1   1 procnto            0f   READY
   1   2 procnto            255r RECEIVE     1
   1   3 procnto            255r RECEIVE     1
   1   4 procnto            10r  RECEIVE     1
   1   5 procnto            10r  RUNNING
   1   6 procnto            10r  RECEIVE     1
   1   7 procnto            10r  RECEIVE     1
4098   1 c/boot/devc-serpsc 10r  RECEIVE     1
4099   1 proc/boot/pipe     10r  SIGWAITINFO
4099   2 proc/boot/pipe     10r  RECEIVE     1
4099   3 proc/boot/pipe     10r  RECEIVE     1
4099   4 proc/boot/pipe     10r  RECEIVE     1
4100   1 c/boot/pci-mgt5200 21r  RECEIVE     1
4101   1 proc/boot/devc-pty 10r  RECEIVE     1
4102   1 proc/boot/qconn    10r  SIGWAITINFO
4102   2 proc/boot/qconn    10r  CONDVAR     48060168
4102   3 proc/boot/qconn    10r  RECEIVE     1
4102   4 proc/boot/qconn    10r  RECEIVE     3
4103   1 /boot/devf-mgt5200 10r  SIGWAITINFO
4103   2 /boot/devf-mgt5200 10r  RECEIVE     1
4103   3 /boot/devf-mgt5200 10r  RECEIVE     1
4105   1 proc/boot/io-usb   10r  SIGWAITINFO
4105   2 proc/boot/io-usb   21r  RECEIVE     4
4105   3 proc/boot/io-usb   21r  RECEIVE     1
4105   4 proc/boot/io-usb   10r  RECEIVE     7
4105   5 proc/boot/io-usb   10r  NANOSLEEP
4105   6 proc/boot/io-usb   10r  RECEIVE     7
8200   1 c/boot/devc-serpsc 10r  RECEIVE     1
8202   1 proc/boot/ksh      10r  SIGSUSPEND
8203   1 usr/sbin/random    10r  SIGWAITINFO
8203   2 usr/sbin/random    10r  RECEIVE     1 ...
|
|
|
Robert Craig
10/13/2008 2:02 PM
post14919
|
Just to confirm: The driver has been completely re-built within the new io-pkt source base?
Can you try running without lsm-pf-v4 and see if that makes any difference (or can things not operate at all without
NAT)?
Is it possible to set up a secondary Ethernet interface so that you can run a second instance of the stack and then run a
debug version of io-pkt, so that you can see exactly where things are crashing?
Is dumper running? Can you get us a core dump in the case the stack dies?
We're dealing with four potential problems here:
1) The stack
2) The filter
3) The Ethernet driver
4) The Atheros driver
Have there been any customized changes to either the Ethernet driver or Atheros driver?
Robert
|
|
|
Jongpil Won
10/13/2008 10:49 PM
post14924
|
I confirmed that the driver has been completely re-built within the new io-pkt source base. Whenever the io-pkt source
code is released, I always build the whole source tree together with the Atheros driver source code.
I'll upload a core dump when the stack dies, after working hours.
I can't upload files from the office, so I'll do it from home.
I totally agree with you that there are four potential issues.
So first I'll re-test without the packet filter and NAT, and I'll share the result.
FYI, I currently use a very simple pf.conf. There are only three lines.
I defined int_if=ath0 and ext_if=eth0, and I set the NAT rule as "nat on $ext_if from !(ext_if) to any -> ($ext_if)". I
think that's enough for our purpose.
After testing without NAT, I'll try a secondary Ethernet PCI card on the reference board. Did you mean setting up
two network interfaces using two Ethernet interfaces? That is, did you mean the built-in Ethernet port plus an
external Ethernet port, or an external Ethernet port plus the wireless LAN port?
Finally, I confirmed that there are no changes to the Ethernet driver or to the wireless LAN driver for the Atheros chipset.
I use only the io-pkt source code and Atheros driver source code from the QNX site, without any customization.
Wayne.
|
|
|
Robert Craig
10/14/2008 11:30 AM
post14962
|
Hi Wayne:
I was thinking that if you have another Ethernet interface (an
additional one: PCI, USB or on board), then you can use that interface
with a second stack instantiation to connect in with the debugger and
start a debug version of the stack and driver with the configuration
that you have today. You can also use dumper to get back traces of
io-pkt when it appears to be mutex deadlocked (see the dumper -p
option).
Robert.
|
|
|
Andrew Boyd(deleted)
10/14/2008 12:26 PM
post14969
|
> io-pkt stack had been hang ... two types of errors.
> First, the state of io-pkt-v4-hc is MUTEX.
> the second is there is no list of io-pkt.
When io-pkt goes away, it probably has faulted.
Before you start io-pkt, run dumper so that a core
dump of io-pkt is created in /tmp when it faults.
When io-pkt locks up on a mutex, do a pidin to
get the pid of io-pkt (say 12345) then do this:
# dumper -p 12345
which will cause dumper to create a core file of
io-pkt, even though it didn't fault.
Now that you have a core dump, start io-pkt really
simply - without the wifi - just with a single ethernet
port, and ifconfig the port with an IP address, so
that you can ftp the core file produced by dumper
off the box, and send it to us.
--
aboyd
|
|
|
Sunha Choi
10/15/2008 8:14 AM
post14995
|
Hi, I'm on the same team as Wayne.
I'm sending two core dump files.
1. io-pkt stack down (pidin : there is no list of io-pkt) : io-pkt-v4-hc_stackdown.core
2. io-pkt mutex state : io-pkt-v4-hc_mutex.core
Thank you.
|
|
|
Sean Boudreau(deleted)
10/15/2008 8:35 AM
post15001
|
On Wed, Oct 15, 2008 at 08:16:56AM -0400, Sunha Choi wrote:
> Hi, i'm a same team member with Wayne.
> i send two core dump files.
>
> 1. io-pkt stack down (pidin : there is no list of io-pkt) : io-pkt-v4-hc_stackdown.core
> 2. io-pkt mutex state : io-pkt-v4-hc_mutex.core
>
> Thank you.
>
Can you also include the binaries in the archive (io-pkt
and any dlls loaded therein (devnp-ath.so, lsm-*)).
Thanks,
-seanb
|
|
|
Sunha Choi
10/15/2008 10:33 PM
post15057
|
I've sent the binaries (io-pkt and DLLs).
Thank you.
|
|
|
Andrew Boyd(deleted)
10/16/2008 10:09 AM
post15081
|
Ok, got all the binaries. Wrestling with
gdb right now.
--
aboyd
|
|
|
Andrew Boyd(deleted)
10/16/2008 10:22 AM
post15085
|
Do you have a non-stripped io-pkt-v4-hc
that you can send me?
--
aboyd
|
|
|
Sunha Choi
10/18/2008 3:07 PM
post15204
|
Re: RE: RE: io-pkt is down
Was the io-pkt-v4-hc I sent stripped?
I've re-sent the files.
Thanks a lot~!
|
|
|
Andrew Boyd(deleted)
10/20/2008 11:43 AM
post15246
|
RE: RE: RE: io-pkt is down
The unstripped binaries work MUCH better with gdb,
but I should mention it is still complaining about
the following binaries missing:
devn-mpc5200.so
libdma-bestcomm5200.so.1
Now, when I look at the mutex-blocked core of
io-pkt (that you snapshotted with dumper -p 123)
I can see:
Thread 1 is blocked in SignalWaitInfo (looks ok)
Thread 2 rxd an IP packet, and is attempting to
forward (transmit) the packet, but is mutex-blocked
in shim_start() (after ether_output())
Thread 3 faulted someplace mysterious (no symbols,
just question marks), the SIGSEGV handler ran to
try to quiesce the hardware, and is mutex blocked.
It would appear that the fault occurred in the missing
binaries (above) and the mutex blocking is simply a
result of excessively complicated shutdown code in
the driver.
Now, looking at the actual core dump of io-pkt, it
looks like io-pkt for some reason was already in
the process of orderly shutdown - ie it received
a message, then called dodie(), then doshutdownhooks()
then shim_shutdown(), then a few frames with no
symbols (likely 5200 binaries above) which appeared
to fault and end up in the sigsegv handler.
It sure looks like the 5200 driver has some bugs. If
you want to send me the missing (unstripped!) binaries
above, I can probably gather a little bit more information,
but in this situation, what really really helps is
to compile the offending binary -g -O0 so that we
can narrow down the fault to an exact line of code
instead of just a function.
I am unfamiliar with these 5200 binaries - did you
compile your own versions, or did you get them from
QNX?
--
aboyd
|
|
|
Sunha Choi
10/21/2008 3:43 AM
post15302
|
Re: RE: RE: RE: io-pkt is down
Hi Andrew.
I got Lite5200B BSP 1.0.4 from QNX and compiled the unmodified devn-mpc5200 source code.
I also tried the prebuilt 5200 binaries, but the result is the same.
Thank you.
|
|
|
Sunha Choi
10/21/2008 6:43 AM
post15307
|
Re: RE: RE: RE: io-pkt is down
I've uploaded two files (devn-mpc5200.so, libdma-bestcomm5200.so.1).
Thanks.
|
|
|
Andrew Boyd(deleted)
10/27/2008 10:17 AM
post15556
|
RE: RE: RE: RE: io-pkt is down
> upload two files (devn-mpc5200.so, libdma-bestcomm5200.so.1)
Thanks for that. Sorry about the delayed response, I was
away last week, and sick this weekend.
With all the binaries, we can see that both problems
(mutex lock and stack fault) have the same root cause:
line 547 of io-net.c of the shim is attempting to dereference
a null pointer (shim). This is a result of either:
1) the npkt->flags being corrupted (eg multiple, conflicting
definitions), or
2) the value of the shim pointer (which was squirrelled
away previously by the shim) being overwritten with zero, or
3) the ex_tx_done() function being passed a pointer to
an npkt that it wasn't expecting (ie it did not create the npkt).
I'll see if I can pore through the core dump some more, to
try to narrow it down, but I thought I'd give you an update.
Executive summary: looks like some subtle incompatibility
between the shim (which emulates io-net) and the mpc5200
io-net driver.
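
Whichever of the three causes applies, the failure has the same shape: a pointer recovered from the packet turns out to be null and is then dereferenced. As a rough illustration only (this is not the shim's code; the struct names, the flag value and both functions are hypothetical), a transmit-done handler that checks the ownership flag before touching the pointer might look like this:

```c
#include <stddef.h>

/* Hypothetical flag value and types, for illustration only.
 * They are not taken from the QNX io-net or io-pkt headers. */
#define PKT_FLAG_OWNED_BY_SHIM 0x0001u

struct shim_ctx {
    int tx_completed;            /* transmit completions seen so far */
};

struct pkt {
    unsigned         flags;      /* ownership and state bits */
    struct shim_ctx *owner;      /* stashed by whoever allocated the packet */
};

/* Return the owner only when the flag says the stashed pointer is valid. */
static struct shim_ctx *pkt_owner(const struct pkt *p)
{
    if ((p->flags & PKT_FLAG_OWNED_BY_SHIM) == 0) {
        return NULL;
    }
    return p->owner;
}

/* Transmit-done handler: 0 on success, -1 if the packet is not ours.
 * Without the NULL check, the increment below is where a fault occurs. */
static int tx_done(struct pkt *p)
{
    struct shim_ctx *shim = pkt_owner(p);

    if (shim == NULL) {
        return -1;
    }
    shim->tx_completed++;
    return 0;
}
```

A guard like this only turns the crash into an error return; the lost flag or foreign packet still has to be tracked down.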
--
aboyd
|
|
|
Robert Craig
10/27/2008 10:59 AM
post15561
|
RE: RE: RE: RE: io-pkt is down
And there is an option 4). The io-net driver has a bug in it that
wasn't being hit in io-net for some reason (which we've seen several
times already).
Robert.
|
|
|
Andrew Boyd(deleted)
10/27/2008 2:39 PM
post15588
|
Re: RE: RE: RE: RE: io-pkt is down
Suggestion: from the binaries, it looks like you are
compiling your own bsp devn-mpc5200.so ... on line
41 of transmit.c change
dpkt->flags = FEC_DEFRAG_PACKET;
to
dpkt->flags |= FEC_DEFRAG_PACKET;
Instead of setting the bits, "or" them in, because
I think it is erroneously overwriting the _NPKT_UP
flag set by ex_alloc_up_pkt() in the shim.
And when the shim looks for the _NPKT_UP bit set
in the npkt flags later, in ex_tx_done(), it isn't
seeing that bit set, and it's resetting the shim
pointer to garbage (null) and faulting when it
tries to de-reference it.
It's this sort of driver bug that drives us nuts in
io-pkt - io-net may tolerate it, but the shim doesn't.
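
To make the difference between the two assignments concrete, here is a small standalone sketch. The bit values and the demo_ names are made up for illustration (the real _NPKT_UP and FEC_DEFRAG_PACKET values live in the QNX and driver headers), so treat it as a model of the suspected bug rather than the driver code itself:

```c
/* Hypothetical bit values, chosen only so that the two flags differ. */
#define DEMO_NPKT_UP        0x0001u   /* stands in for _NPKT_UP */
#define DEMO_DEFRAG_PACKET  0x0100u   /* stands in for FEC_DEFRAG_PACKET */

struct demo_pkt {
    unsigned flags;
};

/* What plain assignment does: every bit set earlier is lost. */
static void mark_defrag_assign(struct demo_pkt *p)
{
    p->flags = DEMO_DEFRAG_PACKET;
}

/* What the suggested fix does: add the bit, keep the others. */
static void mark_defrag_or(struct demo_pkt *p)
{
    p->flags |= DEMO_DEFRAG_PACKET;
}

/* The kind of test a later transmit-done check would make. */
static int is_up(const struct demo_pkt *p)
{
    return (p->flags & DEMO_NPKT_UP) != 0;
}
```

Starting from a packet whose flags hold only DEMO_NPKT_UP, mark_defrag_assign() leaves is_up() false, while mark_defrag_or() leaves it true with both bits set.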
|
|
|
|