Forum Topic - io-pkt is down: (17 Items)
   
io-pkt is down  
I have an MPC5200 reference board from Freescale.
I configured the board as a WAP (wireless access point), following the configuration description on the wiki page on this site.

The board has two network interfaces: the Ethernet port built into the reference board and a wireless LAN adapter from 3Com.
The wireless adapter uses an Atheros chipset and has a PCI form factor.

This is the script that configures the WAP:

io-pkt-v4-hc -dath -dmpc5200 mac=00049F005B27 verbose -ptcpip
waitfor /dev/io-net/en0 10
ifconfig en0 150.150.74.87 netmask 255.255.255.0 up                
ifconfig ath0 192.168.0.111 netmask 255.255.255.0 up
ifconfig ath0 media OFDM54 mode 11a mediaopt hostap
ifconfig ath0 ssid AGW_A
sysctl -w net.inet.ip.forwarding=1
mount -Ttcpip lsm-pf-v4.so
pfctl -e
pfctl -N -f /etc/pf.conf
route add default 150.150.74.254

We got the driver source for Atheros from QNX.
I confirmed that no other software or hardware components have been added to the reference board.

The Ethernet port is used for the external network and the WiFi interface for the internal network.

There is one notebook. Its only path to the external network is its WiFi interface. It got its IP address from the WAP, because the WAP acts as a DHCP server.

The reference board performs NAT, so the WiFi interface has a private network address (192.168.0.111).

Now to the problem: the io-pkt stack goes down.
I tried to open some web pages from the notebook through the reference board via the WiFi interface.

After a few seconds or a few minutes, the io-pkt stack hangs and the web connection is closed.
At that point I checked the process list with the "pidin" command and got a result like the one below. There are two types of errors.

In the first, the state of io-pkt-v4-hc is MUTEX. In the second, io-pkt does not appear in the list at all.
1. # pidin
     pid tid name               prio STATE       Blocked         
       1   1 procnto              0f READY                       
       1   2 procnto            255r RECEIVE     1               
       1   3 procnto            255r RECEIVE     1               
       1   4 procnto             10r RECEIVE     1               
       1   5 procnto             10r RUNNING                     
       1   6 procnto             10r RECEIVE     1               
       1   7 procnto             10r RECEIVE     1               
    4098   1 c/boot/devc-serpsc  10r RECEIVE     1               
    4099   1 proc/boot/pipe      10r SIGWAITINFO                 
    4099   2 proc/boot/pipe      10r RECEIVE     1               
    4099   3 proc/boot/pipe      10r RECEIVE     1               
    4099   4 proc/boot/pipe      10r RECEIVE     1               
    4100   1 c/boot/pci-mgt5200  21r RECEIVE     1               
    4101   1 proc/boot/devc-pty  10r RECEIVE     1               
    4102   1 proc/boot/qconn     10r SIGWAITINFO                 
    4102   2 proc/boot/qconn     10r CONDVAR     48060168        
    4102   3 proc/boot/qconn     10r RECEIVE     1               
    4102   4 proc/boot/qconn     10r RECEIVE     3               
    4103   1 /boot/devf-mgt5200  10r SIGWAITINFO                 
    4103   2 /boot/devf-mgt5200  10r RECEIVE     1               
    4103   3 /boot/devf-mgt5200  10r RECEIVE     1               
    4105   1 proc/boot/io-usb    10r SIGWAITINFO                 
    4105   2 proc/boot/io-usb    21r RECEIVE     4               
    4105   3 proc/boot/io-usb    21r RECEIVE     1               
    4105   4 proc/boot/io-usb    10r RECEIVE     7               
    4105   5 proc/boot/io-usb    10r NANOSLEEP                   
    4105   6 proc/boot/io-usb    10r RECEIVE     7               
    8200   1 c/boot/devc-serpsc  10r RECEIVE     1               
    8202   1 proc/boot/ksh       10r SIGSUSPEND                  
    8203   1 usr/sbin/random     10r SIGWAITINFO                 
    8203   2 usr/sbin/random     10r RECEIVE     1  ...
Re: io-pkt is down  
Just to confirm:  The driver has been completely re-built within the new io-pkt source base?

  Can you try running without lsm-pf-v4 and see if that makes any difference (or can things not operate at all without NAT)?

Is it possible to set up a secondary Ethernet interface so that you can run a second instance of the stack and then run a debug version of io-pkt to see exactly where things are crashing?

Is dumper running?  Can you get us a core dump in the case the stack dies?

We're dealing with four potential problems here:
1) The stack
2) The filter
3) The Ethernet driver
4) The Atheros driver


Have there been any customized changes to either the Ethernet driver or Atheros driver?

  Robert
Re: io-pkt is down  
I confirmed that the driver has been completely rebuilt within the new io-pkt source base. Whenever the io-pkt source code is released, I always build the whole source tree together with the Atheros driver source code.

I'll upload a core dump from when the stack dies, after working hours.
I can't upload files from the office, so I'll do it from home.

I totally agree that there are four potential issues.
So first I'll re-test without the packet filter and NAT, and I'll share the result.

FYI, I currently use a very simple pf.conf with only three lines.
I defined int_if=ath0 and ext_if=eth0, and I set the NAT rule as "nat on $ext_if from !(ext_if) to any -> ($ext_if)". I think that's enough for our purpose.

After testing without NAT, I'll try a secondary Ethernet PCI card on the reference board. Did you mean setting up two network interfaces using two Ethernet ports? That is, did you mean the built-in Ethernet port plus an external Ethernet port, or an external Ethernet port plus the wireless LAN port?

Finally, I confirmed that there are no changes to the Ethernet driver or the wireless LAN driver for the Atheros chipset.

I only use the io-pkt source code and Atheros driver source code from the QNX site, without any customization.

Wayne.
RE: io-pkt is down  
Hi Wayne:
	I was thinking that if you have another Ethernet interface (an
additional one, whether PCI, USB, or on-board), then you can use that
interface with a second stack instance to connect in with the debugger
and start a debug version of the stack and driver with the configuration
that you have today.  You can also use dumper to get backtraces of
io-pkt when it appears to be mutex deadlocked (see the dumper -p
option).

	Robert.

-----Original Message-----
From: Jongpil Won [mailto:community-noreply@qnx.com] 
Sent: Monday, October 13, 2008 10:49 PM
To: drivers-networking
Subject: Re: io-pkt is down

_______________________________________________
Networking Drivers
http://community.qnx.com/sf/go/post14924
RE: io-pkt is down  
> io-pkt stack had been hang ... two types of errors.
> First, the state of io-pkt-v4-hc is MUTEX. 
> the second is there is no list of io-pkt.

When io-pkt goes away, it probably has faulted.
Before you start io-pkt, run dumper to create a 
core dump of io-pkt in /tmp.

When io-pkt locks up on a mutex, do a pidin to
get the pid of io-pkt (say 12345) then do this:

  # dumper -p 12345

which will cause dumper to create a core file of
io-pkt, even though it didn't fault.

Now that you have a core dump, start io-pkt really
simply - without the wifi - just with a single ethernet
port, and ifconfig the port with an IP address, so
that you can ftp the core file produced by dumper
off the box, and send it to us.

--
aboyd
Re: RE: io-pkt is down  
Hi, I'm on the same team as Wayne.
I'm sending two core dump files.

1. io-pkt stack down (pidin : there is no list of io-pkt) : io-pkt-v4-hc_stackdown.core
2. io-pkt mutex state : io-pkt-v4-hc_mutex.core

Thank you.
Attachment: Compressed file core.tar 8.35 MB
Re: RE: io-pkt is down  
On Wed, Oct 15, 2008 at 08:16:56AM -0400, Sunha Choi wrote:
> Hi, i'm a same team member with Wayne.
> i send two core dump files.
> 
> 1. io-pkt stack down (pidin : there is no list of io-pkt) : io-pkt-v4-hc_stackdown.core
> 2. io-pkt mutex state : io-pkt-v4-hc_mutex.core
> 
> Thank you.
> 

Can you also include the binaries in the archive (io-pkt
and any dlls loaded therein (devnp-ath.so, lsm-*))?

Thanks,

-seanb
Re: RE: io-pkt is down  
I've sent the binaries (io-pkt and dlls).

Thank you.
Attachment: Text dll.tar 2.31 MB
RE: RE: io-pkt is down  
Ok, got all the binaries.  Wrestling with
gdb right now.

--
aboyd
RE: RE: io-pkt is down  
Do you have a non-stripped io-pkt-v4-hc
that you can send me?

--
aboyd
Re: RE: RE: io-pkt is down  
Is io-pkt-v4-hc stripped?
I'm resending the files.
Thanks a lot~!

Attachment: Compressed file io-pkt_debug.zip 2.02 MB
RE: RE: RE: io-pkt is down  
The unstripped binaries work MUCH better with gdb,
but I should mention it is still complaining about
the following binaries missing:

  devn-mpc5200.so
  libdma-bestcomm5200.so.1

Now, when I look at the mutex-blocked core of
io-pkt (that you snapshotted with dumper -p 123)
I can see:

  Thread 1 is blocked in SignalWaitInfo (looks ok)

  Thread 2 rxd an IP packet, and is attempting to
  forward (transmit) the packet, but is mutex-blocked
  in shim_start() (after ether_output())

  Thread 3 faulted someplace mysterious (no symbols,
  just question marks), the SIGSEGV handler ran to
  try to quiesce the hardware, and is mutex blocked.

It would appear that the fault occurred in the missing
binaries (above) and the mutex blocking is simply a
result of excessively complicated shutdown code in
the driver. 
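
As a general illustration of how a cleanup path can end up mutex-blocked after a fault, here is a minimal sketch in C. It assumes the faulting code path still held a lock that the cleanup code also needs; the core dump does not show which thread owns the mutex, so this is only one way such a block can arise. The names demo_drv_lock and demo_cleanup_lock_attempt are hypothetical, not io-pkt or driver symbols, and the sketch uses pthread_mutex_trylock() in place of a blocking lock so that it returns instead of hanging.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock standing in for whatever mutex the driver's
 * transmit and shutdown paths share; not a real io-pkt symbol. */
static pthread_mutex_t demo_drv_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models a code path that takes the lock and then "faults" before
 * releasing it, followed by cleanup code that wants the same lock.
 * Returns the result of the cleanup's lock attempt. */
static int demo_cleanup_lock_attempt(void)
{
    int rc;

    /* Normal path: take the lock before touching driver state. */
    pthread_mutex_lock(&demo_drv_lock);

    /* ... imagine the fault happening here, lock still held ... */

    /* Cleanup path: a blocking pthread_mutex_lock() would never
     * return at this point, so the demo uses trylock to observe
     * the condition without hanging. */
    rc = pthread_mutex_trylock(&demo_drv_lock);
    if (rc == 0) {
        /* Only reached if the mutex type allows relocking. */
        pthread_mutex_unlock(&demo_drv_lock);
    }

    /* Release the original hold so the demo can be called again. */
    pthread_mutex_unlock(&demo_drv_lock);
    return rc;
}
```

With a default (non-recursive) mutex, demo_cleanup_lock_attempt() returns EBUSY, which is the situation in which a blocking lock call would show up as MUTEX in pidin.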


Now, looking at the actual core dump of io-pkt, it
looks like io-pkt for some reason was already in
the process of orderly shutdown - ie it received
a message, then called dodie(), then doshutdownhooks()
then shim_shutdown(), then a few frames with no
symbols (likely 5200 binaries above) which appeared
to fault and end up in the sigsegv handler.

It sure looks like the 5200 driver has some bugs.  If
you want to send me the missing (unstripped!) binaries
above, I can probably gather a little bit more information,
but in this situation, what really really helps is
to compile the offending binary -g -O0 so that we
can narrow down the fault to an exact line of code
instead of just a function.

I am unfamiliar with these 5200 binaries - did you
compile your own versions, or did you get them from
QNX?

--
aboyd

Re: RE: RE: RE: io-pkt is down  
Hi Andrew.

I got Lite5200B BSP 1.0.4 from QNX and compiled the unmodified devn-mpc5200 source code.
I also tried the prebuilt 5200 binaries, but the result is the same.

Thank you.
Re: RE: RE: RE: io-pkt is down  
I've uploaded two files (devn-mpc5200.so, libdma-bestcomm5200.so.1).
Thanks.
Attachment: Compressed file devn.zip 95.09 KB
RE: RE: RE: RE: io-pkt is down  
> upload two files (devn-mpc5200.so, libdma-bestcomm5200.so.1)

Thanks for that.  Sorry about the delayed response, I was
away last week, and sick this weekend.

With all the binaries, we can see that both problems
(mutex lock and stack fault) have the same root cause:

line 547 of io-net.c of the shim is attempting to dereference
a null pointer (shim).  This is a result of either:

1) the npkt->flags being corrupted (eg multiple, conflicting
   definitions), or

2) the value of the shim pointer (which was squirrelled 
   away previously by the shim) being overwritten with zero, or

3) the ex_tx_done() function being passed a pointer to
   an npkt that it wasn't expecting (ie it did not create the npkt).

I'll see if I can pore through the core dump some more, to
try to narrow it down, but I thought I'd give you an update.

Executive summary: looks like some subtle incompatibility 
between the shim (which emulates io-net) and the mpc5200 
io-net driver.

--
aboyd
RE: RE: RE: RE: io-pkt is down  
And there is an option 4).  The io-net driver has a bug in it that
wasn't being hit in io-net for some reason (which we've seen several
times already).

	Robert.

-----Original Message-----
From: Andrew Boyd [mailto:community-noreply@qnx.com] 
Sent: Monday, October 27, 2008 10:18 AM
To: drivers-networking
Subject: RE: RE: RE: RE: io-pkt is down


_______________________________________________
Networking Drivers
http://community.qnx.com/sf/go/post15556
Re: RE: RE: RE: RE: io-pkt is down  
Suggestion: from the binaries, it looks like you are
compiling your own bsp devn-mpc5200.so ... on line 
41 of transmit.c change

  dpkt->flags = FEC_DEFRAG_PACKET;

to

  dpkt->flags |= FEC_DEFRAG_PACKET;

Instead of setting the bits, "or" them in, because
I think it is erroneously overwriting the _NPKT_UP
flag set by ex_alloc_up_pkt() in the shim.  

And when the shim looks for the _NPKT_UP bit set
in the npkt flags later, in ex_tx_done(), it isn't
seeing that bit set, and it's resetting the shim 
pointer to garbage (null) and faulting when it
tries to de-reference it.

It's this sort of driver bug that drives us nuts in
io-pkt - io-net may tolerate it, but the shim doesn't.
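
To illustrate the difference between the two statements, here is a minimal standalone sketch in C. The struct, the function names, and the bit values (DEMO_NPKT_UP, DEMO_FEC_DEFRAG_PACKET) are made up for the example and are not the real definitions from the QNX headers or the BSP; only the assign-versus-OR behaviour is the point.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen only for illustration; the real
 * _NPKT_UP and FEC_DEFRAG_PACKET values live in the QNX headers and
 * the BSP driver source. */
#define DEMO_NPKT_UP           0x0001u  /* stands in for _NPKT_UP */
#define DEMO_FEC_DEFRAG_PACKET 0x0100u  /* stands in for FEC_DEFRAG_PACKET */

/* Minimal stand-in for a packet descriptor with a flags word. */
struct demo_pkt {
    uint32_t flags;
};

/* Buggy form: plain assignment discards every bit already set,
 * including bits owned by another layer. */
static void mark_defrag_assign(struct demo_pkt *p)
{
    p->flags = DEMO_FEC_DEFRAG_PACKET;
}

/* Fixed form: OR the bit in, preserving bits set by other layers. */
static void mark_defrag_or(struct demo_pkt *p)
{
    p->flags |= DEMO_FEC_DEFRAG_PACKET;
}

/* What a later consumer would check: is the "up" bit still set? */
static int still_marked_up(const struct demo_pkt *p)
{
    return (p->flags & DEMO_NPKT_UP) != 0;
}
```

For a packet whose flags start out as DEMO_NPKT_UP, still_marked_up() returns 0 after mark_defrag_assign() and 1 after mark_defrag_or(), which mirrors how a tx-done check could miss a bit that the driver overwrote.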