Forum Topic - network driver timer lock up issue : (3 Items)
   
network driver timer lock up issue  
Hi,

We are seeing a strange mutex blocking problem related to the callout_* functions used for our WiFi driver timers. The
driver runs in AP mode, and we have 3 types of timers:
1. control timer: controls Tx flow
2. data timer: flushes out Rx packets on timeout; one per connected WiFi station
3. command timer: handles command timeouts

The control timer runs on the io-pkt thread; the other 2 types of timer run on our work thread, which is created by
nw_pthread_create(). Depending on conditions, the 3 timers are frequently started and stopped, or created on the fly
(e.g. for a new WiFi client).

When we run a stress test with one AP against 2 WiFi stations, we hit a mutex lock-up when calling callout_msec(..) or
callout_stop(..). Our log shows the line just before the callout_* call, but the call never returns. We believe the
blocked mutex belongs to the QNX TCP/IP stack, and we would like to know why this happens. pidin shows:

#pidin 
 1036304   1 /boot/io-pkt-v4-hc  21r SIGWAITINFO
 1036304   2 /boot/io-pkt-v4-hc  25r RUNNING
 1036304   3 /boot/io-pkt-v4-hc  21r RECEIVE     24
 1036304   4 /boot/io-pkt-v4-hc  25r MUTEX       (0x20b670) 1036304-02 #0

Thanks,

Yurong
Re: network driver timer lock up issue  
Is thread 2 always in the RUNNING state? Force a core dump with 'dumper -p <io-pkt pid>' and get a backtrace of thread 2
with gdb.


From: Yurong Sun
Sent: Thursday, January 31, 2013 6:00 PM
To: drivers-networking
Reply To: drivers-networking@community.qnx.com
Cc: yurong@marvell.com
Subject: network driver timer lock up issue





Re: network driver timer lock up issue  
Hi, Sean,

Yes, thread 2 is always RUNNING, and CPU usage is at 100% when this problem happens.

Following your instructions, we got the gdb backtraces below (for both thread 2 and thread 4). Could you tell us what is
causing this situation?

[New pid 204821 tid 1]
[New pid 204821 tid 2]
[New pid 204821 tid 3]
[New pid 204821 tid 4]
#0  0x0103c67c in SignalWaitinfo ()
   from libc.so.3
(gdb) thread 2
[Switching to thread 2 (pid 204821 tid 2)]#0  0x0018f12c in softclock ()
(gdb) bt
#0  0x0018f12c in softclock ()
#1  0x0018aeac in hardclock ()
#2  0x001ab6b8 in receive_loop_multi ()
#3  0x0019f9a8 in thread_init ()
#4  0x0101f9f0 in timer_settime () from libc.so.3
Backtrace stopped: frame did not save the PC
(gdb) thread 4
[Switching to thread 4 (pid 204821 tid 4)]#0  0x0103c778 in SyncMutexLock_r ()
   from libc.so.3
(gdb) bt
#0  0x0103c778 in SyncMutexLock_r () from libc.so.3
#1  0x0019f8c8 in exclusion_lock_mp ()
#2  0x0018eb10 in callout_reset ()
#3  0x0018ec30 in callout_msec ()
#4  0x7801ecdc in woal_ioctl_get_bss_resp () from devnp-mrvl_wlan-sdiorm.so
Backtrace stopped: frame did not save the PC

Thanks,

Yurong