Forum Topic - Squelch Test errors: (22 Items)
   
Squelch Test errors  
What can we learn from a system that shows 5000 Squelch Test errors (6.4.1, e1000) out of 500 million packets? I've never
seen that error before. It's present on 2 PCs out of 6 that are connected to the same switch. Cable length is in the
20-foot range.
RE: Squelch Test errors  
I think I mentioned this last week ...

In an attempt to better diagnose the source of
lost packets, I overloaded the now-unused SQE
counter.

See, you can either lose packets because you
run out of descriptors (the nic was able to
buffer the packet, but there was no cpu ram
available) OR the nic was unable to buffer the
packet because it overran its internal rx fifo.

Previous drivers added the two together, which
annoyed me.

So, I reused SQE for internal rx fifo overruns, 
which generally indicates excessive bus latency, 
or perhaps misconfigured link-level flow control,
or even misconfigured rx fifo watermarks.

I will now walk down to the parking lot and break
off a car antenna and whip myself in exculpation :-)

We should probably document this somewhere, to
reduce the number of lacerations on my back ...

--
aboyd  www.PoweredByQNX.com/images/L39x2.jpg
RE: Squelch Test errors  

> -----Original Message-----
> From: Andrew Boyd [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 9:27 AM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> 
> So, I reused SQE for internal rx fifo overruns,
> which generally indicates excessive bus latency,
> or perhaps misconfigured link-level flow control,
> or even misconfigured rx fifo watermarks.

That sounds bad. There were 5000 Squelch Test errors and around 4000 "Packets Dropped on received". I thought I had added
more tx and rx descriptors, but it looks like this one slipped through. I'm just hoping this will also get rid of the rx
interface fifo overruns...

> 
> I will now walk down to the parking lot and break
> off a car antenna and whip myself in exculpation :-)

There are people at QNX who have cars that are that old?

> _______________________________________________
> 
> Networking Drivers
> http://community.qnx.com/sf/go/post37884
> 
RE: Squelch Test errors  
> There was 5000 Squelch Test errors, and around 
> 4000 "Packets Dropped on received". 

So, you had 5000 rx packets lost because of rx fifo 
overruns, and 4000 rx packets lost because the rx 
descriptor ring was full.

Increase the size of your "receive=X" command line
option to the driver to get rid of the rx descriptor
packet losses.

You lost (roughly) 10,000 packets out of 500 million,
which is one lost rx packet every 50,000, which isn't
bad, but isn't perfect - very occasionally, you will
see a hiccup as the network protocol times out and
re-transmits.
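The loss-rate arithmetic above can be written as a small C helper. The name `one_in_n` is hypothetical. The sketch assumes both loss counters are added together and divided into the total received count.

```c
#include <assert.h>

/* Express a loss count as "one lost packet in every N received".
 * Integer division, so the result is rounded down. */
static unsigned long one_in_n(unsigned long lost, unsigned long total)
{
    assert(lost > 0);
    return total / lost;
}
```

With the figures in this thread:

- `one_in_n(5000 + 4000, 500000000)` gives 55555. Using the rounded 10,000 instead gives 50000.
- `one_in_n(191 + 78, 10000000)` gives 37174.
- `one_in_n(703 + 283, 11000000)` gives 11156.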

>> I will now walk down to the parking lot and break
>> off a car antenna and whip myself in exculpation :-)
>
> There are people at QNX who have cars that are that old?

Actually, I just broke 500,000 km on my 2001 Honda
Civic.  I want to video the odometer rolling over
ONE MEELLION kilometers for youtube  :-)

I'm wondering - will the odometer just go blank?
Will it go back to zero?  Will it display an error?
Perhaps exponential notation?

--
aboyd
RE: Squelch Test errors  

> -----Original Message-----
> From: Andrew Boyd [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 9:54 AM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> Increase the size of your "receive=X" command line
> option to the driver to get rid of the rx descriptor
> packet losses.
> 

Hmm, it's set to 4096, and I just got 191 Packets Dropped on received
and 78 Squelch Test errors out of 10 million packets. Something isn't right.

Another machine has 703 packets dropped and 283 squelch errors out of 11 million packets.

Will keep on digging.  Thanks 
RE: Squelch Test errors  
>> Increase the size of your "receive=X" command line
> 
> it's to 4096 

Hm.  Just checked the source and both devnp-e1000 
and devnp-i82544 limit rx to 2048.  I wonder if
there is some underlying hardware limitation? 

> 191 Packets Dropped on received
> and 78 squelch Test errors
> Out of 10Million packet.

that's around 1 in 37,000 rx packets dropped,
which again isn't perfect, but isn't really 
that bad, either.

> 703 packet dropped and 283 sequel error 
> out of 11 million packet.

1 in 11,000 - not bad, but not great either.

With a big enough rx descriptor ring, you
ought to be able to straddle your scheduling
latency of io-pkt and entirely get rid of the 
"dropped" errors, unless you've got something 
that really hogs the cpu at a very high priority 
for a lengthy time.
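One way to size the ring to "straddle" io-pkt's scheduling latency is to ask how long a full set of free descriptors lasts at a given packet rate. The C sketch below rests on two assumptions:

- It takes the gigabit Ethernet worst case: minimum-size 64-byte frames at line rate. That is about 1,488,095 packets/s once the 8-byte preamble and 12-byte inter-frame gap are counted.
- It assumes descriptors are only replenished when io-pkt's rx thread gets to run.

The function names are hypothetical.

```c
#include <assert.h>

/* GigE line rate with minimum-size frames:
 * 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap = 84 bytes,
 * and 1e9 bits/s / (84 * 8 bits) is about 1,488,095 packets/s. */
#define GIGE_MIN_FRAME_PPS 1488095UL

/* Microseconds that `ring` free rx descriptors can absorb at `pps`
 * packets/s before the NIC has nowhere to put the next packet. */
static unsigned long ring_covers_us(unsigned long ring, unsigned long pps)
{
    assert(pps > 0);
    return (unsigned long)((unsigned long long)ring * 1000000ULL / pps);
}

/* Smallest ring that covers `latency_us` of scheduling latency at `pps`. */
static unsigned long ring_needed(unsigned long pps, unsigned long latency_us)
{
    unsigned long long pkts = (unsigned long long)pps * latency_us;
    return (unsigned long)((pkts + 999999ULL) / 1000000ULL); /* round up */
}
```

At that rate, a 2048-descriptor ring covers roughly 1.4 ms of latency and a 4096-descriptor ring roughly 2.75 ms. Covering 2 ms needs at least 2977 descriptors. Real traffic with larger frames arrives at a lower packet rate, so treat these as worst-case estimates.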

The SQE (rx fifo) errors may be a bit trickier
to eradicate - first, you have to figure out
the exact cause.

--
aboyd
RE: Squelch Test errors  
You can also check to see if any other devices are on the same interrupt
line and slay their drivers to see if the situation improves.


-----Original Message-----
From: Andrew Boyd [mailto:community-noreply@qnx.com] 
Sent: Monday, September 14, 2009 10:49 AM
To: drivers-networking
Subject: RE: Squelch Test errors


RE: Squelch Test errors  
> if any other devices are on the same interrupt

Right!  Sharing interrupts is definitely to be avoided
if at all possible:

http://community.qnx.com/sf/wiki/do/viewPage/projects.networking/wiki/Drivers_wiki_page

and scroll down to:

-- cut --

Shared Interrupts - Problems?

Different devices sharing a hardware interrupt is kind of a neat idea,
but unless you really need to do it - because you've run out of hardware
interrupt lines - it generally doesn't help you much. In fact, it can
cause you trouble. For example, if your driver doesn't work - e.g. no
received packets - check whether it is sharing an interrupt with another
device, and if so, reconfigure your board so it does not.

Most of the time, when shared interrupts are configured, there is no good
reason for it - i.e. you haven't really run out of interrupts - and this
can decrease your performance, because when the interrupt fires, ALL of
the devices sharing the interrupt need to run and check whether it is
for them. If you check the source, you can see that some drivers do the
"right thing", which is to read registers in their interrupt handlers to
see if the interrupt is really for them, and ignore it if not. But many,
many drivers do not - they schedule their thread-level event handlers to
check their hardware, which is inefficient and reduces performance. If
you are using a PCI bus, use the "pci -v" utility to check interrupt
allocation. You may be surprised by what you see.

Another point worth making is that sharing interrupts can vastly increase
interrupt latency, depending upon exactly what each of the drivers does.
Remember that after an interrupt fires, it will NOT be re-enabled by the
kernel until ALL driver handlers tell the kernel that they have completed
handling. So, if one driver takes a long, long time servicing a shared
interrupt which is masked, and another device on the same interrupt
raises an interrupt during that time period, processing of that
interrupt can be delayed for an unknown length of time.

Bottom line: interrupt sharing can cause problems, reduce performance,
increase cpu consumption, and seriously increase latency. Unless you
really need to do it, don't. If you must share interrupts, make sure
your drivers are doing the "right thing". Face it - shared interrupts
are the new "trans fats" :-) 

-- cut --
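The "right thing" described in the excerpt is an interrupt handler that checks its own hardware before waking anything. It might look roughly like this C sketch. Everything here is hypothetical: a simulated NIC with a read-to-clear interrupt cause register and a stand-in event struct, so it compiles anywhere. If the handler were attached with QNX's InterruptAttach(), it would instead return a `const struct sigevent *`, with NULL meaning no event is delivered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: a read-to-clear interrupt cause register. */
struct fake_nic {
    uint32_t icr;        /* pending interrupt cause bits */
    uint32_t saved_icr;  /* causes latched for the thread-level handler */
};

/* Stand-in for the event an ISR would hand back to the kernel. */
struct event { int pending; };

static uint32_t read_icr(struct fake_nic *nic)
{
    uint32_t v = nic->icr;
    nic->icr = 0;        /* reading clears the register */
    return v;
}

/* On a shared line, check our own hardware first and return NULL (no event)
 * when the interrupt belongs to another device, so our thread is not woken. */
static struct event *nic_isr(struct fake_nic *nic, struct event *ev)
{
    uint32_t cause = read_icr(nic);
    if (cause == 0)
        return NULL;     /* not ours */
    nic->saved_icr |= cause;
    ev->pending = 1;
    return ev;
}
```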

P.S.  Took a peek at the Intel doc; it mentions an
internal limit of 64k for the rx descr ring size.  Looked
at the BSD (wm) driver: it's hardcoded to 256!

I suspect we should bump up the rx max descr ring from
2048 to 4096 in both devnp-e1000 and devnp-i82544.
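The descriptor-count limit discussed here can be pictured as a simple clamp on the receive= value. The sketch makes these assumptions:

- The driver silently clamps out-of-range requests rather than rejecting them. The thread does not confirm this.
- `RX_DESC_MIN` is a made-up lower bound.
- 2048 and 4096 are the current and proposed maximums mentioned above.

```c
#include <assert.h>

#define RX_DESC_MIN        64u   /* hypothetical lower bound */
#define RX_DESC_MAX_OLD  2048u   /* current devnp-e1000/devnp-i82544 rx limit */
#define RX_DESC_MAX_NEW  4096u   /* proposed limit */

/* Clamp a user's receive= request into [RX_DESC_MIN, max]. */
static unsigned clamp_rx_desc(unsigned requested, unsigned max)
{
    assert(max >= RX_DESC_MIN);
    if (requested < RX_DESC_MIN)
        return RX_DESC_MIN;
    if (requested > max)
        return max;
    return requested;
}
```

Under that assumption, receive=4096 yields 2048 descriptors with the current limit and 4096 once the maximum is raised.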

--
aboyd
RE: Squelch Test errors  
Thanks for the suggestion, Hugh. The interrupt is not shared, and the priority of the event thread is set to 100. I just checked
with pidin, and none of the io-pkt threads has a priority above 21. Isn't that odd?

Io-pkt-v4 -de1000 transmit=4096 receive=4096 priority=100 -pqnet no_slog=1
None of our processes run above priority 20. The machine is a quad core running at 2.3 GHz.


> -----Original Message-----
> From: Hugh Brown [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 10:54 AM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> You can also check to see if any other devices are on the same interrupt
> line and slay their drivers to see if the situation improves.
RE: Squelch Test errors  
io-pkt can be a bit slippery as to which thread
(and what priority) you end up executing, depending
upon receive and transmit paths, multi-core, 
client calls, traffic on other ports, etc.  Your 
"priority=100" is supposed to be passed to the 
interrupt_entry_init() call, but ...

I might suggest that you run the following:

  # top -p 255

then do a big ttcp test.  What priority are the
io-pkt threads really running at?

P.S.  PR71200 created to track the request to bump
the max rx descr size from 2048 to 4096 for both
devnp-e1000 and devnp-i82544.

IMHO 4096/4096 should be the default max values 
for tx/rx for all gige drivers, hardware permitting.

--
aboyd
RE: Squelch Test errors  

> -----Original Message-----
> From: Andrew Boyd [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 1:29 PM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> I might suggest that you run the following:
> 
>   # top -p 255
> 
> then do a big ttcp test.  What priority are the
> io-pkt threads really running at?

Can't do it for now (this problem is at a customer site).

I used the System Profiler, and from what I can see, when thread 2 of io-pkt-v4 receives the pulse sent by the ISR,
the priority of the thread stays at 21.

RE: Squelch Test errors  
> when thread 2 of io-pkt-v4 receive the pulse 
> send by the ISR, and the priority of the thread 
> stays at 21.

Ah ha.  Threading (and thus priority) in io-pkt is 
outside the scope of the driver, unlike in io-net, 
where the driver manually created the rx thread.

This can get pretty complicated.  As far as the
threading priority goes for any given configuration
of X processors and Y interfaces, I think you
probably need to talk to Seanb.

--
aboyd
RE: Squelch Test errors  
Sean, care to comment?

> -----Original Message-----
> From: Andrew Boyd [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 3:02 PM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> 
> > when thread 2 of io-pkt-v4 receive the pulse
> > send by the ISR, and the priority of the thread
> > stays at 21.
> 
> Ah ha.  Threading (and thus priority) in io-pkt is
> outside the scope of the driver, unlike in io-net,
> where the driver manually created the rx thread.

Then why does the driver support a priority argument? Another job for Super Steve? (said with a superhero sound)

> 
> This can get pretty complicated.  As far as the
> threading priority goes for any given configuration
> of X processors and Y interfaces, I think you
> probably need to talk to Seanb.

Sean?

RE: Squelch Test errors  
> why the priority argument support by the driver? 

Actually the driver does the "right thing" in that
it sets the default priority (IRUPT_PRIO_DEFAULT)
and allows the user command line option to override
it, and then passes it down to interrupt_entry_init()
which is all as it should be.

--
aboyd
RE: Squelch Test errors  

> -----Original Message-----
> From: Andrew Boyd [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 3:21 PM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> 
> > why the priority argument support by the driver?
> 

> it sets the default priority (IRUPT_PRIO_DEFAULT)
> and allows the user command line option to override
> it, and then passes it down to interrupt_entry_init()
> which is all as it should be.

The doc says: "The priority of the driver's event-handler thread (default 21)."

RE: Squelch Test errors  
>>> why the priority argument support by the driver?
>>
>> it sets the default priority (IRUPT_PRIO_DEFAULT)
>> and allows the user command line option to override
>> it, and then passes it down to interrupt_entry_init()
>> which is all as it should be.
>
>The doc says: "The priority of the driver's event-handler 
>thread (default 21)."

Important Note:  In io-net, the driver created the rx thread.
In io-pkt, the driver does NOT create the rx thread - instead,
it passes the priority to interrupt_entry_init() as specified.

AFAIK there is nothing else that the driver is capable of
doing with respect to the priority in the io-pkt infrastructure.

--
aboyd
RE: Squelch Test errors  
I see that io-pkt-vX has the following "-p tcpip" options:

  rx_prio
  rx_pulse_prio

You probably should look at these.

--
aboyd
 
RE: Squelch Test errors  
Thanks, will give that a try.

> -----Original Message-----
> From: Andrew Boyd [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 3:05 PM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> 
> I see that io-pkt-vX has the following "-p tcpip" options:
> 
>   rx_prio
>   rx_pulse_prio
> 
> You probably should look at these.
Re: RE: Squelch Test errors  
> Thanks for the suggestion Hugh.  Interrupt is not shared and priority of the 
> event thread is set to 100.  Just checked with pidin and none of io-pkt 
> thread's priority are above 21, that's odd?  
> 
> Io-pkt-v4 -de1000 transmit=4096 receive=4096 priority=100 -pqnet no_slog=1
> None of our process are above priority 20.  Machine is Quad Core running at 2.3G.

Just so we are all clear.

I see spaces between the options; those are supposed to be comma-separated, right? I.e., it's supposed to be:

io-pkt-v4 -de1000 transmit=4096,receive=4096,priority=100 -pqnet no_slog=1
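Why the commas matter: the driver options are meant to arrive as a single comma-separated string. Below is a minimal C sketch of parsing such a string. The struct, function name, and defaults are hypothetical; this is not io-pkt's or devnp-e1000's actual parser. If only the first space-separated word (transmit=4096) reached the driver, receive and priority would keep their defaults. That would be consistent with the event thread staying at priority 21.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set; keys match the command line above. */
struct drv_opts {
    unsigned long transmit;
    unsigned long receive;
    int priority;            /* documented default quoted in this thread: 21 */
};

/* Parse one comma-separated string such as
 * "transmit=4096,receive=4096,priority=100".
 * Returns 0 on success, -1 on a malformed or unknown option. */
static int parse_drv_opts(const char *arg, struct drv_opts *o)
{
    char buf[256];
    char *tok;

    assert(arg != NULL && o != NULL);
    strncpy(buf, arg, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            return -1;
        *eq = '\0';          /* split "key=value" in place */
        if (strcmp(tok, "transmit") == 0)
            o->transmit = strtoul(eq + 1, NULL, 10);
        else if (strcmp(tok, "receive") == 0)
            o->receive = strtoul(eq + 1, NULL, 10);
        else if (strcmp(tok, "priority") == 0)
            o->priority = atoi(eq + 1);
        else
            return -1;
    }
    return 0;
}
```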

-xtang
RE: RE: Squelch Test errors  
> Just so we are all clear.
> 
> I see spaces between the options, those are suppose to be comma
> connected, right? Ie, it suppose to be:
> 
> io-pkt-v4 -de1000 transmit=4096,receive=4096,priority=100 -pqnet no_slog=1

Aaarrrrrrrrrrrrrgggghhhhhhh.

RE: Squelch Test errors  
PR 71191 created and assigned to the tireless Steve
Reid to document this in the devnp-e1000 and devnp-i82544
driver notes.

--
aboyd
RE: Squelch Test errors  
May I suggest that the string "Squelch Test errors" in nicinfo be changed?

> -----Original Message-----
> From: Andrew Boyd [mailto:community-noreply@qnx.com]
> Sent: Monday, September 14, 2009 9:35 AM
> To: drivers-networking
> Subject: RE: Squelch Test errors
> 
> PR 71191 created and assigned to the tireless Steve
> Reid to document this in the devnp-e1000 and devnp-i82544
> driver notes.