Forum Topic - Interrupt Latency: (16 Items)
   
Interrupt Latency  
Hi everyone,

I'm trying to set up a system where I can synchronise a 100 kHz (10 us) process through PCI on an interrupt basis. There
is a huge problem with the kernel at this interrupt frequency. Even when I give my ISR (which does nothing more than an
out32()) a priority higher than every other system process, it still takes about 3 us for the interrupt to be handled.
Actually, this is not the main problem.

When interrupting at this frequency, the system load becomes enormous. With a kernel event trace (Momentics) I see a
system load of 100%, of which 80-90% is used by the system. As a result, the processor can't do anything else in
between.

How is it possible that the ISR takes up so much of the system's resources, even though the ISR itself does so little?
It isn't that big a problem that the interrupt latency itself is x us, but the system being 100% busy means that I can't
do anything else!
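A back-of-the-envelope check of the numbers above, as a minimal C sketch: system load is roughly the interrupt rate times the CPU time spent per interrupt (kernel entry/exit, the handler, and any thread wake-up it triggers). Assuming that per-interrupt cost is constant, 80-90% system time at 100 kHz implies about 8-9 us per interrupt, far more than the single out32() itself needs. The function names are illustrative only, not part of any QNX API.

```c
#include <assert.h>

/* Fraction of one CPU used when each interrupt costs cost_us microseconds
 * (kernel entry/exit + handler + any wake-up) at rate_hz interrupts/second.
 * Assumes the per-interrupt cost is constant. */
static double cpu_load(double rate_hz, double cost_us)
{
    return rate_hz * cost_us * 1e-6;
}

/* Inverse: per-interrupt cost (in us) implied by an observed load at a given rate. */
static double cost_per_irq_us(double load, double rate_hz)
{
    return load / rate_hz * 1e6;
}
```

For example, cost_per_irq_us(0.85, 100000.0) gives about 8.5 us, and cpu_load(20000.0, 8.5) gives about 0.17, so under the same assumption a 20 kHz rate would leave roughly 17% system load. Comparing such predictions with loads measured at lower rates is a quick way to test whether the cost per interrupt really is constant.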
Re: Interrupt Latency  
Another question...

Is it possible to synchronize to a 100 kHz process in another way than using interrupts? For example, poll a byte in RAM
once every 1 us, and have DMA set that byte every 10 us? (It's not a problem if one of the syncs fails once in a while.)
Re: Interrupt Latency  
> Another question..
> 
> Is it possible to synchronize to a 100 kHz process in another way than using
> interrupts? For example, poll a byte in RAM once every 1 us, and have DMA set
> that byte every 10 us? (It's not a problem if one of the syncs fails once in
> a while.)


You did not mention what your hardware is, but unless you've got some very powerful hardware, 100,000 interrupts a
second is going to put a huge strain on the machine.

Yes, you could poll on the byte that is being modified by DMA. However, you cannot do it at high priority, otherwise you
will starve all the other processes. You did not specify what "once in a while" is, but if the program runs at normal
priority you will likely miss a lot of them in bursts.

One way to solve this would be to use a dual core and set your program to run on core 2 at the highest priority and poll
the hardware. But even that is ugly.
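A minimal sketch of the polling side, assuming the DMA engine writes an incrementing 32-bit sequence word into RAM (a hypothetical layout, not something stated in the thread). The loop re-reads the word through a volatile pointer so the compiler cannot cache it, and gives up after a bounded number of reads so a high-priority poller never spins forever. Pinning the thread to the second core (on QNX, for example, through ThreadCtl() with _NTO_TCTL_RUNMASK) is not shown; check the exact call for your release.

```c
#include <assert.h>
#include <stdint.h>

/* Busy-poll a DMA-updated sequence word until it differs from last_seen.
 * Returns the new value, or last_seen unchanged after max_spins reads, so
 * the caller can yield or do other work instead of spinning forever. */
static uint32_t poll_for_new_frame(const volatile uint32_t *seq,
                                   uint32_t last_seen,
                                   unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        uint32_t now = *seq;          /* volatile read: fetched from memory every time */
        if (now != last_seen)
            return now;
    }
    return last_seen;                 /* timed out: no new frame yet */
}
```

The caller tells a new frame from a timeout by comparing the result with the value it passed in.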

However, I'm curious as to what your "synchronization" actually does, because there isn't much you can do every 10 us...

You probably need to rethink your architecture.




Re: Interrupt Latency  
Hello Mario,

The platform is a 1 GHz system. The interrupt is triggered by a PC104+ device.

The application used to test the latency is a simple interrupt attach. In the ISR, a bit is set on the PC104+ adapter. 
Using a scope, we can determine the time difference between setting the interrupt and getting an answer back.

The test program is a 'clean' QNX example on how to use InterruptAttach()

It is noticed that the process that 'owns' the ISR is using 100% cpu.

@Hugo: As said before, make sure you are the only one using the IRQ. Kill all processes that might use it. USB and the
NIC are the most likely other users of this interrupt, so kill them all before running the program; io-usb or io-net
might cause a delay.

@All, what about interrupt priority, see:
http://www.openqnx.com/PNphpBB2-viewtopic-t10081-.html
http://www.openqnx.com/index.php?name=PNphpBB2&file=printview&t=9837&start=15

@All: Could this be a part of a solution to overcome the 100% cpu usage?

@All: What about a faster CPU? Does this really help?

Cheers,
Freddy


Re: Interrupt Latency  
> Hello Mario,
> 
> The platform is a 1 GHz system. The interrupt is triggered by a PC104+ device.
> 
> The application used to test the latency is a simple interrupt attach. In the 
> ISR, a bit is set on the PC104+ adapter. Using a scope, we can determine the 
> time difference between setting the interrupt and getting an answer back.
> 
> The test program is a 'clean' QNX example on how to use InterruptAttach()
> 
> It is noticed that the process that 'owns' the ISR is using 100% cpu.

Hmm, I would not expect 100% CPU usage. Are you sure the ISR is clearing the source of the interrupt properly? Can you
post the code of your interrupt handler?

Re: Interrupt Latency  
Good morning,

Thanks for all your replies!

It's understandable that my first post wasn't very specific about how and why the system has to be interrupted every
10 us. I'll try to explain that some more. But first: as Freddy mentioned earlier, the process runs on a 1 GHz CPU and
the interrupt is triggered by a PC104+ (PCI) device. I use InterruptAttach() and kill ALL processes using (in my case)
IRQ 0x0B. This saves a lot, because USB was attached to the same interrupt. After I made sure no other devices were
attached to 0x0B, the first thing I threw overboard was the check whether the interrupt was really mine. This saved a
bit more (not very much, because a PCI access took about 0.4 us). The only thing I now do within the ISR is clear the
source directly! This is just one out32() statement. The original process calls InterruptWait(). I'll post the code in
a few minutes.

Second, the interrupt is properly cleared about 0.2 us after the ISR detects it. The estimated time of 3 us was measured
using a scope (source and clear within the FPGA and PLX bridge), not on the PCI bus itself. Although a bridge chip sits
between the two, I strongly suspect that the bridge doesn't add any meaningful latency. I'll test this later today. I
also see (examining .kev traces) that the ISR itself takes about 1.13 us. This means there's roughly 2 us before the ISR
is called!

In my second post I mentioned another approach: polling RAM. It isn't a matter of life or death if the process skips a
10 us frame, say one in every 10 frames (once per 100 us). Within each 10 us frame, a calculation that takes
approximately 1 us has to be completed. This is why the huge CPU load is the problem, not latency. The idea was to run
the calculation and then poll the RAM roughly every 2 us. This way it may be possible to synchronize the processes. I
haven't tried it yet, but I will. The calculation has to be done every 10 us because it controls mirrors that deflect a
laser. The mirror controller has to be fed an SPI value every 10 us. If one frame is missed, the previous value can be
sent again because of the mirror's mechanical inertia (the mirror doesn't move much in 10 us; for instance, a full sweep
of the radius takes about 250 us at full speed, and this is never done!).
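The fallback described above (resend the last value when a frame's calculation doesn't finish in time) can be sketched as a small per-frame decision in C. The names mirror_state and spi_value_for_frame, and the 16-bit SPI word, are hypothetical; the actual SPI write is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-channel state for the mirror controller. */
typedef struct {
    uint16_t last_sent;   /* last SPI value actually sent */
    unsigned missed;      /* frames in which the calculation was late */
} mirror_state;

/* Choose the SPI value for the current 10 us frame: the new result if the
 * calculation finished in time, otherwise the previous value again. */
static uint16_t spi_value_for_frame(mirror_state *s, bool computed_in_time,
                                    uint16_t computed)
{
    if (computed_in_time)
        s->last_sent = computed;
    else
        s->missed++;
    return s->last_sent;
}
```

With a full sweep taking about 250 us, repeating the value for one 10 us frame corresponds to at most 10/250 = 4% of a full sweep at full speed, which is why an occasional resend is tolerable. The missed counter makes it easy to check how often the fallback actually triggers.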

I forgot to mention: polling the RAM every 2 us takes almost no CPU time, because the CPU is idle during the 2 us wait
statement.

I see that one question remains unanswered. I explicitly can't divide the interrupts in software, because I need 100 kHz
synchronicity. If that were the question, I could divide the interrupts in hardware (FPGA). I've already seen that
loosening the specs a little and synchronizing at 50 or even 20 kHz does the trick; however, this is a desperate
measure. I've also thought of using a dual core, and eventually the process will run at 1.5 GHz or higher, and/or on
more cores.

I hope this answers most of your questions. I know there's a lot of discussion about interrupt latency. I never came
across something like this using a softcore microprocessor, where interrupt latency was about 10 ns, so 2 us seems very
large, but it is considered normal on x86 architectures. So I'm looking for some kind of workaround instead. The main
problem is still CPU load.

Regards,

Hugo Zijlmans

Code attached (sorry, my comments are in Dutch, but I don't think they're necessary, because the code is a clean
example of interrupt handling).
Attachment: Text interrupt.cc 1.54 KB
Re: Interrupt Latency  
> Good morning,
> 
> Thanks for all your replies!
> 

At the risk of being called paranoid, you should check the return value of InterruptWait(), because if it fails it
returns immediately, and a loop that ignores the error will consume 100% of the CPU. I don't see anything in the code
that would cause it to fail, though. Also, setprio(0, 22) will do the job; no need to call getgpid.
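A minimal sketch of the check suggested above, assuming the usual QNX convention that InterruptWait() returns -1 and sets errno on failure. The decision is split into a plain function (classify_wait(), a hypothetical name) so it can be tested off-target; treating EINTR as retryable is an assumption, not something stated in the thread.

```c
#include <assert.h>
#include <errno.h>

/* What the interrupt service thread should do after one wait call. */
enum wait_action { HANDLE_EVENT, RETRY_WAIT, STOP_LOOP };

/* rc and err are the return value and errno of the wait call
 * (on QNX: InterruptWait(0, NULL)). Treating every return as "an interrupt
 * arrived" is what turns a failing call into a 100% CPU spin, because a
 * failing call returns immediately, again and again. */
static enum wait_action classify_wait(int rc, int err)
{
    if (rc != -1)
        return HANDLE_EVENT;   /* normal wake-up: do the per-frame work */
    if (err == EINTR)
        return RETRY_WAIT;     /* interrupted by a signal (assumed retryable) */
    return STOP_LOOP;          /* anything else: report it and leave the loop */
}
```

In the service thread, call InterruptWait(0, NULL) in the loop, pass its return value and errno to classify_wait(), and on STOP_LOOP log strerror(errno) and leave the loop, so a persistent failure shows up as an error rather than as 100% CPU.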

If the calculation code doesn't contain any floating point, you could do the calculation in the ISR and not return any
event. It's not ideal, but it could get the job done. It would reduce latency and leave what little is left of the CPU
available for everything else. I still think polling on a dual core is nicer, though.

Could you post the .kev file showing the behavior somewhere (it's probably too big to fit here)? Isn't it strange that
tracelogger was able to do its thing, given there is supposed to be so little CPU left?
Re: Interrupt Latency  
On Tue, Mar 25, 2008 at 3:21 PM, Mario Charest <mcharest@zinformatic.com>
wrote:

> Could you post the .kev file showing the behavior somewhere (it's probably
> too big to fit here)? Isn't it strange that tracelogger was able to do its
> thing, given there is supposed to be so little CPU left?


Not if 100% of the CPU being consumed includes idle =;-)

Thomas ... interested in seeing the log.
Re: Interrupt Latency  
> Not if 100% of the CPU being consumed includes idle =;-)

It was mentioned earlier that it was the process owning the ISR that was using 100% of the CPU.

> 
> Thomas ... interested in seeing the log.

Something doesn't quite add up ;)


Re: Interrupt Latency  
@mario:

Thanks for your comments. But... however...
The return value of InterruptWait() is OK. There are no errors, and I know the interrupts are handled at 100 kHz
because I programmed the thread to disable the interrupts on the device after a while (10,000,000 interrupts or so).
Then qconn recovers and 0% CPU is used once again.

getgpid is used to let me know at which priority I'm currently running. Of course I don't need it, but it's for my
peace of mind ;-)

The calculations I'm talking about are all floating-point and geometric calculations, so I can't put them in the ISR :-)

In a week or so I'm going to try the RAM polling mechanism; I'm currently building the FPGA (higher priority).
Naturally I'll post the results :-D

> > Not if 100% of the CPU being consumed includes idle =;-)
> 
> It was mentioned earlier that it was the process owning the ISR that was using
>  100% of the CPU.
> 
> > 
> > Thomas ... interested in seeing the log.
> 
> Something doesn't quite add up ;)
> 
> 

I was only able to make .kev traces at 10 kHz ;), so your point is correct. I can't make a .kev at 100 kHz because the
CPU is 100% busy with system and user processes. I don't think attaching the .kev is of much use, because it's too large
and the ISR isn't debuggable (it only shows entry and exit points), but I'll look into it anyway (I don't know where to
find the file; I'm not familiar with Linux systems, I know, shame on me).

regards,

Hugo
Re: Interrupt Latency  
CPU usage at 10 kHz, 20 kHz, 40 kHz and 80 kHz interrupt rates
Attachment: Image 10_20_40_80khz.GIF 7.96 KB
Re: Interrupt Latency  
Could you post a picture of the timeline view showing the interrupt triggering and your handler running?

The 10 kHz view should be fine.

Colin

Hugo Zijlmans wrote:
> cpu usage at 10khz, 20khz, 40khz and 80khz interrupts
> 
> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post6145
> 
> 
> ------------------------------------------------------------------------
> 

-- 
cburgess@qnx.com
Re: Interrupt Latency  
> > > Thanks for all your replies!

This is interesting stuff ;-) That's a change from the standard "how can I make my video card work".

> 
> getgpid is used to let me know at which priority I'm currently running. Of
> course I don't need it, but it's for my peace of mind ;-)

getprio(0) would do that.

> 
> The calculations I'm talking about are all floating-point and geometric
> calculations, so I can't put them in the ISR :-)

That's not entirely true. The problem is that the kernel doesn't save the floating-point context (to save time) (I'm
not 100% sure that's true of QNX6). If your ISR is the ONLY thing doing FP, then you are OK. Unfortunately, it can be
difficult to figure out whether other processes are using it. There is a way out, though: you can add code to manually
save and restore the FP context; tricky but doable. Of course, this still doesn't make it easy, because if you get a
floating-point exception the machine will go bye-bye ;-) But sometimes desperate situations call for desperate
measures, although I don't think it's time to call your case desperate ;-)

> 
> In a week or so I'm going to try the RAM polling mechanism; I'm currently
> building the FPGA (higher priority). Naturally I'll post the results
> :-D

If you're doing the FPGA (and, I assume, the DMA part), may I suggest that you also include in the data a counter of
some sort that increments with every DMA operation. That way your program can at least know how many updates it missed.
There is also the possibility of queuing the data, if the program can make use of the data it couldn't deal with in a
timely fashion.
Re: Interrupt Latency  
> This is interesting stuff ;-)  That's a change of the standard "how can I make
>  my video card work"

easy... just write your own driver ;-) ... Not that easy :-)...

> That's not entirely true. The problem is that the kernel doesn't save the
> floating-point context (to save time) (I'm not 100% sure that's true of QNX6).
> If your ISR is the ONLY thing doing FP, then you are OK. Unfortunately, it can
> be difficult to figure out whether other processes are using it. There is a way
> out, though: you can add code to manually save and restore the FP context;
> tricky but doable. Of course, this still doesn't make it easy, because if you
> get a floating-point exception the machine will go bye-bye ;-) But sometimes
> desperate situations call for desperate measures, although I don't think it's
> time to call your case desperate ;-)

You say that FP could be done in the ISR; I'd be surprised if all my efforts on that part would pay for themselves by
eliminating things like InterruptWait(). It should indeed save some time, but... I have to sync some other small stuff
at 100 kHz as well, so I'm not going to start with that. RAM polling is the main approach at the moment. I'm not that
desperate yet, indeed!


> If you're doing the FPGA (and, I assume, the DMA part), may I suggest that
> you also include in the data a counter of some sort that increments with
> every DMA operation. That way your program can at least know how many
> updates it missed.

This is actually a great idea, thanks for that! My vision was a bit limited: I would have tried to sync on some sort of
bit, but a counter actually gives you much more information! I was planning to use DMA, because it saves a lot of CPU
time. I'm planning on just locking a page in system memory and using it as a ping-pong buffer. At this point RAM isn't a
real issue because we only use small amounts. Later on it might even be nice to save some sort of history, but at,
let's say, 100 kB/s running 24/7, this is too much data to handle. Then again, disk space is what, 10 cents/GB :-)
(I actually calculated it to be 3200 GB a year.)
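One way the locked-page ping-pong buffer could be laid out, combined with a frame counter, sketched in C. Everything here is an assumption: the FPGA is taken to write frame n into half n % 2 and only then publish n in a sequence word. The reader copies the half that seq points to and accepts the copy only if seq has not moved, since once n + 1 is published the writer may start overwriting that half with frame n + 2. The plain volatile accesses rely on x86 keeping loads in order (and on the device writing data before seq); other architectures would need explicit memory barriers. write_frame() is a software stand-in for the FPGA so the logic can be tested.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_WORDS 4   /* hypothetical frame size in 32-bit words */

/* Hypothetical layout of the locked page. */
typedef struct {
    volatile uint32_t seq;                  /* number of the last completed frame */
    volatile uint32_t buf[2][FRAME_WORDS];  /* the two ping-pong halves */
} pingpong_page;

/* Software stand-in for the FPGA/DMA side: fill half n % 2, then publish n. */
static void write_frame(pingpong_page *p, uint32_t n, const uint32_t data[FRAME_WORDS])
{
    for (int i = 0; i < FRAME_WORDS; i++)
        p->buf[n % 2][i] = data[i];
    p->seq = n;
}

/* Copy the most recently completed frame into out and return its number.
 * If seq moved during the copy, the writer may already be refilling the
 * half just read (with frame n + 2), so the copy is retried. */
static uint32_t read_latest(const pingpong_page *p, uint32_t out[FRAME_WORDS])
{
    for (;;) {
        uint32_t n = p->seq;
        for (int i = 0; i < FRAME_WORDS; i++)
            out[i] = p->buf[n % 2][i];
        if (p->seq == n)
            return n;
    }
}
```

As for the history estimate: 100 kB/s × 86,400 s/day × 365 days is about 3.15 × 10^9 kB, i.e. roughly 3150 GB a year, in line with the 3200 GB figure above.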

> There is also the possibility of queuing the data if the program can make use 
> of the data it couldn't deal with in a timely fashion.

Ah... exactly my thought. Furthermore, it's even possible to pipeline more stages. In that case the calculation doesn't
have to be done within the same 10 us timeframe. You can plan, let's say, 4 calculations within 40 us (starting one
every 10 us), and each result may become valid as late as 40 us later. As I mentioned before... latency isn't a problem
at all. Even when I miss or skip data, it shouldn't be a life-or-death problem. But if the implementation leads to 50%
misses, you could say it's better to scale down the specs and run at 20 us frames! This actually saves more than 50% of
the resources!
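The pipelining idea can be sketched as a ring of PIPE_DEPTH result slots: the calculation started in frame n writes slot n % 4, and frame n + 4 (40 us later) takes that result, with the caller resending the previous value if it isn't ready. This is a single-threaded C sketch with hypothetical names; if the calculations run on another thread, the ready flags would need atomics or a lock.

```c
#include <assert.h>
#include <stdint.h>

#define PIPE_DEPTH 4   /* calculations in flight: 4 x 10 us = 40 us each */

/* Hypothetical result ring: slot n % PIPE_DEPTH belongs to the calculation
 * started in frame n. */
typedef struct {
    double   result[PIPE_DEPTH];
    uint32_t frame[PIPE_DEPTH];   /* frame in which the stored result's calculation started */
    int      ready[PIPE_DEPTH];   /* set when that calculation has finished */
} pipeline;

/* Called by the calculation started in frame started_in when it completes. */
static void finish_calc(pipeline *p, uint32_t started_in, double value)
{
    unsigned s = started_in % PIPE_DEPTH;
    p->result[s] = value;
    p->frame[s]  = started_in;
    p->ready[s]  = 1;
}

/* Called once per 10 us frame n, before frame n's calculation starts.
 * Returns 1 and stores the result of the calculation started in frame
 * n - PIPE_DEPTH if it finished, else 0 (the caller resends the previous
 * value). Either way the slot is then handed to frame n's calculation. */
static int take_due_result(pipeline *p, uint32_t n, double *out)
{
    unsigned s = n % PIPE_DEPTH;
    int ok = p->ready[s] && p->frame[s] == n - PIPE_DEPTH;  /* reject stale/late results */
    if (ok)
        *out = p->result[s];
    p->ready[s] = 0;
    return ok;
}
```

Storing the start frame next to each result is what lets a calculation that overruns its 40 us budget be detected and ignored instead of being sent a frame late.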

Thanks again for sharing your insights!

Hugo Zijlmans
Re: Interrupt Latency  
.kev trace with 10 kHz (100 us) interrupts
Attachment: Text Testserver-trace-080331-123946.kev 6.86 MB
Re: Interrupt Latency
Hi Hugo,

the 3 us that you say 'it takes for the interrupt to be handled':
how did you measure them? In hardware (taking the time between
IRQ and out()), or in software (e.g., looking at kev traces)?

Also you didn't quite say if you are using a "real" interrupt 
handler (attached with InterruptAttach()) or are actually doing 
the work in a thread (InterruptAttachEvent()).

In the latter case, you'd obviously cause very frequent
scheduling, which would involve much more kernel code than the 
mere invocation of an interrupt handler does.

Here are a few things to look out for:
* Make sure (as far as possible) that you are not sharing this 
  high-frequency IRQ with anybody else in the system. If, for 
  example, the network card was using the same IRQ, then the 
  network driver would need to be scheduled every 10us only 
  to look and see "oh, that wasn't mine". A good way to drive 
  system load to its limits. Use "pidin irq" to see who has 
  attached to which IRQs.
* Use an interrupt handler (InterruptAttach()), not an event.
* From that handler, return an event only in a very few cases.
  Use the handler to "divide" the event frequency.
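The "divide the event frequency" advice can be sketched as the per-interrupt decision a handler would make. In a real QNX handler attached with InterruptAttach(), returning the sigevent pointer wakes the thread and returning NULL does not; here that choice is modeled as a bool so it can be tested off-target. irq_divider and divider_tick() are hypothetical names. Since Hugo needs the 100 kHz timing itself, dividing only helps for work that can run at the lower rate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-IRQ state shared with the handler. */
typedef struct {
    uint32_t count;    /* interrupts seen since the last delivered event */
    uint32_t divide;   /* deliver one event every `divide` interrupts */
} irq_divider;

/* Decision made on every interrupt. In a QNX handler, "true" would map to
 * returning the sigevent pointer and "false" to returning NULL, so the
 * waiting thread is only scheduled every `divide` interrupts. */
static bool divider_tick(irq_divider *d)
{
    if (++d->count >= d->divide) {
        d->count = 0;
        return true;
    }
    return false;
}
```

With divide set to 10, a 100 kHz interrupt would wake the thread at 10 kHz, while the handler itself still runs on every interrupt.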

Hope this helps,
- Thomas


> -----Original Message-----
> From: Hugo Zijlmans [mailto:hugozijlmans@hotmail.com]
> Sent: 20 March 2008 09:45
> To: ostech-core_os
> Subject: Interrupt Latency
> 
> 
> Hi everyone,
> 
> I'm trying to set up a system where I can synchronise a 100 kHz
> (10 us) process through PCI on an interrupt basis. There is a
> huge problem with the kernel at this interrupt frequency. Even
> when I give my ISR (which does nothing more than an out32()) a
> priority higher than every other system process, it still takes
> about 3 us for the interrupt to be handled. Actually, this is not
> the main problem.
> 
> When interrupting at this frequency, the system load becomes
> enormous. With a kernel event trace (Momentics) I see a system
> load of 100%, of which 80-90% is used by the system. As a result,
> the processor can't do anything else in between.
> 
> How is it possible that the ISR takes up so much of the system's
> resources, even though the ISR itself does so little? It isn't
> that big a problem that the interrupt latency itself is x us, but
> the system being 100% busy means that I can't do anything else!