Lorenz Bucher
|
Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
10/29/2008 12:09 PM
post15672
|
Inter-Thread Communication on Multicore Systems
Hello,
I'm quite new to QNX and thread programming in general, and I have the following situation on my Intel quad-core machine:
An interrupt on the parallel port is processed by an interrupt handler thread, which wakes up 4 worker threads, which
each have their processor affinity set to a specific core, so 1 worker thread per core.
The main idea is the following: The interrupt arrives -> 4 worker threads start processing *independent* data blocks at
the same time.
I've tried out several mechanisms for this "wake up" of the worker threads: condvar/mutex, barrier, sleepon. If I use
sleepon, for example, I still need a shared variable between the worker threads, which I have to access using a
lock/unlock sequence. As a result, the worker threads don't start at the same time, because each one needs exclusive
access to the mutex for a short period of time. The workers start staggered by about 10 us from each other, even though
they could all start at the same time.
Can anyone give me some suggestions as to which synchronization mechanism to use for this application? I've been
thinking about using atomic operations to toggle a flag or going "down" to IPC...
Thanks in advance for any hints,
~Lorenz
|
|
|
Hans-Peter Reichert
|
AW: Inter-Thread Communication on Multicore Systems
|
Hans-Peter Reichert
10/29/2008 2:20 PM
post15685
|
AW: Inter-Thread Communication on Multicore Systems
take a look at the pthread_barrier_*() functions.
/hp
|
|
|
Lorenz Bucher
|
Re: AW: Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
10/30/2008 6:35 AM
post15700
|
Re: AW: Inter-Thread Communication on Multicore Systems
Thanks for the hint. I had already tried that function (it was actually the first thing I tried), but as far as I can
tell it isn't the optimal solution for what I need. I also had some problems when I iterated the entire sequence over
several interrupts.
pthread_barrier_wait() waits for a specific number of threads to arrive at the barrier before continuing; a certain
number of threads "rendezvous" at the barrier.
What makes me skeptical is that this mechanism is driven by a *number* of events rather than by specific events. I
will try it again though... maybe I got something wrong?
I tried the following sequence:
1. Init the barrier (count 5).
2. Start the 4 worker threads, which block in barrier_wait() -> one more arrival is needed to release the barrier.
3. The interrupt thread waits for the interrupt.
4. When the interrupt arrives, the interrupt thread calls barrier_wait(), thereby unblocking all 4 worker threads.
5. After the workers have finished processing, they return to block in barrier_wait().
As far as I've read, I don't need to re-initialize the barrier after it has been released once; this should happen automatically.
Is this approach more or less valid?
Kind regards,
~Lorenz
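The five-step sequence above can be sketched with standard POSIX barrier calls. This is only a minimal illustration, not the code used in this thread: the names run_barrier_demo, start_barrier and rounds_done are made up for the example, a plain loop stands in for the interrupt wait, and the processing is reduced to a counter increment. It assumes a barrier initialised with count 5 (4 workers plus the trigger thread) and reuses it for every round without re-initialising it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4

static pthread_barrier_t start_barrier;   /* count = workers + trigger thread */
static int rounds_to_run;                 /* set before the threads start */
static int rounds_done[NUM_WORKERS];      /* one slot per worker, no sharing */

static void *worker(void *arg)
{
    int id = *(int *)arg;

    for (int r = 0; r < rounds_to_run; r++) {
        /* Block until all workers and the trigger thread have arrived. */
        pthread_barrier_wait(&start_barrier);
        rounds_done[id]++;                /* stand-in for the real processing */
    }
    return NULL;
}

/* Runs `rounds` release cycles; returns the smallest per-worker round count. */
int run_barrier_demo(int rounds)
{
    pthread_t tid[NUM_WORKERS];
    int ids[NUM_WORKERS];

    rounds_to_run = rounds;
    int rc = pthread_barrier_init(&start_barrier, NULL, NUM_WORKERS + 1);
    assert(rc == 0);

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        rounds_done[i] = 0;
        rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }

    /* Trigger thread: each pass stands in for one interrupt arriving. */
    for (int r = 0; r < rounds; r++)
        pthread_barrier_wait(&start_barrier);   /* fifth arrival releases all */

    int min = rounds;
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);
        if (rounds_done[i] < min)
            min = rounds_done[i];
    }
    pthread_barrier_destroy(&start_barrier);
    return min;
}
```

Calling run_barrier_demo(1000) should return 1000 if every worker was released once per trigger. One consequence of this design is that the trigger thread itself blocks in pthread_barrier_wait() whenever a worker has not yet returned to the barrier.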
|
|
|
Hans-Peter Reichert
|
AW: AW: Inter-Thread Communication on Multicore Systems
|
Hans-Peter Reichert
10/30/2008 8:03 AM
post15701
|
AW: AW: Inter-Thread Communication on Multicore Systems
Sorry, that was a quick guess.
I looked it up in our source and guess what... we use something different.
We use a normal pthread_cond_t and call pthread_cond_broadcast() on that condition variable.
In this case you still need the mutex for the condition variable, but the broadcast does its job.
/hp
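The condvar approach described above might look roughly like the following. It is a sketch under assumptions, not the poster's production code: the generation counter, the second condition variable (idle) used to wait for the workers, the quit flag and the name run_broadcast_demo are all invented for the example, and the processing is a counter increment.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  go   = PTHREAD_COND_INITIALIZER;  /* trigger -> workers */
static pthread_cond_t  idle = PTHREAD_COND_INITIALIZER;  /* workers -> trigger */
static unsigned generation;          /* bumped once per trigger */
static int      finished;            /* workers done with this generation */
static int      quit;                /* demo-only shutdown request */
static int      wakeups[NUM_WORKERS];

static void *worker(void *arg)
{
    int id = *(int *)arg;
    unsigned seen = 0;

    pthread_mutex_lock(&lock);
    for (;;) {
        /* Re-check the predicate: condvar waits may wake spuriously. */
        while (generation == seen && !quit)
            pthread_cond_wait(&go, &lock);
        if (generation == seen)
            break;                   /* quit requested, nothing pending */
        seen = generation;
        pthread_mutex_unlock(&lock);

        wakeups[id]++;               /* stand-in for the real processing */

        pthread_mutex_lock(&lock);
        if (++finished == NUM_WORKERS)
            pthread_cond_signal(&idle);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Fires `rounds` broadcasts; returns the smallest per-worker wakeup count. */
int run_broadcast_demo(int rounds)
{
    pthread_t tid[NUM_WORKERS];
    int ids[NUM_WORKERS];

    generation = 0;
    finished = 0;
    quit = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        wakeups[i] = 0;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }

    for (int r = 0; r < rounds; r++) {
        pthread_mutex_lock(&lock);
        finished = 0;
        generation++;
        pthread_cond_broadcast(&go);          /* wake every waiting worker */
        while (finished < NUM_WORKERS)        /* demo only: wait for them */
            pthread_cond_wait(&idle, &lock);
        pthread_mutex_unlock(&lock);
    }

    pthread_mutex_lock(&lock);
    quit = 1;
    pthread_cond_broadcast(&go);
    pthread_mutex_unlock(&lock);

    int min = rounds;
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);
        if (wakeups[i] < min)
            min = wakeups[i];
    }
    return min;
}
```

Note that pthread_cond_wait() re-acquires the mutex before it returns, so the woken workers still pass through the mutex one after another. That is consistent with the staggered start reported earlier in the thread.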
|
|
|
Ken Schumm
|
Re: AW: AW: Inter-Thread Communication on Multicore Systems
|
Ken Schumm
10/30/2008 10:00 AM
post15707
|
Re: AW: AW: Inter-Thread Communication on Multicore Systems
> sorry, this was a quick guess.
> I did a lookup in our source and guess what... we use something different.
> We use a normal pthread_cond_t and do a pthread_cond_broadcast() on this
> condition,
> in this case you still need the mutex for the condition, but the broadcast is
> doing its job.
> /hp
We use pthread_cond_broadcast() in this way to achieve the same effect, and it seems to work fine (although I can't
say we've analyzed the order or timing of the worker threads unblocking).
|
|
|
Lorenz Bucher
|
Re: AW: AW: Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
10/30/2008 10:35 AM
post15708
|
Re: AW: AW: Inter-Thread Communication on Multicore Systems
Thanks for the replies, guys.
Yeah, pthread_cond_broadcast() is also what I've been using because I found it to be the most logical mechanism. I lose
about 10 us (microseconds) between the worker threads, because each one locks/unlocks the common mutex variable.
I could also use 4 different mutexes (each worker its own) and call pthread_cond_signal() 4 times to wake them up, but
this seems inefficient.
I just think this "waste" in the thread synchronization could be eliminated. It seems unnecessary to me, but then again, I
have almost no experience with these things.
~Lorenz
|
|
|
Lorenz Bucher
|
Re: AW: Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
10/30/2008 1:14 PM
post15734
|
Re: AW: Inter-Thread Communication on Multicore Systems
Ok, so I implemented it with a barrier:
Please look at the attached timeline screenshot from the Momentics Profiler.
I nicely labeled the threads. Each Processing Thread (worker) is bound to one CPU. The Interrupt Handler is bound to
CPU1.
Can anybody explain to me why worker thread 1 starts so late, for example? CPU2 is idle the whole time!
Really strange behaviour, in my opinion.
Here's my code:
The Interrupt Handler:
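while (1)
{
    /* Block until the parallel-port interrupt fires. */
    InterruptWait(0, NULL);
    /* Last arrival at the barrier: releases the workers. */
    pthread_barrier_wait(&barrier);
    InterruptUnmask(PARALLEL_IRQ, interruptID);
}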
Each worker thread does this:
while (1)
{
    /* Wait for the interrupt handler to arrive at the barrier. */
    pthread_barrier_wait(&barrier);
    /* Processing per iteration (dummy load) */
    for (i = 0; i < 1000; i++)
    {
        tmp = ((i / 2) / 3.2) + 1.0;
    }
}
~Lorenz
p.s. Is there a better way to post code here? (e.g. syntax highlighting?)
|
|
|
Lorenz Bucher
|
Re: AW: Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
10/30/2008 1:17 PM
post15736
|
Re: AW: Inter-Thread Communication on Multicore Systems
... As there doesn't seem to be a way to attach multiple files (besides using a .zip) in this forum, I'll have to post
the second screenshot in a separate post.
Here you can see how the 4 worker threads are released *sequentially* from the barrier, not at the same time.
~Lorenz
|
|
|
Colin Burgess(deleted)
|
Re: AW: Inter-Thread Communication on Multicore Systems
|
Colin Burgess(deleted)
10/30/2008 1:24 PM
post15739
|
Re: AW: Inter-Thread Communication on Multicore Systems
Of course - how can you change 4 thread states at the same time - an SIMD instruction? :-)
Remember that thread state change events that occur while the kernel is running can be considered
simultaneous - it's only when the kernel is exited that any of those threads will actually be made
running.
In the first trace you can see the delay in causing the reschedule on each processor - the kernel
has to send an IPI to each CPU, which forces an entry/exit to the kernel to cause the reschedule.
This all takes time...
Are the other CPUs dedicated? A busy wait on a spinlock may in that case be your solution. If the worker
threads must remain blocked then condvars are probably your best bet - or sending a pulse to each thread.
PS - You may be interested to know that barrier wait uses a condvar... it's not a kernel primitive.
--
cburgess@qnx.com
|
|
|
Lorenz Bucher
|
Re: AW: Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
10/31/2008 5:12 AM
post15774
|
Re: AW: Inter-Thread Communication on Multicore Systems
Thanks for your reply!
> Of course - how can you change 4 thread states at the same time - an SIMD
> instruction? :-)
Oops.. thanks for that pointer. Here I'm confronted with my lack of experience in multiprocessor systems. I tend to
think of every core as an independent resource, but forget that the kernel can only be entered by one cpu at a time.
> Are the other CPUs dedicated? A busy wait on a spinlock may in that case be
> your solution. If the worker
> threads must remain blocked then condvars are probably your best bet - or
> sending a pulse to each thread.
Basically yes, the other CPUs are dedicated to running their respective worker threads. I was planning to have the first
CPU handle system tasks, interrupts, etc., and "feed" the worker threads on the other 3 CPUs.
This would mean that, ideally, the kernel would only be locked by those 3 CPUs once in every interrupt interval (when
the interrupt arrives and the worker threads are scheduled).
My next strategy would have been to go down to IPC and message passing. Now that you mention pulses, I definitely
have some reading to do!
> PS - You may be interested to know that barrier wait uses a condvar... it's
> not a kernel primitive.
Yeah, I noticed that after I posted yesterday, when I saw the SyncMutex* and SyncCondvar* kernel calls in the trace.
Thanks again,
~Lorenz
|
|
|
Lorenz Bucher
|
Re: AW: Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
11/04/2008 4:30 AM
post15898
|
Re: AW: Inter-Thread Communication on Multicore Systems
Ok, I did an implementation using busy loops, but I seem to be having some trouble with the timing of the spinlock
release.
I reduced the number of worker threads to 3 for now, so I've got CPU1 dedicated to the main loop and CPUs 2-4 dedicated
to one worker thread each.
I use atomic_toggle() with a volatile flag variable in the main thread and detect when this variable changes in the
worker threads.
Main thread:
while(1)
{
    InterruptWait();
    atomic_toggle(&flag, 0x1);
    InterruptUnmask();
}
Worker threads:
while(1)
{
    do
    {
        flag_old = flag;
    }
    while (flag_old == flag);
    doTheWork();
}
I know I should probably make flag_old = flag using atomic operations too... but I would need if conditions to know
whether I should use set() or clear(). I tried it with these conditions, but it doesn't work correctly either.
I put a counter that I increment atomically in the workers to track the number of times a worker gets past its do/while
loop. The results show that the workers are NOT always called the same number of times.
Of course, I have variables like the amount of work they do and the actual interrupt rate, but I've reduced both to no
avail.
My next step is to go down to using message-passing.
If anybody can spot an obvious (or less obvious) flaw in my concept, please let me know!
~Lorenz
|
|
|
Douglas Bailey
|
RE: AW: Inter-Thread Communication on Multicore Systems
|
Douglas Bailey
11/04/2008 9:37 AM
post15911
|
RE: AW: Inter-Thread Communication on Multicore Systems
You've got a race condition in your worker thread. If flag changes
between testing the condition and the assignment in the loop, you won't
catch the change.
How about something like this for the worker thread:
flag_old = flag;
while (1) {
    do {} while (flag_old == flag);
    flag_old = flag;
    do_the_work();
}
This still has some requirements for the frequency of your interrupts
vs. the time required for do_the_work() -- if you can get two interrupts
while in do_the_work(), you will miss them...
I'd prefer something like the following for the main thread:
main thread:
while(1) {
    InterruptWait();
    flag++;
    InterruptUnmask();
}
This way you can't miss an interrupt entirely, but if two come in while
do_the_work() is running you will only invoke do_the_work() once more
when you get out.
Doug
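For readers who want to try the counter-based busy wait outside QNX, here is a minimal portable sketch using C11 atomics in place of a volatile variable. It is an illustration under assumptions, not the code from this thread: run_spin_demo, the done counter, the stop flag and the runs array are invented for the example, a loop stands in for the interrupt source, and the main thread waits for the workers after each trigger so that the demo cannot hit the missed-interrupt case described above.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_WORKERS 3

static atomic_uint flag;              /* bumped once per "interrupt" */
static atomic_uint done;              /* work items completed by all workers */
static atomic_bool stop;              /* demo-only shutdown request */
static unsigned    runs[NUM_WORKERS]; /* per-worker count, read after join */

static void *worker(void *arg)
{
    int id = *(int *)arg;
    unsigned flag_old = 0;            /* flag is 0 before any thread starts */

    for (;;) {
        unsigned now;
        /* Busy-wait until the counter differs from the last value handled. */
        while ((now = atomic_load_explicit(&flag, memory_order_acquire)) == flag_old) {
            if (atomic_load_explicit(&stop, memory_order_acquire))
                return NULL;
        }
        flag_old = now;               /* remember the value that woke us */
        runs[id]++;                   /* stand-in for do_the_work() */
        atomic_fetch_add_explicit(&done, 1, memory_order_release);
    }
}

/* Fires `rounds` triggers; returns the smallest per-worker run count. */
unsigned run_spin_demo(unsigned rounds)
{
    pthread_t tid[NUM_WORKERS];
    int ids[NUM_WORKERS];

    atomic_store(&flag, 0);
    atomic_store(&done, 0);
    atomic_store(&stop, false);
    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        runs[i] = 0;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }

    for (unsigned r = 1; r <= rounds; r++) {
        /* Equivalent of "flag++" in the main thread. */
        atomic_fetch_add_explicit(&flag, 1, memory_order_release);
        /* Demo only: wait until every worker has handled this trigger. */
        while (atomic_load_explicit(&done, memory_order_acquire) != r * NUM_WORKERS)
            sched_yield();
    }

    atomic_store(&stop, true);
    unsigned min = rounds;
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);
        if (runs[i] < min)
            min = runs[i];
    }
    return min;
}
```

The worker copies the value it actually observed into flag_old rather than reading the shared counter a second time, so a change that arrives during the work is still noticed on the next pass. The pure spin in the worker only makes sense when each worker has a core to itself, as in the setup discussed here.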
|
|
|
Lorenz Bucher
|
Re: RE: AW: Inter-Thread Communication on Multicore Systems
|
Lorenz Bucher
11/04/2008 10:47 AM
post15918
|
Re: RE: AW: Inter-Thread Communication on Multicore Systems
Thanks for your reply, Doug.
Duh... how could I not have seen that race condition? Thanks for
pointing it out to me. I adjusted my worker thread accordingly, and it seems to work properly now; at least the counters for
all 3 worker threads are equal now :-)
Your suggestion about the main thread is basically to use a counter variable instead of a flag to guarantee proper
synchronization, right? That's definitely something to consider. Even wraparound shouldn't be an issue, as it only
compares whether the counter is equal or not (no < or >).
With the atomic_toggle function, I'm just avoiding problems for two consecutive interrupts.
About getting two interrupts while I'm still processing:
I'm currently working on (at least) detecting that using a "watchdog" variable that is set by the workers when they're
done and checked when a new interrupt arrives. The workload of the workers will of course have to be adjusted to match
the interrupt rate.
~Lorenz
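One possible shape for such a "watchdog" check, combined with the counter idea, is sketched below. Everything here is hypothetical: the function names (watchdog_reset, worker_report_done, on_interrupt, current_generation) and the per-worker completed array are assumptions made for illustration, and C11 atomics are used instead of the QNX atomic_*() calls. Each worker records the last generation it finished, and the interrupt side counts how many workers have not yet finished the current generation before publishing the next one.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_WORKERS 3

static atomic_uint generation;              /* current trigger number */
static atomic_uint completed[NUM_WORKERS];  /* last generation each worker finished */

/* Start (or restart) with every worker idle at generation `start`. */
void watchdog_reset(unsigned start)
{
    atomic_store(&generation, start);
    for (int i = 0; i < NUM_WORKERS; i++)
        atomic_store(&completed[i], start);
}

/* Worker side: call after finishing the work for generation `gen`. */
void worker_report_done(int id, unsigned gen)
{
    assert(id >= 0 && id < NUM_WORKERS);
    atomic_store_explicit(&completed[id], gen, memory_order_release);
}

/* Worker side: the value to spin on and to pass to worker_report_done(). */
unsigned current_generation(void)
{
    return atomic_load_explicit(&generation, memory_order_acquire);
}

/* Interrupt side: count the workers still busy with the current generation,
   then publish the next one. Returns the number of late workers. */
int on_interrupt(void)
{
    unsigned current = atomic_load_explicit(&generation, memory_order_relaxed);
    int late = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        /* Equality only, so unsigned wraparound is harmless. */
        if (atomic_load_explicit(&completed[i], memory_order_acquire) != current)
            late++;
    }
    atomic_store_explicit(&generation, current + 1u, memory_order_release);
    return late;
}
```

A non-zero return value from on_interrupt() means the interrupt rate is too high for the current workload. Because only equality is tested, the check keeps working when the unsigned counter wraps from its maximum value back to 0.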
|
|
|
|