Forum Topic - Inter-Thread Communication on Multicore Systems: (13 Items)
   
Inter-Thread Communication on Multicore Systems  
Hello,

I'm quite new to QNX and thread programming in general, and I have the following situation on my Intel quad-core machine:

An interrupt on the parallel port is processed by an interrupt handler thread, which wakes up 4 worker threads, which 
each have their processor affinity set to a specific core, so 1 worker thread per core.
The main idea is the following: The interrupt arrives -> 4 worker threads start processing *independent* data blocks at 
the same time.

I've tried out several mechanisms for this "wake up" of the worker threads: condvar/mutex, barrier, sleepon. If I use 
sleepon, for example, I still need a shared variable between the worker threads, which I need to access using a 
lock/unlock sequence. As a result, the worker threads don't start at the same time, because each one needs exclusive 
access to the mutex for a short period of time. The workers start staggered by about 10 us from each other, even though 
they could all start at the same time.

Can anyone give me some suggestions as to which synchronization mechanism to use for this application? I've been 
thinking about using atomic operations to toggle a flag or going "down" to IPC...

Thanks in advance for any hints,
~Lorenz

AW: Inter-Thread Communication on Multicore Systems  
take a look at the pthread_barrier_*() functions.
/hp


*******************************************
Harman Becker Automotive Systems GmbH
Management Board: Dr. Klaus Blickle (Chairman), Dr. Udo Hüls, Michael Mauser, Regis Baudot
Chairman of the Supervisory Board: Ansgar Rempp | Domicile: Karlsbad | 
Local Court Mannheim: Register No. 361395

 
*******************************************
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have 
received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, 
disclosure or distribution of the contents in this e-mail is strictly forbidden.
*******************************************
Re: AW: Inter-Thread Communication on Multicore Systems  
Thanks for the hint, but I already tried that function (it was actually the first thing I tried), but as far as I can 
tell, it isn't the optimal solution for what I need. I also had some problems when I iterated the entire thing (several 
interrupts).

The barrier_wait statement waits for a specific number of threads to arrive at the barrier before continuing; a certain 
number of threads "rendezvous" at a barrier.

What makes me skeptical is that this entire mechanism is driven by a *number* of events rather than by specific 
events. I will try it again though... maybe I got something wrong?

I tried the following sequence:

1. init the barrier (count 5)
2. start the 4 worker threads, which block on barrier_wait() -> count is down to 1
3. the interrupt thread waits for the interrupt
4. when the interrupt arrives, it calls barrier_wait(), thereby unblocking all 4 worker threads
5. after the workers have finished processing, they return to block in barrier_wait() again

As far as I've read, I don't need to re-init the barrier after it has been released once; this should happen automatically.

Is this approach more or less valid?

Best regards,
~Lorenz
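The five-step sequence above can be sketched as follows. This is a minimal, portable illustration and not the poster's actual code: the interrupt is simulated by a plain loop, the names run_barrier_demo(), worker(), cycles_to_run and work_done are hypothetical, and error checking is omitted for brevity. POSIX resets a barrier automatically after each release, so no re-init is needed between cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define NUM_WORKERS 4

static pthread_barrier_t start_barrier;  /* count = workers + 1 trigger thread */
static int cycles_to_run;                /* set before the workers are created */
static int work_done[NUM_WORKERS];       /* per-worker cycle counters */

/* Hypothetical worker: block at the barrier, then handle one cycle. */
static void *worker(void *arg)
{
    int id = *(int *)arg;

    for (int c = 0; c < cycles_to_run; c++) {
        pthread_barrier_wait(&start_barrier);  /* steps 2 and 5 */
        work_done[id]++;                       /* stand-in for real processing */
    }
    return NULL;
}

/* Runs `cycles` simulated interrupts and returns the total number of
 * work items handled by all workers. */
int run_barrier_demo(int cycles)
{
    pthread_t tid[NUM_WORKERS];
    int ids[NUM_WORKERS];
    int total = 0;

    cycles_to_run = cycles;
    pthread_barrier_init(&start_barrier, NULL, NUM_WORKERS + 1);  /* step 1 */

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        work_done[i] = 0;
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    }

    for (int c = 0; c < cycles; c++) {
        /* Step 3 would be InterruptWait() on QNX; simulated here. */
        pthread_barrier_wait(&start_barrier);  /* step 4: release the workers */
    }

    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);
        total += work_done[i];
    }
    pthread_barrier_destroy(&start_barrier);
    return total;
}
```

One consequence of the count of 5: the trigger thread itself blocks in pthread_barrier_wait() until all four workers have returned to the barrier, so a worker that is still busy delays the release of the next cycle.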
AW: AW: Inter-Thread Communication on Multicore Systems  
Sorry, this was a quick guess.
I did a lookup in our source and guess what... we use something different.
We use a normal pthread_cond_t and do a pthread_cond_broadcast() on this condition.
In this case you still need the mutex for the condition, but the broadcast is doing its job.
/hp
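A minimal sketch of the broadcast approach described above, assuming a generation counter as the condition predicate. The counter, the `idle` condvar that lets the trigger wait for the workers, and all function and variable names are assumptions made for this illustration; they are not taken from the poster's source. Error checking is omitted for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define NUM_WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;  /* trigger -> workers */
static pthread_cond_t idle = PTHREAD_COND_INITIALIZER;  /* workers -> trigger */
static unsigned generation;  /* bumped once per trigger, always under lock */
static int pending;          /* workers still busy with the current cycle */
static int stop;             /* set once to shut the workers down */
static int handled[NUM_WORKERS];

static void *worker(void *arg)
{
    int id = *(int *)arg;
    unsigned seen = 0;       /* matches the initial value of generation */

    for (;;) {
        pthread_mutex_lock(&lock);
        while (generation == seen && !stop)
            pthread_cond_wait(&wake, &lock);
        if (generation == seen) {        /* woken only to stop */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        seen = generation;
        pthread_mutex_unlock(&lock);

        handled[id]++;                   /* processing happens outside the lock */

        pthread_mutex_lock(&lock);
        if (--pending == 0)
            pthread_cond_signal(&idle);  /* last worker reports completion */
        pthread_mutex_unlock(&lock);
    }
}

/* Triggers `cycles` wake-ups and returns the total number handled. */
int run_broadcast_demo(int cycles)
{
    pthread_t tid[NUM_WORKERS];
    int ids[NUM_WORKERS];
    int total = 0;

    generation = 0;
    pending = 0;
    stop = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        handled[i] = 0;
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    }

    for (int c = 0; c < cycles; c++) {
        pthread_mutex_lock(&lock);
        pending = NUM_WORKERS;
        generation++;
        pthread_cond_broadcast(&wake);   /* wake all workers at once */
        while (pending > 0)
            pthread_cond_wait(&idle, &lock);
        pthread_mutex_unlock(&lock);
    }

    pthread_mutex_lock(&lock);
    stop = 1;
    pthread_cond_broadcast(&wake);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);
        total += handled[i];
    }
    return total;
}
```

Even with a broadcast, each woken thread has to reacquire the mutex before pthread_cond_wait() returns, so the workers leave the wait one after another rather than simultaneously.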

Re: AW: AW: Inter-Thread Communication on Multicore Systems  
> sorry, this was a quick guess.
> I did a lookup in our source and guess what... we use something different.
> We use a normal pthread_cond_t and do a pthread_cond_broadcast() on this 
> condition,
> in this case you still need the mutex for the condition, but the broadcast is 
> doing its job.
> /hp

We use pthread_cond_broadcast() in this way to accomplish the same effect and it seems to work fine (although I can't 
say we've analyzed the order or timing of the worker threads unblocking).
Re: AW: AW: Inter-Thread Communication on Multicore Systems  
Thanks for the replies, guys.

Yeah, pthread_cond_broadcast() is also what I've been using because I found it to be the most logical mechanism. I lose 
about 10 us (microseconds) between the worker threads, because each one locks/unlocks the common mutex variable.

I could also use 4 different mutexes (each worker its own) and call pthread_cond_signal() 4 times to wake them up, but 
this seems inefficient.

I just think this "waste" in the thread synchronization could be eliminated. It just seems unnecessary to me. But then 
again, I have almost no experience with these things.

~Lorenz
Re: AW: Inter-Thread Communication on Multicore Systems  
Ok, so I implemented it with a barrier:

Please look at the attached timeline screenshot from the Momentics Profiler.

I nicely labeled the threads. Each Processing Thread (worker) is bound to one CPU. The Interrupt Handler is bound to 
CPU1.

Can anybody explain to me why worker thread 1 starts so late, for example? CPU2 is idle the whole time!?!
Really strange behaviour, in my opinion.

Here's my code:

The Interrupt Handler:
while (1)
{
    InterruptWait(0, NULL);    // flags argument is an int, so pass 0 rather than NULL
    pthread_barrier_wait(&barrier);
    InterruptUnmask(PARALLEL_IRQ, interruptID);
}

Each worker thread does this:

while (1)
{
    pthread_barrier_wait(&barrier);

    // Processing per iteration
    for (i = 0; i < 1000; i++)
    {
        tmp = ((i / 2) / 3.2) + 1.0;
    }
}


~Lorenz

p.s. Is there a better way to post code here? (e.g. syntax highlighting?)
Attachment: Image qnx_screenshot_barrier.JPG 79.47 KB
Re: AW: Inter-Thread Communication on Multicore Systems  
... As there doesn't seem to be a way to attach multiple files (besides using a .zip) in this forum, I'll have to post 
the second screenshot in a separate post.

Here you can see how the 4 worker threads are released *sequentially* from the barrier, not at the same time.

~Lorenz
Attachment: Image qnx_screenshot_barrier_detail.JPG 42.64 KB
Re: AW: Inter-Thread Communication on Multicore Systems  
Of course - how can you change 4 thread states at the same time - an SIMD instruction? :-)

Remember that thread state change events that occur while the kernel is running can be considered
simultaneous - it's only when the kernel is exited that any of those threads will actually be made
running.

In the first trace you can see the delay in causing the reschedule on each processor - the kernel
has to send an IPI to each CPU, which forces an entry/exit to the kernel to cause the reschedule.

This all takes time...

Are the other CPUs dedicated?  A busy wait on a spinlock may in that case be your solution.  If the worker
threads must remain blocked then condvars are probably your best bet - or sending a pulse to each thread.

PS - You may be interested to know that barrier wait uses a condvar... it's not a kernel primitive.

-- 
cburgess@qnx.com
Re: AW: Inter-Thread Communication on Multicore Systems  
Thanks for your reply!

> Of course - how can you change 4 thread states at the same time - an SIMD 
> instruction? :-)

Oops.. thanks for that pointer. Here I'm confronted with my lack of experience in multiprocessor systems. I tend to 
think of every core as an independent resource, but forget that the kernel can only be entered by one cpu at a time.


> Are the other CPUs dedicated?  A busy wait on a spinlock may in that case be
> your solution.  If the worker threads must remain blocked then condvars are
> probably your best bet - or sending a pulse to each thread.

Basically yes, the other CPUs are dedicated to running their respective worker threads. I was planning to have the 
first CPU handle system tasks, interrupts, etc., and "feeding" the worker threads on the other 3 CPUs.

This would mean that, ideally, the kernel would only be locked by those 3 CPUs once in every interrupt interval (when 
the interrupt arrives and the worker threads are scheduled).

My next strategy would have been to go down to IPC and message passing. Now that you mention pulses, I definitely 
have some reading to do!


> PS - You may be interested to know that barrier wait uses a condvar... it's 
> not a kernel primitive.

Yeah, I noticed that after I posted yesterday, when I saw the SynchMutex and SynchCondvar Kernel calls in the trace.


Thanks again,
~Lorenz
Re: AW: Inter-Thread Communication on Multicore Systems  
Ok, I did an implementation using busy loops, but I seem to be having some trouble with the timing of the spinlock 
release.

I reduced the number of worker threads to 3 for now, so I've got CPU1 dedicated to the main loop and CPUs 2-4 dedicated 
to one worker thread each.

I use atomic_toggle() with a volatile flag variable in the main thread and detect when this variable changes in the 
worker threads.

Main thread:
while(1)
{
     InterruptWait();
     atomic_toggle(&flag, 0x1);
     InterruptUnmask();
}

Worker threads:
while(1)
{
     do
     {
          flag_old = flag;
     }
     while (flag_old == flag);

     doTheWork();
}


I know I should probably make flag_old = flag using atomic operations too... but I would need if conditions to know 
whether I should use set() or clear(). I tried it with these conditions, but it doesn't work correctly either.

I put a counter that I increment atomically in the workers to track the number of times a worker gets past its do/while 
loop. The results show that the workers are NOT always called the same number of times.

Of course, I have variables like the amount of work they do and the actual interrupt rate, but I've reduced both to no 
avail.

My next step is to go down to using message-passing.

If anybody can spot an obvious (or less obvious) flaw in my concept, please let me know!

~Lorenz


RE: AW: Inter-Thread Communication on Multicore Systems  
You've got a race condition in your worker thread.  If flag changes
between testing the condition and the assignment in the loop, you won't
catch the change.

How about something like this for the worker thread:

flag_old = flag;
while (1) {
   do {} while (flag_old == flag);
   flag_old = flag;
   do_the_work();
}

This still has some requirements for the frequency of your interrupts
vs. the time required for do_the_work() -- if you can get two interrupts
while in do_the_work(), you will miss them...

I'd prefer something like the following for the main thread:

main thread:
while(1) {
   InterruptWait();
   flag++;
   InterruptUnmask();
}

This way you can't miss an interrupt entirely, but if two come in while
do_the_work() is running you will only invoke do_the_work() once more
when you get out.

Doug
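The counter-based busy wait suggested above can be sketched in portable C as follows. Note the substitutions: this uses C11 <stdatomic.h> instead of the QNX atomic_*() calls and a volatile variable, the interrupt is simulated by a loop, and the `finished` counter exists only so that the demo waits for all workers before issuing the next tick. All names (tick, finished, run_spin_demo and so on) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_WORKERS 3

static atomic_uint tick;       /* bumped once per (simulated) interrupt */
static atomic_uint finished;   /* total work items completed so far */
static atomic_bool stop;
static unsigned handled[NUM_WORKERS];

static void *worker(void *arg)
{
    int id = *(int *)arg;
    unsigned seen = 0;         /* tick is reset to 0 before the threads start */

    for (;;) {
        /* Read the counter exactly once per pass, so a change between the
         * comparison and the update of `seen` cannot be lost. */
        unsigned now = atomic_load(&tick);
        if (now == seen) {
            if (atomic_load(&stop))
                return NULL;
            continue;          /* busy wait: assumes a dedicated CPU */
        }
        seen = now;
        handled[id]++;         /* stand-in for do_the_work() */
        atomic_fetch_add(&finished, 1);
    }
}

/* Issues `cycles` ticks and returns the total number of work items run. */
unsigned run_spin_demo(unsigned cycles)
{
    pthread_t tid[NUM_WORKERS];
    int ids[NUM_WORKERS];
    unsigned total = 0;

    atomic_store(&tick, 0);
    atomic_store(&finished, 0);
    atomic_store(&stop, false);
    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        handled[i] = 0;
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    }

    for (unsigned c = 1; c <= cycles; c++) {
        atomic_fetch_add(&tick, 1);   /* the step after InterruptWait() */
        while (atomic_load(&finished) != c * NUM_WORKERS) {
            /* demo only: wait until every worker has handled this tick */
        }
    }

    atomic_store(&stop, true);
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);
        total += handled[i];
    }
    return total;
}
```

Starting each worker with seen = 0 rather than with a fresh read of the counter means a tick issued before the worker begins polling is still noticed.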


Re: RE: AW: Inter-Thread Communication on Multicore Systems  
Thanks for your reply, Doug.

Duh... how could I not have seen that race condition there? Thanks for pointing it out to me. I adjusted my thread 
accordingly, and it seems to work properly now; at least the counters for all 3 worker threads are equal now :-)

Your suggestion about the main thread is basically to use a counter variable instead of a flag to guarantee proper 
synchronization, right? That's definitely something to consider. Even wraparound shouldn't be an issue, as the worker 
only compares whether the counter is equal or not (no < or >).
With the atomic_toggle function, I'm just avoiding problems for two consecutive interrupts.
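The wraparound point can be checked with a small helper. This is only an illustration: an 8-bit counter is assumed here so that the wrap is easy to see, and ticks_since() is a hypothetical name. Unsigned subtraction is performed modulo 256 for this type, so the difference stays correct across the wrap.

```c
#include <stdint.h>

/* Number of ticks between two readings of an 8-bit counter, modulo 256.
 * 0 means "no change", 1 means "one new interrupt", and anything larger
 * means interrupts were missed while the worker was busy. */
uint8_t ticks_since(uint8_t seen, uint8_t now)
{
    return (uint8_t)(now - seen);
}
```

For example, ticks_since(255, 0) is 1. The remaining blind spot is a multiple of 256 missed ticks, which would look like no change at all; a wider counter makes that practically unreachable.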

About getting two interrupts while I'm still processing:
I'm currently working on (at least) detecting that using a "watchdog" variable that is set by the workers when they're 
done and checked when a new interrupt arrives. The workload of the workers will of course have to be adjusted to match 
the interrupt rate.

~Lorenz
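One possible shape for such a "watchdog" check, as a rough sketch only: it assumes C11 atomics and a completion counter, and every name in it (workers_done, worker_finished, check_and_rearm, overruns) is hypothetical rather than taken from the poster's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_WORKERS 3

/* Starts "complete" so that the very first interrupt is not reported. */
static atomic_uint workers_done = NUM_WORKERS;
static unsigned overruns;   /* cycles that were still running at the next interrupt */

/* Called by each worker when it has finished its block of data. */
void worker_finished(void)
{
    atomic_fetch_add(&workers_done, 1);
}

/* Called by the interrupt thread on every new interrupt, before the workers
 * are released again. Returns true if the previous cycle had completed. */
bool check_and_rearm(void)
{
    unsigned done = atomic_exchange(&workers_done, 0);  /* read and reset */

    if (done != NUM_WORKERS) {
        overruns++;
        return false;
    }
    return true;
}
```

A late worker that reports after the reset is counted toward the following cycle, so this detects overruns but does not say which worker was slow; tagging each completion with the cycle number would be more robust.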