Andreas Polychronopoulos
02/28/2009 8:30 AM
post23159
|
Hi everybody,
I am trying to measure the time a mutex take to lock and unlock , here is my code
*****************************************************
void *functionA();
int main(int argc, char *argv[]) {
/* initialize mutex1 */
pthread_mutex_init(&mutex1, NULL);
/* create threads */
pthread_create(&tid_A, NULL, &functionA ,NULL);
/* 2 secs sleep */
sleep(2);
}
void *functionA(){
pthread_mutex_lock(&mutex1);
pthread_mutex_unlock(&mutex1);
}
*******************************************************
I build the application with the IDE and at Build Options i choosed , build for profiling (Function Instrumentation)
Then i create a new run configuration for the _g binary adding the appropriate Application Profiler tool on Tools bar of
the Open Run dialog.
I can get results for the function main and functionA but not for pthread_mutex_lock,pthread_mutex_unlock. Is it
possible to get the time for each line of code ?
The Functon Instrumentation mode is good for single thread application what about add more threads on the above code ?
Will i get correct results or should i do a different profile ?
thanks
|
|
|
Hans-Peter Reichert
02/28/2009 12:58 PM
post23160
|
let me start from behind:
Application Profiler is yet not capable of detecting thread preemtion,
so what you will get when profiling a multi threaded app is always REALTIME not CPUTIME.
With function instrumentation you're able to measure every function that is compiled with this option.
As pthread_*() is all part of libc.so you can't get numbers for this functions unless you use an instrumented version of
libc, which you will have to build on your own.
if you're planning to do benchmarks on some OS objects, this is probably not the right approach - you should use
different time functions for this.
HTH
/hp
-----Ursprüngliche Nachricht-----
Von: Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
Gesendet: Sa 28.02.2009 14:30
An: ostech-core_os
Cc:
Betreff: Application Profiler
Hi everybody,
I am trying to measure the time a mutex take to lock and unlock , here is my code
*****************************************************
void *functionA();
int main(int argc, char *argv[]) {
/* initialize mutex1 */
pthread_mutex_init(&mutex1, NULL);
/* create threads */
pthread_create(&tid_A, NULL, &functionA ,NULL);
/* 2 secs sleep */
sleep(2);
}
void *functionA(){
pthread_mutex_lock(&mutex1);
pthread_mutex_unlock(&mutex1);
}
*******************************************************
I build the application with the IDE and at Build Options i choosed , build for profiling (Function Instrumentation)
Then i create a new run configuration for the _g binary adding the appropriate Application Profiler tool on Tools bar of
the Open Run dialog.
I can get results for the function main and functionA but not for pthread_mutex_lock,pthread_mutex_unlock. Is it
possible to get the time for each line of code ?
The Functon Instrumentation mode is good for single thread application what about add more threads on the above code ?
Will i get correct results or should i do a different profile ?
thanks
_______________________________________________
OSTech
http://community.qnx.com/sf/go/post23159
*******************************************
Harman Becker Automotive Systems GmbH
Management Board: Dr. Klaus Blickle (Chairman), Dr. Udo Hüls, Michael Mauser
Chairman of the Supervisory Board: Ansgar Rempp | Domicile: Karlsbad |
Local Court Mannheim: Register No. 361395
*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat
sind oder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese Mail
. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have
received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying,
disclosure or distribution of the contents in this e-mail is strictly forbidden.
*******************************************
|
|
|
Andreas Polychronopoulos
|
Re: AW: Application Profiler
|
Andreas Polychronopoulos
02/28/2009 2:37 PM
post23162
|
Re: AW: Application Profiler
Thanks for the reply
> if you're planning to do benchmarks on some OS objects, this is probably not
> the right approach - you should use different time functions for this.
Yes i want to do benchmarks on synchronization objects like mutex. Do you mean that i should use some functions like
gettimeofday() ? I used gettimeofday() but I learned that this is not a good approach so I turned to Application
Profiler solution. Any advice would be helpful
thanks
|
|
|
Hans-Peter Reichert
|
AW: AW: Application Profiler
|
Hans-Peter Reichert
03/01/2009 4:00 AM
post23166
|
AW: AW: Application Profiler
timing of sync objects is something you should do with a high resolution time,
and you should do it propably not only once but in a loop.
so in pseudo code:
raise prio to max so nobody can preempt
get ticks per second
get Start Ticks
loop 10.0000
pthread_mutex_unlock()
done
get End Ticks
single time = (end - start) / ticks per sec / 10.000
check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/syspage_entry.html
for an example
ClockCycles() is the highest resolution you can get on a QNX system,
and it is quite cheap, it's not a kernel call, but only an access to the mapped system page.
This method is convenient to do single threaded stuff,
on multi threaded stuff you either have to do some maths on your own
or you try to use each threads CPUCLOCK.
you do that this way:
clock = ClockId(pid, tid)
Clocktime(clock, ....)
do your stuff
ClockTime(clock, ...)
this method has the disadvantage that the resolution depends on the accounting that is done in the scheduler, which is
not 100% accurate
@QNX: what about releasing the OS benchmark suite on foundry?
/hp
-----Ursprüngliche Nachricht-----
Von: Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
Gesendet: Sa 28.02.2009 20:37
An: ostech-core_os
Cc:
Betreff: Re: AW: Application Profiler
Thanks for the reply
> if you're planning to do benchmarks on some OS objects, this is probably not
> the right approach - you should use different time functions for this.
Yes i want to do benchmarks on synchronization objects like mutex. Do you mean that i should use some functions like
gettimeofday() ? I used gettimeofday() but I learned that this is not a good approach so I turned to Application
Profiler solution. Any advice would be helpful
thanks
_______________________________________________
OSTech
http://community.qnx.com/sf/go/post23162
*******************************************
Harman Becker Automotive Systems GmbH
Management Board: Dr. Klaus Blickle (Chairman), Dr. Udo Hüls, Michael Mauser
Chairman of the Supervisory Board: Ansgar Rempp | Domicile: Karlsbad |
Local Court Mannheim: Register No. 361395
*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat
sind oder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese Mail
. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have
received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying,
disclosure or distribution of the contents in this e-mail is strictly forbidden.
*******************************************
|
|
|
Andreas Polychronopoulos
|
Re: AW: AW: Application Profiler
|
Andreas Polychronopoulos
03/01/2009 2:27 PM
post23167
|
Re: AW: AW: Application Profiler
> timing of sync objects is something you should do with a high resolution time,
>
> and you should do it propably not only once but in a loop.
> so in pseudo code:
> raise prio to max so nobody can preempt
> get ticks per second
> get Start Ticks
> loop 10.0000
> pthread_mutex_unlock()
> done
> get End Ticks
> single time = (end - start) / ticks per sec / 10.000
>
> check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/
> syspage_entry.html
> for an example
>
> ClockCycles() is the highest resolution you can get on a QNX system,
> and it is quite cheap, it's not a kernel call, but only an access to the
> mapped system page.
>
> This method is convenient to do single threaded stuff,
> on multi threaded stuff you either have to do some maths on your own
> or you try to use each threads CPUCLOCK.
> you do that this way:
> clock = ClockId(pid, tid)
> Clocktime(clock, ....)
> do your stuff
> ClockTime(clock, ...)
> this method has the disadvantage that the resolution depends on the accounting
> that is done in the scheduler, which is not 100% accurate
>
> @QNX: what about releasing the OS benchmark suite on foundry?
>
Hi again,
It is my first time i am doing benchmarking on synchronization objects. So I followed the benchmark methodology of QNX
at: http://www.qnx.com/download/download/8829/QNX_NeutrinoRTOS_Kernel_Benchmark_Methodology.pdf
So i want to calculate the average cost of a contested and uncontested mutex. My code after your help is the following:
/* uncontested mutex */
uint64_t cps, cycle1, cycle2, ncycles;
double sec =0.0;
double micro_sec = 0.0;
pthread_mutex_t mutex;
int i=0;
int main(int argc, char *argv[]) {
pthread_mutex_init(&mutex,NULL);
cycle1 = ClockCycles( );
for(i=0;i<1000000;i++){
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
}
cycle2 = ClockCycles( );
ncycles = cycle2 - cycle1;
printf("%lld cycles elapsed.\n", ncycles);
cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
printf( "This system has %lld cycles/sec.\n",cps );
sec = (double)ncycles/cps;
printf("The cycles in seconds is %f.\n",sec);
return EXIT_SUCCESS;
}
------------------------------------------------------------------
/* contested mutex */
pthread_mutex_t mutex1;
pthread_t tid_A,tid_B,tid_C,tid_D;
/* functions executed by threads */
void *functionA();
void *functionB();
void *functionC();
void *functionD();
int main(int argc, char *argv[]) {
/* initialize mutex1 */
pthread_mutex_init(&mutex1, NULL);
/* create threads */
pthread_create(&tid_A, NULL, &functionA ,NULL);
pthread_create(&tid_B, NULL, &functionB ,NULL);
pthread_create(&tid_C, NULL, &functionC ,NULL);
// pthread_create(&tid_D, NULL, &functionD ,NULL);
/* some secs sleep */
sleep(3);
}
void *functionA(){
int i=0;
uint64_t cps, cycle1, cycle2, ncycles;
double sec =0.0;
cycle1 = ClockCycles( );
for(i=0;i<1000000;i++){
pthread_mutex_lock(&mutex1);
pthread_mutex_unlock(&mutex1);
}
cycle2 = ClockCycles( );
ncycles = cycle2 - cycle1;
printf("thread-A: %lld cycles elapsed.\n", ncycles);
cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
printf( "This system has %lld cycles/sec.\n",cps );
sec = (double)ncycles/cps;
printf("The cycles in seconds is %f.\n\n",sec);
}
void *functionB(),void *functionC(),void *functionD() have the same body as functionA.
-----------------------------------------------------------------------------------------
First of all is my approach correct ?
I have been confused because i learned that only an application profiler gives real results.
thanks again
|
|
|
Hans-Peter Reichert
|
AW: AW: AW: Application Profiler
|
Hans-Peter Reichert
03/02/2009 2:28 PM
post23249
|
AW: AW: AW: Application Profiler
it's all about the resolution you need/want
and the overhead you're adding by modifing your code through adding timing features.
and when you're doing your own accounting you have to know which time you're measuring, CPU or REAL and that your code
is not preempted or at least you know through priorities who has preempted you for how long.
We've got a saying here (it's german): Wer mißt, mißt Mist.
maybe you're able to read it - the spelling is important cause it's the essence
Or in a physics way: as soon as you're observing a system, you're changing it's behaviour.
@Elena: which clock type is application profiler using?
is it ClockCycles or something else, and how big is overhead of __cyg_profile_func_enter and __cyg_profile_func_exit?
/hp
-----Ursprüngliche Nachricht-----
Von: Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
Gesendet: So 01.03.2009 20:28
An: ostech-core_os
Cc:
Betreff: Re: AW: AW: Application Profiler
> timing of sync objects is something you should do with a high resolution time,
>
> and you should do it propably not only once but in a loop.
> so in pseudo code:
> raise prio to max so nobody can preempt
> get ticks per second
> get Start Ticks
> loop 10.0000
> pthread_mutex_unlock()
> done
> get End Ticks
> single time = (end - start) / ticks per sec / 10.000
>
> check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/
> syspage_entry.html
> for an example
>
> ClockCycles() is the highest resolution you can get on a QNX system,
> and it is quite cheap, it's not a kernel call, but only an access to the
> mapped system page.
>
> This method is convenient to do single threaded stuff,
> on multi threaded stuff you either have to do some maths on your own
> or you try to use each threads CPUCLOCK.
> you do that this way:
> clock = ClockId(pid, tid)
> Clocktime(clock, ....)
> do your stuff
> ClockTime(clock, ...)
> this method has the disadvantage that the resolution depends on the accounting
> that is done in the scheduler, which is not 100% accurate
>
> @QNX: what about releasing the OS benchmark suite on foundry?
>
Hi again,
It is my first time i am doing benchmarking on synchronization objects. So I followed the benchmark methodology of QNX
at: http://www.qnx.com/download/download/8829/QNX_NeutrinoRTOS_Kernel_Benchmark_Methodology.pdf
So i want to calculate the average cost of a contested and uncontested mutex. My code after your help is the following:
/* uncontested mutex */
uint64_t cps, cycle1, cycle2, ncycles;
double sec =0.0;
double micro_sec = 0.0;
pthread_mutex_t mutex;
int i=0;
int main(int argc, char *argv[]) {
pthread_mutex_init(&mutex,NULL);
cycle1 = ClockCycles( );
for(i=0;i<1000000;i++){
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
}
cycle2 = ClockCycles( );
ncycles = cycle2 - cycle1;
printf("%lld cycles elapsed.\n", ncycles);
cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
printf( "This system has %lld cycles/sec.\n",cps );
sec = (double)ncycles/cps;
printf("The cycles in seconds is %f.\n",sec);
return EXIT_SUCCESS;
}
------------------------------------------------------------------
/* contested mutex */
pthread_mutex_t mutex1;
pthread_t tid_A,tid_B,tid_C,tid_D;
/* functions executed by threads */
void *functionA();
void *functionB();
void *functionC();
void *functionD();
int main(int argc, char *argv[]) {
/* initialize mutex1 */
pthread_mutex_init(&mutex1, NULL);
/* create threads */
pthread_create(&tid_A, NULL, &functionA ,NULL);
pthread_create(&tid_B, NULL, &functionB ,NULL);
pthread_create(&tid_C, NULL, &functionC ,NULL);
// pthread_create(&tid_D, NULL, &functionD ,NULL);
/* some secs...
View Full Message
|
|
|
Elena Laskavaia
|
Re: AW: AW: AW: Application Profiler
|
Elena Laskavaia
03/02/2009 3:36 PM
post23262
|
Re: AW: AW: AW: Application Profiler
Right now it uses ClockCycles(). I did not run specific test for
overhead, but it would be maybe 10-12 instructions. If you saving trace
into file there is more overhead on dumping buffer, but it is
automatically removed.
If you wrap whole cycle into function it would be neglectable, i..e
my_wrapper() {
for (i=0;i<1000000;i++) {
system_function();
};
}
And yes it won't account for context switches right now.
Hans-Peter Reichert wrote:
>
> it's all about the resolution you need/want
> and the overhead you're adding by modifing your code through adding
> timing features.
> and when you're doing your own accounting you have to know which time
> you're measuring, CPU or REAL and that your code is not preempted or
> at least you know through priorities who has preempted you for how long.
> We've got a saying here (it's german): Wer mißt, mißt Mist.
> maybe you're able to read it - the spelling is important cause it's
> the essence
> Or in a physics way: as soon as you're observing a system, you're
> changing it's behaviour.
>
> @Elena: which clock type is application profiler using?
> is it ClockCycles or something else, and how big is overhead of
> __cyg_profile_func_enter and __cyg_profile_func_exit?
>
> /hp
>
>
>
>
> -----Ursprüngliche Nachricht-----
> Von: Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
> Gesendet: So 01.03.2009 20:28
> An: ostech-core_os
> Cc:
> Betreff: Re: AW: AW: Application Profiler
>
> > timing of sync objects is something you should do with a high
> resolution time,
> >
> > and you should do it propably not only once but in a loop.
> > so in pseudo code:
> > raise prio to max so nobody can preempt
> > get ticks per second
> > get Start Ticks
> > loop 10.0000
> > pthread_mutex_unlock()
> > done
> > get End Ticks
> > single time = (end - start) / ticks per sec / 10.000
> >
> > check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/
> > syspage_entry.html
> > for an example
> >
> > ClockCycles() is the highest resolution you can get on a QNX system,
> > and it is quite cheap, it's not a kernel call, but only an access to the
> > mapped system page.
> >
> > This method is convenient to do single threaded stuff,
> > on multi threaded stuff you either have to do some maths on your own
> > or you try to use each threads CPUCLOCK.
> > you do that this way:
> > clock = ClockId(pid, tid)
> > Clocktime(clock, ....)
> > do your stuff
> > ClockTime(clock, ...)
> > this method has the disadvantage that the resolution depends on the
> accounting
> > that is done in the scheduler, which is not 100% accurate
> >
> > @QNX: what about releasing the OS benchmark suite on foundry?
> >
>
> Hi again,
>
> It is my first time i am doing benchmarking on synchronization
> objects. So I followed the benchmark methodology of QNX at:
> http://www.qnx.com/download/download/8829/QNX_NeutrinoRTOS_Kernel_Benchmark_Methodology.pdf
> So i want to calculate the average cost of a contested and uncontested
> mutex. My code after your help is the following:
>
> /* uncontested mutex */
>
> uint64_t cps, cycle1, cycle2, ncycles;
> double sec =0.0;
> double micro_sec = 0.0;
> pthread_mutex_t mutex;
> int i=0;
>
> int main(int argc, char *argv[]) {
>
> pthread_mutex_init(&mutex,NULL);
> cycle1 = ClockCycles( );
>
> for(i=0;i<1000000;i++){
> pthread_mutex_lock(&mutex);
> pthread_mutex_unlock(&mutex);
> }
> ...
View Full Message
|
|
|
Elena Laskavaia
03/02/2009 9:21 AM
post23185
|
If you want to use application profiler for system functions, you can
wrap them with your own functions, in this case it measure their time +
overhead of calling a function.
Andreas Polychronopoulos wrote:
> Hi everybody,
>
> I am trying to measure the time a mutex take to lock and unlock , here is my code
>
> *****************************************************
> void *functionA();
>
> int main(int argc, char *argv[]) {
>
> /* initialize mutex1 */
> pthread_mutex_init(&mutex1, NULL);
>
> /* create threads */
> pthread_create(&tid_A, NULL, &functionA ,NULL);
>
> /* 2 secs sleep */
> sleep(2);
> }
>
> void *functionA(){
> pthread_mutex_lock(&mutex1);
> pthread_mutex_unlock(&mutex1);
> }
> *******************************************************
>
> I build the application with the IDE and at Build Options i choosed , build for profiling (Function Instrumentation)
>
> Then i create a new run configuration for the _g binary adding the appropriate Application Profiler tool on Tools bar
of the Open Run dialog.
>
> I can get results for the function main and functionA but not for pthread_mutex_lock,pthread_mutex_unlock. Is it
possible to get the time for each line of code ?
> The Functon Instrumentation mode is good for single thread application what about add more threads on the above code ?
Will i get correct results or should i do a different profile ?
>
> thanks
>
>
> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post23159
>
>
|
|
|
Andreas Polychronopoulos
03/10/2009 5:05 AM
post23997
|
> If you want to use application profiler for system functions, you can
> wrap them with your own functions, in this case it measure their time +
> overhead of calling a function.
>
Hi again,
I used App Profiler for a single thread application in Function Instrumentation mode. What about with a multi-threaded
app ? The Function Instrumentation mode will add an overhead and maybe change application's beahavior. If i use the
Kernel Logging should i take REALTIME or CPUTIME measurements ? OR should i go to an approach as Hans-Peter Reichert
said above , using the ClockId etc. I would like to measure the REALTIME of my app but no problem if i can get the
CPUTIME instead.
thanks
|
|
|
Elena Laskavaia
03/10/2009 9:19 AM
post24009
|
Both mode in App Profiler use realtime measured in ClockCycles. You cannot modify it. I am not sure what exactly you are
asking.
You can use it for multi-threaded too, but if thread context happened during function run - it won't be correct (i.e. it
would show absolute time
for this function even thread was not running).
Andreas Polychronopoulos wrote:
>> If you want to use application profiler for system functions, you can
>> wrap them with your own functions, in this case it measure their time +
>> overhead of calling a function.
>>
>
> Hi again,
>
> I used App Profiler for a single thread application in Function Instrumentation mode. What about with a multi-threaded
app ? The Function Instrumentation mode will add an overhead and maybe change application's beahavior. If i use the
Kernel Logging should i take REALTIME or CPUTIME measurements ? OR should i go to an approach as Hans-Peter Reichert
said above , using the ClockId etc. I would like to measure the REALTIME of my app but no problem if i can get the
CPUTIME instead.
>
> thanks
>
> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post23997
>
|
|
|
|