Project Home
Project Home
Documents
Documents
Wiki
Wiki
Discussion Forums
Discussions
Project Information
Project Info
Forum Topic - Application Profiler : Page 1 of 2 (15 Items)
   
Application Profiler  
Hi everybody,

I am trying to measure the time a mutex take to lock and unlock , here is my code

*****************************************************
void *functionA();

int main(int argc, char *argv[]) {
	
	/* initialize mutex1 */
	pthread_mutex_init(&mutex1, NULL);
	
	/* create threads */
	pthread_create(&tid_A, NULL, &functionA ,NULL);
	
	/* 2 secs sleep */
	sleep(2);
}

void *functionA(){
	pthread_mutex_lock(&mutex1);
	pthread_mutex_unlock(&mutex1);
}
*******************************************************

I build the application with the IDE and at Build Options i choosed , build for profiling (Function Instrumentation)

Then i create a new run configuration for the _g binary adding the appropriate Application Profiler tool on Tools bar of
 the Open Run dialog.

I can get results for the function main and functionA but not for pthread_mutex_lock,pthread_mutex_unlock. Is it 
possible to get the time for each line of code ?
The Functon Instrumentation mode is good for single thread application what about add more threads on the above code ? 
Will i get correct results or should i do a different profile ?

thanks
AW: Application Profiler  
let me start from behind:
Application Profiler is yet not capable of detecting thread preemtion,
so what you will get when profiling a multi threaded app is always REALTIME not CPUTIME.

With function instrumentation you're able to measure every function that is compiled with this option.
As pthread_*() is all part of libc.so you can't get numbers for this functions unless you use an instrumented version of
 libc, which you will have to build on your own.

if you're planning to do benchmarks on some OS objects, this is probably not the right approach - you should use 
different time functions for this.

HTH
/hp


-----Ursprüngliche Nachricht-----
Von:	Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
Gesendet:	Sa 28.02.2009 14:30
An:	ostech-core_os
Cc:	
Betreff:	Application Profiler 

Hi everybody,

I am trying to measure the time a mutex take to lock and unlock , here is my code

*****************************************************
void *functionA();

int main(int argc, char *argv[]) {
	
	/* initialize mutex1 */
	pthread_mutex_init(&mutex1, NULL);
	
	/* create threads */
	pthread_create(&tid_A, NULL, &functionA ,NULL);
	
	/* 2 secs sleep */
	sleep(2);
}

void *functionA(){
	pthread_mutex_lock(&mutex1);
	pthread_mutex_unlock(&mutex1);
}
*******************************************************

I build the application with the IDE and at Build Options i choosed , build for profiling (Function Instrumentation)

Then i create a new run configuration for the _g binary adding the appropriate Application Profiler tool on Tools bar of
 the Open Run dialog.

I can get results for the function main and functionA but not for pthread_mutex_lock,pthread_mutex_unlock. Is it 
possible to get the time for each line of code ?
The Functon Instrumentation mode is good for single thread application what about add more threads on the above code ? 
Will i get correct results or should i do a different profile ?

thanks


_______________________________________________
OSTech
http://community.qnx.com/sf/go/post23159 
 
*******************************************
Harman Becker Automotive Systems GmbH
Management Board: Dr. Klaus Blickle (Chairman), Dr. Udo Hüls, Michael Mauser
Chairman of the Supervisory Board: Ansgar Rempp | Domicile: Karlsbad | 
Local Court Mannheim: Register No. 361395

 
*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat 
sind oder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese Mail
. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have 
received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, 
disclosure or distribution of the contents in this e-mail is strictly forbidden.
*******************************************
Attachment: Text winmail.dat 4.07 KB
Re: AW: Application Profiler  
 Thanks for the reply

> if you're planning to do benchmarks on some OS objects, this is probably not 
> the right approach - you should use different time functions for this.

Yes i want to do benchmarks on synchronization objects like mutex. Do you mean that i should use some functions like 
gettimeofday() ? I used gettimeofday() but I learned that this is not a good approach so I turned to Application 
Profiler solution. Any advice would be helpful

thanks
AW: AW: Application Profiler  
timing of sync objects is something you should do with a high resolution time,
and you should do it propably not only once but in a loop.
so in pseudo code:
   raise prio to max so nobody can preempt
   get ticks per second
   get Start Ticks
   loop 10.0000
      pthread_mutex_unlock()
   done
  get End Ticks
  single time = (end - start) / ticks per sec / 10.000

check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/syspage_entry.html
for an example

ClockCycles() is the highest resolution you can get on a QNX system,
and it is quite cheap, it's not a kernel call, but only an access to the mapped system page.

This method is convenient to do single threaded stuff,
on multi threaded stuff you either have to do some maths on your own
or you try to use each threads CPUCLOCK.
you do that this way:
   clock = ClockId(pid, tid)
   Clocktime(clock, ....)
      do your stuff
   ClockTime(clock, ...)
this method has the disadvantage that the resolution depends on the accounting that is done in the scheduler, which is 
not 100% accurate 

@QNX: what about releasing the OS benchmark suite on foundry?

/hp



-----Ursprüngliche Nachricht-----
Von:	Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
Gesendet:	Sa 28.02.2009 20:37
An:	ostech-core_os
Cc:	
Betreff:	Re: AW: Application Profiler 

 Thanks for the reply

> if you're planning to do benchmarks on some OS objects, this is probably not 
> the right approach - you should use different time functions for this.

Yes i want to do benchmarks on synchronization objects like mutex. Do you mean that i should use some functions like 
gettimeofday() ? I used gettimeofday() but I learned that this is not a good approach so I turned to Application 
Profiler solution. Any advice would be helpful

thanks

_______________________________________________
OSTech
http://community.qnx.com/sf/go/post23162 
 
*******************************************
Harman Becker Automotive Systems GmbH
Management Board: Dr. Klaus Blickle (Chairman), Dr. Udo Hüls, Michael Mauser
Chairman of the Supervisory Board: Ansgar Rempp | Domicile: Karlsbad | 
Local Court Mannheim: Register No. 361395

 
*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat 
sind oder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese Mail
. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have 
received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, 
disclosure or distribution of the contents in this e-mail is strictly forbidden.
*******************************************
Attachment: Text winmail.dat 4.14 KB
Re: AW: AW: Application Profiler  
> timing of sync objects is something you should do with a high resolution time,
> 
> and you should do it propably not only once but in a loop.
> so in pseudo code:
>    raise prio to max so nobody can preempt
>    get ticks per second
>    get Start Ticks
>    loop 10.0000
>       pthread_mutex_unlock()
>    done
>   get End Ticks
>   single time = (end - start) / ticks per sec / 10.000
> 
> check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/
> syspage_entry.html
> for an example
> 
> ClockCycles() is the highest resolution you can get on a QNX system,
> and it is quite cheap, it's not a kernel call, but only an access to the 
> mapped system page.
> 
> This method is convenient to do single threaded stuff,
> on multi threaded stuff you either have to do some maths on your own
> or you try to use each threads CPUCLOCK.
> you do that this way:
>    clock = ClockId(pid, tid)
>    Clocktime(clock, ....)
>       do your stuff
>    ClockTime(clock, ...)
> this method has the disadvantage that the resolution depends on the accounting
>  that is done in the scheduler, which is not 100% accurate 
> 
> @QNX: what about releasing the OS benchmark suite on foundry?
> 

Hi again,

It is my first time i am doing benchmarking on synchronization objects. So I followed the benchmark methodology of QNX 
at: http://www.qnx.com/download/download/8829/QNX_NeutrinoRTOS_Kernel_Benchmark_Methodology.pdf
So i want to calculate the average cost of a contested and uncontested mutex. My code after your help is the following:

/* uncontested mutex */

uint64_t cps, cycle1, cycle2, ncycles;
double sec =0.0;
double micro_sec = 0.0;
pthread_mutex_t mutex;
int i=0;

int main(int argc, char *argv[]) {
	
   pthread_mutex_init(&mutex,NULL);
   cycle1 = ClockCycles( );
   
   for(i=0;i<1000000;i++){
	   pthread_mutex_lock(&mutex);
	   pthread_mutex_unlock(&mutex);
   }
  
   cycle2 = ClockCycles( );
   ncycles = cycle2 - cycle1;
   printf("%lld cycles elapsed.\n", ncycles);
   
   cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
   printf( "This system has %lld cycles/sec.\n",cps );
   sec = (double)ncycles/cps;
   printf("The cycles in seconds is %f.\n",sec);
   
   return EXIT_SUCCESS;
       
}
------------------------------------------------------------------

/* contested mutex */

pthread_mutex_t mutex1;
pthread_t tid_A,tid_B,tid_C,tid_D;

/* functions executed by threads */
void *functionA();
void *functionB();
void *functionC();
void *functionD();


int main(int argc, char *argv[]) {
	
	/* initialize mutex1 */
	pthread_mutex_init(&mutex1, NULL);
	
	/* create threads */
	pthread_create(&tid_A, NULL, &functionA ,NULL);
	pthread_create(&tid_B, NULL, &functionB ,NULL);
	pthread_create(&tid_C, NULL, &functionC ,NULL);
//	pthread_create(&tid_D, NULL, &functionD ,NULL);
	
	/* some secs sleep */
	sleep(3);
}

void *functionA(){
	int i=0;
	uint64_t cps, cycle1, cycle2, ncycles;
	double sec =0.0;
	
	cycle1 = ClockCycles( );
	
	for(i=0;i<1000000;i++){
		pthread_mutex_lock(&mutex1);
		pthread_mutex_unlock(&mutex1);
        }

        cycle2 = ClockCycles( );
	ncycles = cycle2 - cycle1;
	printf("thread-A: %lld cycles elapsed.\n", ncycles);
	
        cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
        printf( "This system has %lld cycles/sec.\n",cps );
        sec = (double)ncycles/cps;
	printf("The cycles in seconds is %f.\n\n",sec);
}

void *functionB(),void *functionC(),void *functionD() have the same body as functionA.
-----------------------------------------------------------------------------------------

First of all is my approach correct ?
I have been confused because i learned that only an application profiler gives real results.

thanks again

AW: AW: AW: Application Profiler  
it's all about the resolution you need/want
and the overhead you're adding by modifing your code through adding timing features.
and when you're doing your own accounting you have to know which time you're measuring, CPU or REAL and that your code 
is not preempted or at least you know through priorities who has preempted you for how long.
We've got a saying here (it's german): Wer mißt, mißt Mist.
maybe you're able to read it - the spelling is important cause it's the essence
Or in a physics way: as soon as you're observing a system, you're changing it's behaviour.

@Elena: which clock type is application profiler using?
is it ClockCycles or something else, and how big is overhead of __cyg_profile_func_enter and __cyg_profile_func_exit?

/hp




-----Ursprüngliche Nachricht-----
Von:	Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
Gesendet:	So 01.03.2009 20:28
An:	ostech-core_os
Cc:	
Betreff:	Re: AW: AW: Application Profiler 

> timing of sync objects is something you should do with a high resolution time,
> 
> and you should do it propably not only once but in a loop.
> so in pseudo code:
>    raise prio to max so nobody can preempt
>    get ticks per second
>    get Start Ticks
>    loop 10.0000
>       pthread_mutex_unlock()
>    done
>   get End Ticks
>   single time = (end - start) / ticks per sec / 10.000
> 
> check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/
> syspage_entry.html
> for an example
> 
> ClockCycles() is the highest resolution you can get on a QNX system,
> and it is quite cheap, it's not a kernel call, but only an access to the 
> mapped system page.
> 
> This method is convenient to do single threaded stuff,
> on multi threaded stuff you either have to do some maths on your own
> or you try to use each threads CPUCLOCK.
> you do that this way:
>    clock = ClockId(pid, tid)
>    Clocktime(clock, ....)
>       do your stuff
>    ClockTime(clock, ...)
> this method has the disadvantage that the resolution depends on the accounting
>  that is done in the scheduler, which is not 100% accurate 
> 
> @QNX: what about releasing the OS benchmark suite on foundry?
> 

Hi again,

It is my first time i am doing benchmarking on synchronization objects. So I followed the benchmark methodology of QNX 
at: http://www.qnx.com/download/download/8829/QNX_NeutrinoRTOS_Kernel_Benchmark_Methodology.pdf
So i want to calculate the average cost of a contested and uncontested mutex. My code after your help is the following:

/* uncontested mutex */

uint64_t cps, cycle1, cycle2, ncycles;
double sec =0.0;
double micro_sec = 0.0;
pthread_mutex_t mutex;
int i=0;

int main(int argc, char *argv[]) {
	
   pthread_mutex_init(&mutex,NULL);
   cycle1 = ClockCycles( );
   
   for(i=0;i<1000000;i++){
	   pthread_mutex_lock(&mutex);
	   pthread_mutex_unlock(&mutex);
   }
  
   cycle2 = ClockCycles( );
   ncycles = cycle2 - cycle1;
   printf("%lld cycles elapsed.\n", ncycles);
   
   cps = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
   printf( "This system has %lld cycles/sec.\n",cps );
   sec = (double)ncycles/cps;
   printf("The cycles in seconds is %f.\n",sec);
   
   return EXIT_SUCCESS;
       
}
------------------------------------------------------------------

/* contested mutex */

pthread_mutex_t mutex1;
pthread_t tid_A,tid_B,tid_C,tid_D;

/* functions executed by threads */
void *functionA();
void *functionB();
void *functionC();
void *functionD();


int main(int argc, char *argv[]) {
	
	/* initialize mutex1 */
	pthread_mutex_init(&mutex1, NULL);
	
	/* create threads */
	pthread_create(&tid_A, NULL, &functionA ,NULL);
	pthread_create(&tid_B, NULL, &functionB ,NULL);
	pthread_create(&tid_C, NULL, &functionC ,NULL);
//	pthread_create(&tid_D, NULL, &functionD ,NULL);
	
	/* some secs...
View Full Message
Attachment: Text winmail.dat 5.46 KB
Re: AW: AW: AW: Application Profiler  
Right now it uses ClockCycles(). I did not run specific test for 
overhead, but it would be maybe 10-12 instructions. If you saving trace 
into file there is more overhead on dumping buffer, but it is
automatically removed.

If you wrap whole cycle into function it would be neglectable, i..e

my_wrapper() {
 for (i=0;i<1000000;i++) {
 system_function();
 };
}

And yes it won't account for context switches right now.

Hans-Peter Reichert wrote:
>
> it's all about the resolution you need/want
> and the overhead you're adding by modifing your code through adding 
> timing features.
> and when you're doing your own accounting you have to know which time 
> you're measuring, CPU or REAL and that your code is not preempted or 
> at least you know through priorities who has preempted you for how long.
> We've got a saying here (it's german): Wer mißt, mißt Mist.
> maybe you're able to read it - the spelling is important cause it's 
> the essence
> Or in a physics way: as soon as you're observing a system, you're 
> changing it's behaviour.
>
> @Elena: which clock type is application profiler using?
> is it ClockCycles or something else, and how big is overhead of 
> __cyg_profile_func_enter and __cyg_profile_func_exit?
>
> /hp
>
>
>
>
> -----Ursprüngliche Nachricht-----
> Von:    Andreas Polychronopoulos [mailto:community-noreply@qnx.com]
> Gesendet:       So 01.03.2009 20:28
> An:     ostech-core_os
> Cc:    
> Betreff:        Re: AW: AW: Application Profiler
>
> > timing of sync objects is something you should do with a high 
> resolution time,
> >
> > and you should do it propably not only once but in a loop.
> > so in pseudo code:
> >    raise prio to max so nobody can preempt
> >    get ticks per second
> >    get Start Ticks
> >    loop 10.0000
> >       pthread_mutex_unlock()
> >    done
> >   get End Ticks
> >   single time = (end - start) / ticks per sec / 10.000
> >
> > check this: http://www.qnx.com/developers/docs/6.4.0/neutrino/lib_ref/s/
> > syspage_entry.html
> > for an example
> >
> > ClockCycles() is the highest resolution you can get on a QNX system,
> > and it is quite cheap, it's not a kernel call, but only an access to the
> > mapped system page.
> >
> > This method is convenient to do single threaded stuff,
> > on multi threaded stuff you either have to do some maths on your own
> > or you try to use each threads CPUCLOCK.
> > you do that this way:
> >    clock = ClockId(pid, tid)
> >    Clocktime(clock, ....)
> >       do your stuff
> >    ClockTime(clock, ...)
> > this method has the disadvantage that the resolution depends on the 
> accounting
> >  that is done in the scheduler, which is not 100% accurate
> >
> > @QNX: what about releasing the OS benchmark suite on foundry?
> >
>
> Hi again,
>
> It is my first time i am doing benchmarking on synchronization 
> objects. So I followed the benchmark methodology of QNX at: 
> http://www.qnx.com/download/download/8829/QNX_NeutrinoRTOS_Kernel_Benchmark_Methodology.pdf
> So i want to calculate the average cost of a contested and uncontested 
> mutex. My code after your help is the following:
>
> /* uncontested mutex */
>
> uint64_t cps, cycle1, cycle2, ncycles;
> double sec =0.0;
> double micro_sec = 0.0;
> pthread_mutex_t mutex;
> int i=0;
>
> int main(int argc, char *argv[]) {
>        
>    pthread_mutex_init(&mutex,NULL);
>    cycle1 = ClockCycles( );
>   
>    for(i=0;i<1000000;i++){
>            pthread_mutex_lock(&mutex);
>            pthread_mutex_unlock(&mutex);
>    }
> ...
View Full Message
Re: Application Profiler  
If you want to use application profiler for system functions, you can 
wrap them with your own functions, in this case it measure their time + 
overhead of calling a function.

Andreas Polychronopoulos wrote:
> Hi everybody,
>
> I am trying to measure the time a mutex take to lock and unlock , here is my code
>
> *****************************************************
> void *functionA();
>
> int main(int argc, char *argv[]) {
> 	
> 	/* initialize mutex1 */
> 	pthread_mutex_init(&mutex1, NULL);
> 	
> 	/* create threads */
> 	pthread_create(&tid_A, NULL, &functionA ,NULL);
> 	
> 	/* 2 secs sleep */
> 	sleep(2);
> }
>
> void *functionA(){
> 	pthread_mutex_lock(&mutex1);
> 	pthread_mutex_unlock(&mutex1);
> }
> *******************************************************
>
> I build the application with the IDE and at Build Options i choosed , build for profiling (Function Instrumentation)
>
> Then i create a new run configuration for the _g binary adding the appropriate Application Profiler tool on Tools bar 
of the Open Run dialog.
>
> I can get results for the function main and functionA but not for pthread_mutex_lock,pthread_mutex_unlock. Is it 
possible to get the time for each line of code ?
> The Functon Instrumentation mode is good for single thread application what about add more threads on the above code ?
 Will i get correct results or should i do a different profile ?
>
> thanks
>
>
> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post23159
>
>   
Re: Application Profiler  
> If you want to use application profiler for system functions, you can 
> wrap them with your own functions, in this case it measure their time + 
> overhead of calling a function.
> 

Hi again,

I used App Profiler for a single thread application in Function Instrumentation mode. What about with a multi-threaded 
app ? The Function Instrumentation mode will add an overhead and maybe change application's beahavior. If i use the 
Kernel Logging should i take REALTIME or CPUTIME measurements ? OR should i go to an approach as Hans-Peter Reichert 
said above , using the ClockId etc. I would like to measure the REALTIME of my app but no problem if i can get the 
CPUTIME instead.

thanks 
Re: Application Profiler  
Both mode in App Profiler use realtime measured in ClockCycles. You cannot modify it. I am not sure what exactly you are
 asking.
You can use it for multi-threaded too, but if thread context happened during function run - it won't be correct (i.e. it
 would show absolute time
for this function even thread was not running).

Andreas Polychronopoulos wrote:
>> If you want to use application profiler for system functions, you can 
>> wrap them with your own functions, in this case it measure their time + 
>> overhead of calling a function.
>>
> 
> Hi again,
> 
> I used App Profiler for a single thread application in Function Instrumentation mode. What about with a multi-threaded
 app ? The Function Instrumentation mode will add an overhead and maybe change application's beahavior. If i use the 
Kernel Logging should i take REALTIME or CPUTIME measurements ? OR should i go to an approach as Hans-Peter Reichert 
said above , using the ClockId etc. I would like to measure the REALTIME of my app but no problem if i can get the 
CPUTIME instead.
> 
> thanks 
> 
> _______________________________________________
> OSTech
> http://community.qnx.com/sf/go/post23997
>