Project Home
Project Home
Documents
Documents
Wiki
Wiki
Discussion Forums
Discussions
Project Information
Project Info
Forum Topic - tracelogger: extraordinary overhead in one thread: (7 Items)
   
tracelogger: extraordinary overhead in one thread  
running tracelogger while copying data in ram shows big impact on cputime of copy process:

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    2.82s real     0.09s user     0.11s system
    0.156 cputime

intel:/share> tracelogger -M -S200M -n0 -w &

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    3.90s real     0.07s user     0.24s system
    0.350 cputime

system environment:
QNX6.5, x86 / armv7

0.156 -> 0.350 msec (+124%)

cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.

For other processes in the system the cputime keeps almost the same (with/without tracelogger).

Has anyone an idea why tracelogger causes this increase of cpu time?

First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of 
trace events does not show the cpu time behavior.

Could it be a memory or caching issue?
RE: tracelogger: extraordinary overhead in one thread  
Just to get the semantic right, you are not writing to ram, you are writing in /dev/shmem/. This device is handled by 
procnto, and so is the trace stuff.  My own observation with /dev/shmem indicates that is will block many memory related
 operation, so it's not a big leap to assume that /dev/shmem affects tracelogger.

-----Message d'origine-----
De : Thomas Schickentanz [mailto:community-noreply@qnx.com] 
Envoyé : Monday, July 07, 2014 11:43 AM
À : ostech-core_os
Objet : tracelogger: extraordinary overhead in one thread

running tracelogger while copying data in ram shows big impact on cputime of copy process:

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    2.82s real     0.09s user     0.11s system
    0.156 cputime

intel:/share> tracelogger -M -S200M -n0 -w &

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    3.90s real     0.07s user     0.24s system
    0.350 cputime

system environment:
QNX6.5, x86 / armv7

0.156 -> 0.350 msec (+124%)

cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.

For other processes in the system the cputime keeps almost the same (with/without tracelogger).

Has anyone an idea why tracelogger causes this increase of cpu time?

First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of 
trace events does not show the cpu time behavior.

Could it be a memory or caching issue?




_______________________________________________

OSTech
http://community.qnx.com/sf/go/post110954
To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

RE: tracelogger: extraordinary overhead in one thread  
I should also add that every write to /dev/shmem, will cause some entry in the trace buffer.  Cp use 16k buffer I 
believe so for every 16K you a trace entry, and that's the write operation only, you get context switches to go along 
with that.   if the file is 300M, that quite a few entries done at a very fast rate.

-----Message d'origine-----
De : Thomas Schickentanz [mailto:community-noreply@qnx.com] 
Envoyé : Monday, July 07, 2014 11:43 AM
À : ostech-core_os
Objet : tracelogger: extraordinary overhead in one thread

running tracelogger while copying data in ram shows big impact on cputime of copy process:

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    2.82s real     0.09s user     0.11s system
    0.156 cputime

intel:/share> tracelogger -M -S200M -n0 -w &

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    3.90s real     0.07s user     0.24s system
    0.350 cputime

system environment:
QNX6.5, x86 / armv7

0.156 -> 0.350 msec (+124%)

cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.

For other processes in the system the cputime keeps almost the same (with/without tracelogger).

Has anyone an idea why tracelogger causes this increase of cpu time?

First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of 
trace events does not show the cpu time behavior.

Could it be a memory or caching issue?




_______________________________________________

OSTech
http://community.qnx.com/sf/go/post110954
To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Re: tracelogger: extraordinary overhead in one thread  
Hi Thomas,

This is an interesting case - cp is basically a frantic burst of 16k 
messages.  I imagine that it is proportionally
more impacted by tracing than others because we are peeking into the 
message, and because of the nature of
how /dev/shmem is implemented.

It would be interesting to see how it is affected by copying to 
/dev/null instead - the same impact?

BTW I would look at real time rather than cpu time - remember that 
cputime is tick based, which can be a little
misleading at times.

On 2014-07-07 11:42 AM, Thomas Schickentanz wrote:
> running tracelogger while copying data in ram shows big impact on cputime of copy process:
>
> intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin
>      2.82s real     0.09s user     0.11s system
>      0.156 cputime
>
> intel:/share> tracelogger -M -S200M -n0 -w &
>
> intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin
>      3.90s real     0.07s user     0.24s system
>      0.350 cputime
>
> system environment:
> QNX6.5, x86 / armv7
>
> 0.156 -> 0.350 msec (+124%)
>
> cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.
>
> For other processes in the system the cputime keeps almost the same (with/without tracelogger).
>
> Has anyone an idea why tracelogger causes this increase of cpu time?
>
> First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of
 trace events does not show the cpu time behavior.
>
> Could it be a memory or caching issue?
>
>
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post110954
> To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Re: tracelogger: extraordinary overhead in one thread  
Hi Colin,

I agree that accounting cputime with ClockTime() is not best. cp thread has average running duration of about 9 µsec 
and will not be accounted well with 1 msec resolution.
For measurement of usecase "copy to /dev/null without kerneltrace" I tried tr_lite with GPTimer of hardware for 
accounting. Because tr_lite registers dynamic filtering callback at kernel site for thread state change events, it has 
also an impact on the usecase.

without tracing:
intel:/dev/shmem> time cp 300mbyte.bin /dev/null
    1.34s real     0.06s user     0.08s system

with tr_lite:
intel:/dev/shmem> tr-lite &

intel:/dev/shmem> time cp ram300mbyte.bin /dev/null 
    1.77s real     0.05s user     0.14s system
    tr_lite: 230 msec cputime for cp

with tracelogger:
intel:/tmp> tracelogger -M -S300M -n0 -w &

intel:/dev/shmem> time cp ram300mbyte.bin /dev/null 
    2.38s real     0.09s user     0.17s system      
    kerneltrace: 325 msec cputime for cp

As a result I can see that the real time decreases (compared to shmem -> shmem copy) by 1.5 seconds (because write to /
dev/null is faster, see below), but the difference between the two usecases keeps the same (1.34 -> 2.38 and 2.82 -> 3.
9).
Cputimes by tr_lite and kerneltrace should be close to reality. So we can see a differnce of 41% compared with tr_lite 
which has itself an impact on cputime.

Some interesting numbers from kerneltrace (client + server time):
			
client 'cp' -> server	bytes    	        duration	        cputime
'procnto-smp-instr'	629145600 	2292.33		1743.47
   eREAD	                314572800 	1663.94		1371.10
      ram300mbyte.bin	314572800	        1663.94		1371.10
   eWRITE	                314572800 	 628.38	 	 372.38
      /dev/null	        314572800 	 628.38	 	 372.38

I think the kernelevents for scheduling cp and procnto have a proportional impact on the cputime.

number of state switches to 'RUNNING' in cp usecase intervall (2389 msec): 
procnto:  42809 (without idle threads)
cp:         38489
other   :    6414
total:       87712  

Kerneltrace and tr_lite both log these state switch events. The difference in times could be explained by the fact that 
kerneltrace has to log additional events and must write the events to ram.

My summary:
Using kerneltracing, everyone should be aware that cpu overhead is added to the threads proportionally by kernelevents 
of the threads.

By the way: tracelogger (with IDE) is the most exciting analysis tool I ever worked with!  
Re: tracelogger: extraordinary overhead in one thread  
Thanks for your (as always) thoughtful and interesting analysis, Thomas!

Might I suggest one last test?  Let's see how this affect plays out 
running in uniprocessor mode.

Best regards,

Colin

On 2014-07-18 9:42 AM, Thomas Schickentanz wrote:
> Hi Colin,
>
> I agree that accounting cputime with ClockTime() is not best. cp thread has average running duration of about 9 µsec 
and will not be accounted well with 1 msec resolution.
> For measurement of usecase "copy to /dev/null without kerneltrace" I tried tr_lite with GPTimer of hardware for 
accounting. Because tr_lite registers dynamic filtering callback at kernel site for thread state change events, it has 
also an impact on the usecase.
>
> without tracing:
> intel:/dev/shmem> time cp 300mbyte.bin /dev/null
>      1.34s real     0.06s user     0.08s system
>
> with tr_lite:
> intel:/dev/shmem> tr-lite &
>
> intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
>      1.77s real     0.05s user     0.14s system
>      tr_lite: 230 msec cputime for cp
>
> with tracelogger:
> intel:/tmp> tracelogger -M -S300M -n0 -w &
>
> intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
>      2.38s real     0.09s user     0.17s system
>      kerneltrace: 325 msec cputime for cp
>
> As a result I can see that the real time decreases (compared to shmem -> shmem copy) by 1.5 seconds (because write to 
/dev/null is faster, see below), but the difference between the two usecases keeps the same (1.34 -> 2.38 and 2.82 -> 3.
9).
> Cputimes by tr_lite and kerneltrace should be close to reality. So we can see a differnce of 41% compared with tr_lite
 which has itself an impact on cputime.
>
> Some interesting numbers from kerneltrace (client + server time):
> 			
> client 'cp' -> server	bytes    	        duration	        cputime
> 'procnto-smp-instr'	629145600 	2292.33		1743.47
>     eREAD	                314572800 	1663.94		1371.10
>        ram300mbyte.bin	314572800	        1663.94		1371.10
>     eWRITE	                314572800 	 628.38	 	 372.38
>        /dev/null	        314572800 	 628.38	 	 372.38
>
> I think the kernelevents for scheduling cp and procnto have a proportional impact on the cputime.
>
> number of state switches to 'RUNNING' in cp usecase intervall (2389 msec):
> procnto:  42809 (without idle threads)
> cp:         38489
> other   :    6414
> total:       87712
>
> Kerneltrace and tr_lite both log these state switch events. The difference in times could be explained by the fact 
that kerneltrace has to log additional events and must write the events to ram.
>
> My summary:
> Using kerneltracing, everyone should be aware that cpu overhead is added to the threads proportionally by kernelevents
 of the threads.
>
> By the way: tracelogger (with IDE) is the most exciting analysis tool I ever worked with!
>
>
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post111105
> To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Re: tracelogger: extraordinary overhead in one thread  
I set the affinity for all processes/threads to the first core (except idle thread for core two ;-)) with "slay -i -C0 .
..". 
After starting tracelogger and cp with "on -C0 ..." the result was the same for real time and cpu time.