foundry27 : Post

Forum Topic - tracelogger: extraordinary overhead in one thread: (7 Items)

View: as

Thomas Schickentanz

07/07/2014 11:42 AM

post110954

tracelogger: extraordinary overhead in one thread

running tracelogger while copying data in ram shows big impact on cputime of copy process:

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    2.82s real     0.09s user     0.11s system
    0.156 cputime

intel:/share> tracelogger -M -S200M -n0 -w &

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    3.90s real     0.07s user     0.24s system
    0.350 cputime

system environment:
QNX6.5, x86 / armv7

0.156 -> 0.350 msec (+124%)

cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.

For other processes in the system the cputime keeps almost the same (with/without tracelogger).

Has anyone an idea why tracelogger causes this increase of cpu time?

First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of 
trace events does not show the cpu time behavior.

Could it be a memory or caching issue?

Mario Charest

RE: tracelogger: extraordinary overhead in one thread

Mario Charest

07/07/2014 5:03 PM

post110960

RE: tracelogger: extraordinary overhead in one thread

Just to get the semantic right, you are not writing to ram, you are writing in /dev/shmem/. This device is handled by 
procnto, and so is the trace stuff.  My own observation with /dev/shmem indicates that is will block many memory related
 operation, so it's not a big leap to assume that /dev/shmem affects tracelogger.

-----Message d'origine-----
De : Thomas Schickentanz [mailto:community-noreply@qnx.com] 
Envoyé : Monday, July 07, 2014 11:43 AM
À : ostech-core_os
Objet : tracelogger: extraordinary overhead in one thread

running tracelogger while copying data in ram shows big impact on cputime of copy process:

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    2.82s real     0.09s user     0.11s system
    0.156 cputime

intel:/share> tracelogger -M -S200M -n0 -w &

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    3.90s real     0.07s user     0.24s system
    0.350 cputime

system environment:
QNX6.5, x86 / armv7

0.156 -> 0.350 msec (+124%)

cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.

For other processes in the system the cputime keeps almost the same (with/without tracelogger).

Has anyone an idea why tracelogger causes this increase of cpu time?

First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of 
trace events does not show the cpu time behavior.

Could it be a memory or caching issue?




_______________________________________________

OSTech
http://community.qnx.com/sf/go/post110954
To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Mario Charest

RE: tracelogger: extraordinary overhead in one thread

Mario Charest

07/07/2014 5:06 PM

post110961

RE: tracelogger: extraordinary overhead in one thread

I should also add that every write to /dev/shmem, will cause some entry in the trace buffer.  Cp use 16k buffer I 
believe so for every 16K you a trace entry, and that's the write operation only, you get context switches to go along 
with that.   if the file is 300M, that quite a few entries done at a very fast rate.

-----Message d'origine-----
De : Thomas Schickentanz [mailto:community-noreply@qnx.com] 
Envoyé : Monday, July 07, 2014 11:43 AM
À : ostech-core_os
Objet : tracelogger: extraordinary overhead in one thread

running tracelogger while copying data in ram shows big impact on cputime of copy process:

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    2.82s real     0.09s user     0.11s system
    0.156 cputime

intel:/share> tracelogger -M -S200M -n0 -w &

intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin 
    3.90s real     0.07s user     0.24s system
    0.350 cputime

system environment:
QNX6.5, x86 / armv7

0.156 -> 0.350 msec (+124%)

cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.

For other processes in the system the cputime keeps almost the same (with/without tracelogger).

Has anyone an idea why tracelogger causes this increase of cpu time?

First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of 
trace events does not show the cpu time behavior.

Could it be a memory or caching issue?




_______________________________________________

OSTech
http://community.qnx.com/sf/go/post110954
To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Colin Burgess(deleted)

Re: tracelogger: extraordinary overhead in one thread

Colin Burgess(deleted)

07/14/2014 9:51 AM

post111050

Re: tracelogger: extraordinary overhead in one thread

Hi Thomas,

This is an interesting case - cp is basically a frantic burst of 16k 
messages.  I imagine that it is proportionally
more impacted by tracing than others because we are peeking into the 
message, and because of the nature of
how /dev/shmem is implemented.

It would be interesting to see how it is affected by copying to 
/dev/null instead - the same impact?

BTW I would look at real time rather than cpu time - remember that 
cputime is tick based, which can be a little
misleading at times.

On 2014-07-07 11:42 AM, Thomas Schickentanz wrote:
> running tracelogger while copying data in ram shows big impact on cputime of copy process:
>
> intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin
>      2.82s real     0.09s user     0.11s system
>      0.156 cputime
>
> intel:/share> tracelogger -M -S200M -n0 -w &
>
> intel:/dev/shmem> time cp 300mbyte.bin 300mbyte_cp.bin
>      3.90s real     0.07s user     0.24s system
>      0.350 cputime
>
> system environment:
> QNX6.5, x86 / armv7
>
> 0.156 -> 0.350 msec (+124%)
>
> cputime was calculated with ClockId(pid, tid) and ClockTime(clockId,...) calls.
>
> For other processes in the system the cputime keeps almost the same (with/without tracelogger).
>
> Has anyone an idea why tracelogger causes this increase of cpu time?
>
> First I supposed an influence by the number of traceevents of the process. But a test binary creating a huge number of
 trace events does not show the cpu time behavior.
>
> Could it be a memory or caching issue?
>
>
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post110954
> To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Thomas Schickentanz

07/18/2014 9:42 AM

post111105

Re: tracelogger: extraordinary overhead in one thread

Hi Colin,

I agree that accounting cputime with ClockTime() is not best. cp thread has average running duration of about 9 µsec
and will not be accounted well with 1 msec resolution.
For measurement of usecase "copy to /dev/null without kerneltrace" I tried tr_lite with GPTimer of hardware for
accounting. Because tr_lite registers dynamic filtering callback at kernel site for thread state change events, it has
also an impact on the usecase.

without tracing:
intel:/dev/shmem> time cp 300mbyte.bin /dev/null
1.34s real 0.06s user 0.08s system

with tr_lite:
intel:/dev/shmem> tr-lite &

intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
1.77s real 0.05s user 0.14s system
tr_lite: 230 msec cputime for cp

with tracelogger:
intel:/tmp> tracelogger -M -S300M -n0 -w &

intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
2.38s real 0.09s user 0.17s system
kerneltrace: 325 msec cputime for cp

As a result I can see that the real time decreases (compared to shmem -> shmem copy) by 1.5 seconds (because write to /
dev/null is faster, see below), but the difference between the two usecases keeps the same (1.34 -> 2.38 and 2.82 -> 3.
9).
Cputimes by tr_lite and kerneltrace should be close to reality. So we can see a differnce of 41% compared with tr_lite
which has itself an impact on cputime.

Some interesting numbers from kerneltrace (client + server time):

client 'cp' -> server bytes duration cputime
'procnto-smp-instr' 629145600 2292.33 1743.47
eREAD 314572800 1663.94 1371.10
ram300mbyte.bin 314572800 1663.94 1371.10
eWRITE 314572800 628.38 372.38
/dev/null 314572800 628.38 372.38

I think the kernelevents for scheduling cp and procnto have a proportional impact on the cputime.

number of state switches to 'RUNNING' in cp usecase intervall (2389 msec):
procnto: 42809 (without idle threads)
cp: 38489
other : 6414
total: 87712

Kerneltrace and tr_lite both log these state switch events. The difference in times could be explained by the fact that
kerneltrace has to log additional events and must write the events to ram.

My summary:
Using kerneltracing, everyone should be aware that cpu overhead is added to the threads proportionally by kernelevents
of the threads.

By the way: tracelogger (with IDE) is the most exciting analysis tool I ever worked with!

Colin Burgess(deleted)

07/18/2014 12:40 PM

post111109

Re: tracelogger: extraordinary overhead in one thread

Thanks for your (as always) thoughtful and interesting analysis, Thomas!

Might I suggest one last test?  Let's see how this affect plays out 
running in uniprocessor mode.

Best regards,

Colin

On 2014-07-18 9:42 AM, Thomas Schickentanz wrote:
> Hi Colin,
>
> I agree that accounting cputime with ClockTime() is not best. cp thread has average running duration of about 9 µsec 
and will not be accounted well with 1 msec resolution.
> For measurement of usecase "copy to /dev/null without kerneltrace" I tried tr_lite with GPTimer of hardware for 
accounting. Because tr_lite registers dynamic filtering callback at kernel site for thread state change events, it has 
also an impact on the usecase.
>
> without tracing:
> intel:/dev/shmem> time cp 300mbyte.bin /dev/null
>      1.34s real     0.06s user     0.08s system
>
> with tr_lite:
> intel:/dev/shmem> tr-lite &
>
> intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
>      1.77s real     0.05s user     0.14s system
>      tr_lite: 230 msec cputime for cp
>
> with tracelogger:
> intel:/tmp> tracelogger -M -S300M -n0 -w &
>
> intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
>      2.38s real     0.09s user     0.17s system
>      kerneltrace: 325 msec cputime for cp
>
> As a result I can see that the real time decreases (compared to shmem -> shmem copy) by 1.5 seconds (because write to 
/dev/null is faster, see below), but the difference between the two usecases keeps the same (1.34 -> 2.38 and 2.82 -> 3.
9).
> Cputimes by tr_lite and kerneltrace should be close to reality. So we can see a differnce of 41% compared with tr_lite
 which has itself an impact on cputime.
>
> Some interesting numbers from kerneltrace (client + server time):
> 			
> client 'cp' -> server	bytes    	        duration	        cputime
> 'procnto-smp-instr'	629145600 	2292.33		1743.47
>     eREAD	                314572800 	1663.94		1371.10
>        ram300mbyte.bin	314572800	        1663.94		1371.10
>     eWRITE	                314572800 	 628.38	 	 372.38
>        /dev/null	        314572800 	 628.38	 	 372.38
>
> I think the kernelevents for scheduling cp and procnto have a proportional impact on the cputime.
>
> number of state switches to 'RUNNING' in cp usecase intervall (2389 msec):
> procnto:  42809 (without idle threads)
> cp:         38489
> other   :    6414
> total:       87712
>
> Kerneltrace and tr_lite both log these state switch events. The difference in times could be explained by the fact 
that kerneltrace has to log additional events and must write the events to ram.
>
> My summary:
> Using kerneltracing, everyone should be aware that cpu overhead is added to the threads proportionally by kernelevents
 of the threads.
>
> By the way: tracelogger (with IDE) is the most exciting analysis tool I ever worked with!
>
>
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post111105
> To cancel your subscription to this discussion, please e-mail ostech-core_os-unsubscribe@community.qnx.com

Thomas Schickentanz

07/22/2014 10:04 AM

post111154

Re: tracelogger: extraordinary overhead in one thread

I set the affinity for all processes/threads to the first core (except idle thread for core two ;-)) with "slay -i -C0 .
..". 
After starting tracelogger and cp with "on -C0 ..." the result was the same for real time and cpu time.

Return

The text you entered is not a valid object ID
More Information
Object IDs begin with an object prefix and end with a number. For example, if you enter
artf2345
the application will jump directly to an artifact with the ID artf2345. Some valid object prefixes are:
artf	for an artifact
doc	for a document
page	for a project page
topc	for a discussion topic
wiki	for a wiki page