Colin Burgess (deleted)
07/18/2014 12:40 PM
post111109

Re: tracelogger: extraordinary overhead in one thread
Thanks for your (as always) thoughtful and interesting analysis, Thomas!
Might I suggest one last test? Let's see how this effect plays out
running in uniprocessor mode.
Best regards,
Colin
On 2014-07-18 9:42 AM, Thomas Schickentanz wrote:
> Hi Colin,
>
> I agree that accounting cputime with ClockTime() is not ideal. The cp thread has an average running duration of about 9 µs and will not be accounted accurately at 1 ms resolution.
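As a side note on the resolution point above: a minimal sketch of how tick-sampled accounting can misattribute cputime for such short-running threads. Only the ~9 µs burst length comes from the measurement; the burst period and total run time here are hypothetical values chosen for illustration.

```python
# Tick-sampled accounting: each 1 ms tick is charged wholesale to
# whichever thread happens to be running when the tick fires.
TICK_US = 1000        # 1 ms accounting resolution (ClockTime()-style)
BURST_US = 9          # ~9 us average running duration of the cp thread
PERIOD_US = 60        # hypothetical burst period, for illustration only
TOTAL_US = 2_000_000  # 2 s of wall time, for illustration only

charged_us = 0
for t in range(0, TOTAL_US, TICK_US):
    # The thread runs during [k*PERIOD_US, k*PERIOD_US + BURST_US) windows;
    # it gets charged a full tick whenever a tick lands in such a window.
    if t % PERIOD_US < BURST_US:
        charged_us += TICK_US

true_us = (TOTAL_US // PERIOD_US) * BURST_US
print(f"true cputime:    {true_us / 1000:.1f} ms")
print(f"charged cputime: {charged_us / 1000:.1f} ms")
```

Depending on how the bursts line up with the tick, the charged time can land far above or below the true time; with these numbers the thread is overcharged by more than 2x.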
> For the measurement of the use case "copy to /dev/null without kernel trace" I tried tr_lite with the hardware's GP timer for accounting. Because tr_lite registers a dynamic filtering callback on the kernel side for thread state change events, it also has an impact on the use case.
>
> without tracing:
> intel:/dev/shmem> time cp 300mbyte.bin /dev/null
> 1.34s real 0.06s user 0.08s system
>
> with tr_lite:
> intel:/dev/shmem> tr-lite &
>
> intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
> 1.77s real 0.05s user 0.14s system
> tr_lite: 230 msec cputime for cp
>
> with tracelogger:
> intel:/tmp> tracelogger -M -S300M -n0 -w &
>
> intel:/dev/shmem> time cp ram300mbyte.bin /dev/null
> 2.38s real 0.09s user 0.17s system
> kerneltrace: 325 msec cputime for cp
>
> As a result I can see that the real time decreases (compared to the shmem -> shmem copy) by 1.5 seconds (because writing to /dev/null is faster, see below), but the difference between the two use cases stays the same (1.34 -> 2.38 and 2.82 -> 3.9).
> The cputimes reported by tr_lite and kerneltrace should be close to reality. So we see a difference of 41% compared with tr_lite, which itself has an impact on cputime.
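The 41% figure follows directly from the two cputime measurements quoted above (a trivial sanity check, no new data):

```python
# Cputime measured for cp under each tracing setup (ms), from the post.
tr_lite_ms = 230      # with tr_lite (hardware GP timer accounting)
kerneltrace_ms = 325  # with the full tracelogger kernel trace

overhead = (kerneltrace_ms - tr_lite_ms) / tr_lite_ms
print(f"relative overhead vs tr_lite: {overhead:.0%}")
```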
>
> Some interesting numbers from kerneltrace (client + server time):
>
> client 'cp' -> server      bytes        duration    cputime
> 'procnto-smp-instr'        629145600    2292.33     1743.47
>   eREAD                    314572800    1663.94     1371.10
>     ram300mbyte.bin        314572800    1663.94     1371.10
>   eWRITE                   314572800     628.38      372.38
>     /dev/null              314572800     628.38      372.38
>
> I think the kernel events for scheduling cp and procnto have a proportional impact on the cputime.
>
> number of state switches to 'RUNNING' in the cp use case interval (2389 msec):
> procnto: 42809 (without idle threads)
> cp: 38489
> other : 6414
> total: 87712
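A rough back-of-the-envelope on those counts, under the assumption (mine, not stated in the post) that the extra wall time of the tracelogger run over the untraced run is dominated by handling these state switches; tracelogger also logs other event classes, so this is only an upper-bound-style estimate:

```python
# State switches to RUNNING during the cp interval, per the trace above.
switches = 42809 + 38489 + 6414   # procnto + cp + other

real_no_trace_s = 1.34      # time cp ... without tracing
real_kerneltrace_s = 2.38   # time cp ... with tracelogger

extra_us = (real_kerneltrace_s - real_no_trace_s) * 1e6
per_switch_us = extra_us / switches
print(f"{switches} switches, ~{per_switch_us:.1f} us extra per switch")
```

That works out to roughly 12 µs of extra wall time per state switch, which is the same order of magnitude as the cp thread's ~9 µs bursts, consistent with the large relative overhead seen on this thread.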
>
> Kerneltrace and tr_lite both log these state switch events. The difference in times could be explained by the fact that kerneltrace has to log additional events and must write the events to RAM.
>
> My summary:
> When using kernel tracing, everyone should be aware that cpu overhead is added to the threads in proportion to the threads' kernel events.
>
> By the way: tracelogger (with IDE) is the most exciting analysis tool I ever worked with!
>
>
> _______________________________________________
>
> OSTech
> http://community.qnx.com/sf/go/post111105