Systemtap OMAP Profiler

Goal

 * Easy to use systemtap tapset for benchmarking and timestamping kernel functions on OMAP devices
 * Minimise systemtap overhead
 * Systemtap uses 64-bit types for variables and so design a tapset to minimise use of 64-bit types.


 * Ability to use 32kHz timer or a GP timer for benchmarking
 * GP timer can run at a derivative of the OMAP SYS_CLK and so are very accurate for benchmarking.
 * For example, the OMAP4 blaze has a SYS_CLK of 38.4MHz.

Benchmarking Functions
1. Download and build systemtap as described here.

2. Copy the file Omap_timestamp.stp into your systemtap tapset directory. cp omap_timestamp.stp /systemtap-1.x-target/share/systemtap/tapset/ 3. Download, uncompress and copy the file to your systemtap tapset directory. cp omap_profile.stp /systemtap-1.x-target/share/systemtap/tapset/ 4. Create a new systemtap script file. vim /systemtap-1.x-target/share/systemtap/myscript.stp 5. In your script define if you want to use the 32kHz clock or a GP timer.
 * For a 32kHz clock add the following.

%{ %}
 * 1) define TIMER_USE_32K 1
 * For a GP timer add the following.

%{ %} 6. Define the maximum number of benchmarks you wish to collect. %{ %} 7. Optionally, for each benchmark you can call profile_benchmark_init to specify a name for the benchmark. This name will be used for displaying the results. Typically this would be called from the begin probe. probe begin { profile_benchmark_init("benchmark-name”, 0) } 8. For each function you wish to benchmark add a kprobe to call profile_start on entry. Where “func_name” is the kernel function name and the value passed is an index that is used to identify each benchmark. The index MUST be less than PROFILE_BENCHMARK_MAX_NUM. probe kernel.function("func_name”).call { profile_start(0) } 9. For each function you wish to benchmark add a kprobe to call profile_end on exit. Note that the value passed should be the same as passed to the profile_start. probe kernel.function("func_name").return { profile_end(0) } 10. To display the results simply call profile_show_benchmarks function. This would typically done in the end probe. probe end { profile_show_benchmarks }
 * 1) define TIMER_USE_GPTIMER 1
 * 1) define PROFILE_BENCHMARK_MAX_NUM 10

Benchmarking Example
Here is a example that will use a GP timer to benchmark the functions “omap_dsi_prepare_update” and “omap_dsi_update" for 30 seconds and displays the results.

%{ %}
 * 1) define TIMER_USE_GPTIMER 1
 * 2) define	PROFILE_BENCHMARK_MAX_NUM 2

probe timer.s(30) { exit }

probe begin { profile_benchmark_init("omap_dsi_prepare_update", 0) profile_benchmark_init("omap_dsi_update", 1) }

probe kernel.function("omap_dsi_prepare_update").call { profile_start(0) }

probe kernel.function("omap_dsi_prepare_update").return { profile_end(0); }

probe kernel.function("omap_dsi_update").call { profile_start(1) }

probe kernel.function("omap_dsi_update").return { profile_end(1); }

probe end { profile_show_benchmarks }

Here is an example of the results output from the previous example.

OMAP Profiler:      Total profile time 30080973104 nsecs (985712 ticks @ 32kHz) Profiling clock frequency 19200000

Function                      Count                Min                  Max                  Total                Avg omap_dsi_prepare_update (ns)  801                  25064                941720               52629356             65676 omap_dsi_prepare_update (ticks) 801                 482                  18110                1012103              1263 omap_dsi_update (ns)          801                  31460                1172808              68042104             84916 omap_dsi_update (ticks)       801                  605                  22554                1308502              1633


 * Where
 * Results are expressed in timer ticks as well as time (ns) for each function
 * Total profiling time shows how long the benchmark ran
 * Profiling clock frequency is shown to be 19.2MHz
 * Count shows number of times the function was called
 * Min is the minimum time for function call
 * Max is the maximum time for function call
 * Total is the total time spent in the function
 * Avg is the average time spent in the function

Timestamping Functions
1. Download and build systemtap as described here.

2. Copy the file Omap_timestamp.stp into your systemtap tapset directory. cp omap_timestamp.stp /systemtap-1.x-target/share/systemtap/tapset/ 3. Download, uncompress and copy the file to your systemtap tapset directory. cp omap_profile.stp /systemtap-1.x-target/share/systemtap/tapset/ 4. Create a new systemtap script file. vim /systemtap-1.x-target/share/systemtap/myscript.stp 5. In your script define if you want to use the 32kHz clock or a GP timer.
 * For a 32kHz clock add the following.

%{ %}
 * 1) define TIMER_USE_32K 1
 * For a GP timer add the following.

%{ %} 6. Define the maximum number of functions you wish to timestamp. %{ %} 7. Define the maximum number of timestamps you wish to collect for each function. %{ %} 8. Optionally, for each benchmark you can call profile_timestamp_init to specify a name for the benchmark. This name will be used for displaying the results. Typically this would be called from the begin probe. probe begin { profile_timestamp_init("timestamp-name”, 0) } 9. For each function you wish to timestamp add a kprobe to call profile_timestamp on entry. Where “func_name” is the kernel function name and the value passed is an index that is used to identify each benchmark. The index MUST be less than PROFILE_TIMESTAMP_MAX_NUM. probe kernel.function("func_name”).call { profile_timestamp(0) } 10. To display the results simply call profile_show_benchmarks function. This would typically done in the end probe. probe end { profile_show_benchmarks }
 * 1) define TIMER_USE_GPTIMER 1
 * 1) define PROFILE_TIMESTAMP_MAX_NUM 10
 * 1) define PROFILE_TIMESTAMP_MAX_SAMPLES 10

Timestamping Example
%{ %}
 * Here is a example that will use a GP timer to timestamp the “musb_interrupt” function for 30 seconds and displays the results.
 * 1) define TIMER_USE_GPTIMER 1
 * 2) define PROFILE_TIMESTAMP_MAX_NUM 1
 * 3) define PROFILE_TIMESTAMP_MAX_SAMPLES 10

probe timer.s(30) { exit }

probe begin { profile_timestamp_init("musb_interrupt", 0) }

probe kernel.function("musb_interrupt").call { profile_timestamp(0) }

probe end { profile_show_timestamps }

Here is an example of the results output from the previous example. OMAP Profiler:      Total profile time 30090646993 nsecs (986029 ticks @ 32kHz) Profiling clock frequency 19200000 Hz

Function                      Index                Timestamp (ticks)    Delta (ticks)        Timestamp (ns)       Delta (ns) musb_interrupt                0                    164721061            0                    8565495172           0 musb_interrupt                1                    164731562            10501                8566041224           546052 musb_interrupt                2                    164794507            62945                8569314364           3273140 musb_interrupt                3                    165149452            354945               8587771504           18457140 musb_interrupt                4                    165159885            10433                8588314020           542516 musb_interrupt                5                    165170432            10547                8588862464           548444 musb_interrupt                6                    165197199            26767                8590254348           1391884 musb_interrupt                7                    165207346            10147                8590781992           527644 …


 * Where …
 * Results are expressed in timer ticks as well as time (ns) for each timestamp and each line represents a single call to that function.
 * Total profiling time show how long the benchmark ran
 * Profiling clock frequency is shown to be 19.2MHz
 * Index is a number representing each time the function was called
 * Timestamp (ticks) is the timestamp expressed in timer ticks
 * Delta (ticks) is the difference between the current timestamp and the last expressed in ticks.
 * Timestamp (ns) is the timestamp expressed in nsecs
 * Delta (ticks) is the difference between the current timestamp and the last expressed in nsecs.

Known Issues
According to the systemtap language reference “all SystemTap functions and probes run with interrupts disabled, thus you cannot call functions that might sleep within the embedded C.” This makes sense of probing functions, however, for begin and end probe points it would be nice to have the ability to call functions that might sleep. Due to this limitation, when using a GP timer, when you will see the following BUG report when the GP timer is initialised at the very start of the benchmarking. There should be no impact from this but it is annoying. A bug has been filed against systemtap here. [ 242.426544] omap4_dsi_update: systemtap: 1.7/0.143, base: bf013000, memory: 19data/29text/3ctx/10net/17alloc kb, probes: 14 [ 242.438507] BUG: sleeping function called from invalid context at kernel/mutex.c:85 [ 242.446899] in_atomic: 1, irqs_disabled: 0, pid: 735, name: stapio [ 242.454193] Backtrace: [ 242.456970] [ ] (dump_backtrace+0x0/0x110) from [ ] (dump_stack+0x18/0x1c) [ 242.466033]  r6:00000348 r5:ffffe337 r4:ffff9676 r3:60000013 [ 242.472503] [ ] (dump_stack+0x0/0x1c) from [ ] (__might_sleep+0x130/0x134) [ 242.481567] [ ] (__might_sleep+0x0/0x134) from [ ] (mutex_lock+0x24/0x44) [ 242.490539]  r5:bf01c720 r4:c0877640 [ 242.494537] [ ] (mutex_lock+0x0/0x44) from [ ] (omap_dm_timer_request+0x1c/0xac) [ 242.504180]  r4:c0877640 r3:ffffffff [ 242.508178] [ ] (omap_dm_timer_request+0x0/0xac) from [ ] (timer_init_gptimer+0x10/0x8c [omap4_dsi_update]) …