Installing and Using Ftrace

The following Wiki explains how to use Ftrace and its different tracers on Linux releases. It describes which tracers work with kernel-omap3 mainline or linux-omap-2.6 mainline, and how they work.

Note: Every functionality was tested on v2.6.33 and v2.6.32 kernel.

'''Note: Some options are enable on Android (Function Tracer, Irq Tracer...) but the patches of this article won't work.

'''Note: Added section 13. Ftrace on p-android-omap-3.0 kernel tree base with Android File system [ICS].

Kernel configuration & Re-build
The kernel should be reconfigured and rebuilt to enable the support of FTrace

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm distclean $ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm omap_zoom2_defconfig $ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm xconfig
 * Clone the choosen kernel and point to the appropriate commit id.
 * Run the following command from the kernel folder

Kernel Hacking -> Tracers -> FUNCTION_TRACER Kernel Hacking -> Tracers -> FUNCTION_GRAPH_TRACER (if possible) Kernel Hacking -> Tracers -> STACK_TRACER Kernel Hacking -> Tracers -> DYNAMIC_FTRACE (see enabling Dynamic Ftrace)
 * Set the following in xconfig

This kernel configuration focus on using Ftrace as a debugging tool. Some configurations for Ftrace are used for other purposes, like finding latency or analyzing the system, they will be explain later.

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm uImage (For further information about the compilation of the kernel see Linux OMAP Kernel Project)
 * Compile the kernel


 * Boot the system with this new kernel

Using Ftrace
Ftrace has its control ﬁles in the debugfs system.This is usually mounted in /sys/kernel/debug. If it is not already mounted, then you can mount it yourself with: Many Developpers prefer to have this directory at the root of the system That creates a /debug/tracing subdirectory which is used to control ftrace and for getting output from the tool.
 * 1) mount -t debugfs nodev /sys/kernel/debug
 * 1) mkdir /mnt/debug
 * 2) mount -t debugfs nodev /mnt/debugfs

Note: all the command of this article will be pass in the /debug/tracing/ directory

Choosing a tracer
To find out which tracers are available, simply cat the available_tracers file in the tracing directory: function sched_switch nop To enable, for example, the function tracer, just echo "function" into the current_tracer file. function Note: Ftrace can handle only one tracer at the same time
 * 1) cat available_tracers
 * 1) echo function > current_tracer
 * 2) cat current_tracer

Using a tracer
At this point Ftrace is ready to trace the function, to start it just echo 1 in the tracing_on file To stop the trace, just echo 0 in the same file: The trace is contained in the trace file, here is an example of the output from a function trace. The header helps to decode the various fields in the trace. #    avahi-daemon-630   [000]  9507.400848: vfs_read <-sys_read avahi-daemon-630  [000]  9507.400848: rw_verify_area <-vfs_read avahi-daemon-630  [000]  9507.400848: do_sync_read <-vfs_read avahi-daemon-630  [000]  9507.400848: sock_aio_read <-do_sync_read avahi-daemon-630  [000]  9507.400848: alloc_sock_iocb <-sock_aio_read avahi-daemon-630  [000]  9507.400848: unix_stream_recvmsg <-sock_aio_read
 * 1) echo 1 > tracing_on
 * 1) echo 0 > tracing_on
 * 1) cat trace | head -10
 * 2) tracer: function
 * 1)           TASK-PID    CPU#    TIMESTAMP  FUNCTION

Tracing a specific process
Perhaps you only need to trace a specific process, or set of processes. The file set_ftrace_pid lets you specify specific processes that you want to trace. To just trace the current thread you can do the following:

[tracing]# echo $$ > set_ftrace_pid

The above will set the function tracer to only trace the bash shell that executed the echo command. If you want to trace a specific process, you can create a shell script wrapper program.

[tracing]# cat ~/bin/ftrace-me #!/bin/sh DEBUGFS=`grep debugfs /proc/mounts | awk '{ print $2; }'` echo $$ > $DEBUGFS/tracing/set_ftrace_pid echo function > $DEBUGFS/tracing/current_tracer exec $* [tracing]# ~/bin/ftrace-me ls -ltr

Note, you must clear the set_ftrace_pid file if you want to go back to generic function tracing after performing the above.

[tracing]# echo -1 > set_ftrace_pid

-

The Tracers in detail
All the following tracers are plugins of Ftrace which bring different presentation of trace, and trace different things (Latency, scheduler events)

nop tracer
This is the tracer by default which trace no function. But it can still be usefull to trace events (see )

Function tracer
This tracer is activated by enabling CONFIG_FUNCTION_TRACER in the kernel configuration. To work, it needs the kernel variable ftrace_enabled to be turned on, otherwise this tracer is a nop.

# sysctl kernel.ftrace_enabled=1 # echo function > current_tracer # cat current_tracer function

[tracing]# cat trace | head -10 # tracer: function #    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION #             | |       |          |         |                 bash-16939 [000]  6075.461561: mutex_unlock <-tracing_set_tracer -0    [001]  6075.461561: _spin_unlock_irqrestore <-hrtimer_get_next_event -0    [001]  6075.461562: rcu_needs_cpu <-tick_nohz_stop_sched_tick bash-16939 [000] 6075.461563: inotify_inode_queue_event <-vfs_write -0    [001]  6075.461563: mwait_idle <-cpu_idle bash-16939 [000] 6075.461563: __fsnotify_parent <-vfs_write

The header explains the format of the output pretty well. The first two items are the traced task name and PID. The CPU that the trace was executed on is within the brackets. The timestamp is the time since boot in. format, followed by the function name with its parent following the "<-" symbol.

This information is quite powerful and shows the flow of functions nicely. But it can be a bit hard to follow.

How the Function Tracer work ?
The function tracer (enabled by CONFIG_FUNCTION_TRACER) is a way to trace almost all functions in the kernel. When function tracing is enabled, the kernel is compiled with the gcc option -pg. This is a proﬁler that will make all functions call a special function named mcount. Unfortunately, this could cause a very large overhead. To improve it, enable the dynamic Ftrace, then the mcount calls, when not in use, are converted at run time to nops. This allows the function tracer to have zero overhead when not in use. (Finding Origins of Latencies Using Ftrace by Steven Rostedt)

Sched_switch Tracer
This tracer, also activated by enabling CONFIG_FUNCTION_TRACER, traces the context switches and wakeups between tasks. #            bash-2948  [000]   749.962524:   2948:120:S   + [000]  2948:120:S bash bash-2948 [000]   749.962616:   2948:120:S ==> [000]  8369:120:R <...> <...>-8369 [000]   749.964264:   8369:120:R   + [000]   625:120:S dbus-daemon <...>-8369 [000]   749.964294:   8369:120:R ==> [000]   625:120:R dbus-daemon dbus-daemon-625  [000]   749.965454:    625:120:S ==> [000]  8369:120:R <...> <...>-8369 [000]   749.965576:   8369:120:R   + [000]   625:120:S dbus-daemon
 * 1) cat trace
 * 2) tracer: sched_switch
 * 1)           TASK-PID    CPU#    TIMESTAMP  FUNCTION

Wake ups are represented by a "+" and the context switches are shown as "==>". The format is:


 * Context switches:

Previous task             Next Task : : ==>  : :


 * Wake ups:

Current task              Task waking up  : :    +  : :

The prio is the internal kernel priority, which is the inverse of the priority that is usually displayed by user-space tools. Zero represents the highest priority (99). Prio 100 starts the "nice" priorities with 100 being equal to nice -20 and 139 being nice 19. The prio "140" is reserved for the idle task which is the lowest priority thread (pid 0).Below is a quick chart to map the kernel priority to user land priorities.

Kernel Space                    User Space ===============================================================  0(high) to  98(low)     user RT priority 99(high) to 1(low) with SCHED_RR or SCHED_FIFO --- 99                       sched_priority is not used in scheduling decisions(it must be specified as 0) --- 100(high) to 139(low)    user nice -20(high) to 19(low) --- 140                      idle task priority ---

The task states are:


 * R - running : wants to run, may not actually be running
 * S - sleep  : process is waiting to be woken up (handles signals)
 * D - disk sleep (uninterruptible sleep) : process must be woken up (ignores signals)


 * T - stopped : process suspended
 * t - traced : process is being traced (with something like gdb)
 * Z - zombie : process waiting to be cleaned up
 * X - unknown

Sched_switch viewer
To visualize the temporal relationship of Linux tasks, a graphical tools makes it possible. This tool convert the trace data to VCD (value change dump) data. After this, it is possible to view the context switches in a vcd viewer, such as GTKWave. This tools is available [[Media:sched_switch-0.1.1.tar.gz]].

To use this program first make a ftrace file, and export it into a text file cat /debug/tracing/trace > /tmp/trace.txt

Now convert the trace data and start the viewer:

sched_switch /tmp/trace.txt /tmp/trace.vcd gtkwave /tmp/trace.vcd

To display a particular data channel within gtkwave, select sched_switch in the top left window and drag and drop the label of the requested channel from the bottom left window labeled "Signals" to the drop area at the left side of the main wave display area (also labeled "Signals"). Alternatively, you may select the requested channels in the bottom left window labeled "Signals" and then click on the "Insert" button below the window.

The VCD uses the following definitions :
 * Z    no cpu assigned
 * U    undefined (wake up is done for a cpu to become ready)
 * 0/1  binary encoded cpu number
 * L/H  binary encoded cpu number with priority inheritance

Function Graph Tracer
This tracer depends of the CONFIG_FUNCTION_GRAPH kernel option.

This tracer is similar to the function tracer except that it probes a function on its entry and its exit, and also the duration of each function.

Enabling Function Graph Tracer for OMAP3
Actually, Function Graph tracer for the ARM architecture is not supported in the mainline. Tim Bird works on patches to enable this option (source), but some improvements has to be done for OMAP3. The following patches will enable this tracer on OMAP3 [[Media:Function_graph_tracer_support.tar.gz]].

The Function graph presentation
# # |    |   |                     |   |   |   |  0)   0.000 us    |                      } /* clear_buddies */  0)   0.000 us    |                      update_min_vruntime; 0) + 30.517 us  |                    } /* dequeue_entity */  0)               |                    dequeue_entity { 0)              |                      update_curr {  0)   0.000 us    |                        update_min_vruntime; 0)  0.000 us    |                      }  0)   0.000 us    |                      clear_buddies; 0)  0.000 us    |                      update_min_vruntime;  0) + 30.518 us   |                    }
 * 1) tracer: function_graph
 * 1) CPU  DURATION                  FUNCTION CALLS

The function graph tracer traces both the entry and exit of a function, which gives the tracer the ability to know the depth of functions that are called. The function graph tracer can make following the flow of execution within the kernel much easier to follow with the human eye.

This gives the start and end of a function denoted with the C like annotation of "{" to start a function and "}" at the end. Leaf functions, which do not call other functions, simply end with a ";". The DURATION column shows the time spent in the corresponding function. The function graph tracer records the time the function was entered and exited and reports the difference as the duration.

The "+" that are there are annotation marker. When the duration is greater than 10 microseconds, a "+" is shown. If the duration is greater than 100 microseconds a "!" will be displayed.

Function Profiling
With the function profiler, it is possible to take a good look at the actual function execution and not just samples. If CONFIG_FUNCTION_GRAPH_TRACER is configured in the kernel, the function profiler will use the function graph infrastructure to record how long a function has been executing. If just CONFIG_FUNCTION_TRACER is configured, the function profiler will just count the functions being called.

Function                              Hit    Time            Avg ---               ---  schedule                               923    325906219 us     353094.4 us   vfs_read                               238    109621612 us     460595.0 us   sys_read                               208    108491821 us     521595.2 us   do_sync_read                           123    49456207 us     402082.9 us   pipe_read                               63    48292785 us     766552.1 us   pipe_wait                               63    48276550 us     766294.4 us   sys_futex                                5    48068145 us     9613629 us   do_futex                                 5    48068115 us     9613623 us
 * 1) echo nop > current_tracer
 * 2) echo 1 > function_profile_enabled
 * 3) cat trace_stat/function0 |head

The above also includes the times that a function has been preempted or schedule was called and the task was swapped out. This may seem useless, but it does give an idea of what functions get preempted often. Ftrace also includes options that allow you to have the function graph tracer ignore the time the task was scheduled out.

Function                              Hit    Time            Avg ---               ---  cpuidle_idle_call                      337    20587921 us     61091.75 us   omap3_enter_idle                       337    20555786 us     60996.39 us   omap3_enter_idle_bm                    329    20523254 us     62380.71 us   omap_sram_idle                         324    19609313 us     60522.57 us   asm_do_IRQ                            2487    1954010 us     785.689 us    do_page_fault                         5639    1825439 us     323.716 us    handle_mm_fault                       5656    1386108 us     245.068 us    do_DataAbort                          3548    1214050 us     342.178 us
 * 1) echo 0 > options/sleep-time
 * 2) echo 0 > function_profile_enabled
 * 3) echo 1 > function_profile_enabled
 * 4) cat trace_stat/function0  | head

Another option that affects profiling is graph-time (again with a "-"). By default it is enabled. When enabled, the times for a function include the times of all the functions that were called within the function. As you can see from the output in the above examples, several system calls are listed with the highest average. When disabled, the times only include the execution of the function itself, and do not contain the times of functions that are called from the function:

Function                              Hit    Time            Avg ---               --- omap_sram_idle                          41    1189849 us     29020.72 us  sub_preempt_count                    15131    56701.65 us     3.747 us     add_preempt_count                    14506    52795.40 us     3.639 us     _pwrdm_state_switch                   1513    31951.91 us     21.118 us    omap_i2c_set_wfc_mpu_wkup_lat          150    31738.27 us     211.588 us   prm_read_mod_bits_shift               2579    25024.42 us     9.703 us     __do_fault                             472    23559.57 us     49.914 us    handle_level_irq                       352    18859.87 us     53.579 us
 * 1) echo 0 > options/graph-time
 * 2) echo 0 > function_profile_enabled
 * 3) echo 1 > function_profile_enabled
 * 4) cat trace_stat/function0  | head


 * NOTE: Disabling the function profiler and then re-enabling it causes the numbers to reset.

FTD (Function Trace Dump)
To analyse system data, a post-processing tool called ftd was written by Tim Bird (source).

To install: * Download * rename to 'ftd': mv Ftd.txt ftd * make it executable: chmod a+x * Put it on your path somewhere: sudo mv ftd /usr/local/bin

FTD is a script with capability to show call counts and cumulative time for functions in a trace log.

Note: Ftd is written in Python, so it is recommended to move your trace log to a development host and run ftd there, rather than on the target.

To retrieve the trace log data, use:
 * 1) cat trace > /tmp/trace-data.txt

To see a list of functions, sorted by total time spent in them, use: $ ftd /tmp/trace-data.txt Function                           Count Time       Average  Local --- - -- -- do_page_fault                         159   76416012   480603    4364019 handle_mm_fault                      159   56945800   358149    4516606 do_DataAbort                          94   50079342   532758    1373293 asm_do_IRQ                            36   43182375  1199510    1098636 __do_fault                           140   40649404   290352    7293704 flush_old_exec                         1   33416739 33416739     152581 mmput                                  1   32623290 32623290      30522 exit_mmap                              1   32318110 32318110     427241

Other useful tasks that ftd can be used for include:


 * Sorting the function list by function count—the number of times that the function was called during the trace.
 * Examining the local time of a function. The local time of a function is the elapsed time between the start and end of the function, minus the time spent in all functions called between the start and end of the function. Note that this includes not just children function called by this function, but also interrupts. Local time also includes the time spent in user space, and in other processes’ kernel functions (ie, when the function’s process is scheduled out.) So local time should be interpreted cautiously, with this understanding.
 * Finding the subroutines called by functions the most times.

See ftd -h for usage help, for the command line options to use for these tasks.

Boot Tracer
This tracer is activated with the CONFIG_BOOT_TRACER kernel configuration option.

This tracer helps developers to optimize boot time: it records the timings of initcalls and traces key events and the identity of tasks that can cause boot delays.

The easiest way is to pass initcall_debug and printk.time=1 to the kernel command line, and boot the system. Then copy/paste the console output to a file (say boot.log or type dmesg > boot.log.

Then, generate a boot graph by using a utility provided in the Linux kernel sources:

cat boot.log | perl $(KERNEL_SRC_DIR)/scripts/bootgraph.pl > boot.svg



This graph can be viewed with Inkscape or Firefox. It gives a visual representation of the delays in initcalls.

Another way to get the same results is to pass in initcall_debug and ftrace=initcall to the kernel command line. You can then access timing information in the /sys/kernel/debug/tracing/trace file:

initcall # [   0.000000000] calling  init_mmap_min_addr+0x0/0x10 @ 1 [   0.000000000] initcall init_mmap_min_addr+0x0/0x10 returned 0 after 0 msecs [   0.000000000] calling  net_ns_init+0x0/0x1ec @ 1 [   0.000000000] initcall net_ns_init+0x0/0x1ec returned 0 after 0 msecs [   0.000000000] calling  ptrace_break_init+0x0/0x24 @ 1 [   0.000000000] initcall ptrace_break_init+0x0/0x24 returned 0 after 0 msecs [   0.000000000] calling  consistent_init+0x0/0x108 @ 1 [   0.000000000] initcall consistent_init+0x0/0x108 returned 0 after 0 msecs [   0.000000000] calling  v6_userpage_init+0x0/0x8 @ 1 [   0.000000000] initcall v6_userpage_init+0x0/0x8 returned 0 after 0 msecs [   0.000000000] calling  sysctl_init+0x0/0x1c @ 1 [   0.000000000] initcall sysctl_init+0x0/0x1c returned 0 after 0 msecs [   0.000000000] calling  ksysfs_init+0x0/0xc4 @ 1 swapper-1    [000]     0.000000:      1:120:R   + [000]     6:120:S khelper swapper-1    [000]     0.000000:      1:120:D ==> [000]     6:120:R khelper khelper-6    [000]     0.000000:      6:120:R   + [000]     9:120:R <...>
 * 1) cat current_tracer
 * 1) cat trace |head -20
 * 2) tracer: initcall
 * 1)           TASK-PID    CPU#    TIMESTAMP  FUNCTION

You will then run:

cat /sys/kernel/debug/tracing/trace | perl $(KERNEL_SRC_DIR)/scripts/bootgraph.pl > output.svg


 * Note : The clock used for printk.time=1 does not necessarily start running from zero. On such platforms the bootgraph does not work properly as the calculated boottime will be too long. To correct this, two patches modify the script to accept only non-zero start times ( & ).


 * The boot trace can also be analyzed through a VCD file create by the sched_switch tool, for more information see Sched_switch viewer.

Note:  It is possible to trace events during the boot phase with this parameter in the bootargs (more information ) trace_event=[event-list]

Latency Tracers
When the latency-format option is enabled (Tracer option), the trace file gives somewhat more information to see why a latency happened.

Here is typical trace :

# # # # gdm-bina-25671   0d.... 0us : __up_read gdm-bina-25671  0d.... 0us : __up_read gdm-bina-25671  0d.... 0us : trace_hardirqs_on <-__up_read gdm-bina-25671  0d.... 0us!: ret_slow_syscall gdm-bina-25671  0d.... 4517us : vector_swi gdm-bina-25671  0d.... 4517us : trace_hardirqs_on <-vector_swi gdm-bina-25671  0d.... 4517us :
 * 1) tracer: irqsoff
 * 1) irqsoff latency trace v1.1.5 on 2.6.33-dirty
 * 1) latency: 4516 us, #7/7, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0)
 * 2)    | task: -25671 (uid:0 nice:0 policy:0 rt_prio:0)
 * 3)  => started at: ret_slow_syscall
 * 4)  => ended at:   vector_swi
 * 1)  => started at: ret_slow_syscall
 * 2)  => ended at:   vector_swi
 * 1)                  _--=> CPU#
 * 2)                 / _-=> irqs-off
 * 3)                | / _=> need-resched
 * 4)                || / _---=> hardirq/softirq
 * 5)                ||| / _--=> preempt-depth
 * 6)                |||| /_--=> lock-depth
 * 7)                |||||/     delay
 * 8)  cmd     pid   |||||| time  |   caller

This shows that the current tracer is "irqsoff" tracing the time for which interrupts were disabled. It gives the trace version and the version of the kernel upon which this was executed on (2.6.33). Then it displays the max latency in microsecs (4516 us). The number of trace entries displayed and the total number recorded (both are seven: #7/7).VP, KP, SP, and HP are always zero and are reserved for later use.

The task is the process that was running when the latency occurred. (pid: 25671).

The start and stop (the functions in which the interrupts were disabled and enabled respectively) that caused the latencies:


 * ret_slow_syscall is where the interrupts were disabled.
 * vector_swi is where they were enabled again.

The next lines after the header are the trace itself. The header explains which is which.

Note: If the architecture does not support a way to read the irq flags variable, an 'X' will always be printed here.
 * cmd: The name of the process in the trace.
 * pid: The PID of that process.
 * CPU#: The CPU which the process was running on.
 * irqs-off: 'd' interrupts are disabled. '.' otherwise.
 * need-resched: 'N' task need_resched is set, '.' otherwise.
 * hardirq/softirq:
 * 'H' - hard irq occurred inside a softirq.
 * 'h' - hard irq is running
 * 's' - soft irq is running
 * '.' - normal context.


 * preempt-depth: The level of preempt_disabled

The above is mostly meaningful for kernel developers.


 * time: When the latency-format option is enabled, the trace file output includes a timestamp relative to the start of the trace. This differs from the output when latency-format is disabled, which includes an absolute timestamp.


 * delay: This is just to help catch your eye a bit better. And needs to be fixed to be only relative to the same CPU. The marks are determined by the difference between this current trace and the next trace.
 * '!' - greater than preempt_mark_thresh (default 100)
 * '+' - greater than 1 microsecond
 * ' ' - less than or equal to 1 microsecond.

The rest is the same as the 'trace' file.

For more precise results, you can disabled the function tracing, which can bring a large overhead, by turning off the kernel variable ftrace_enabled.


 * 1) sysctl kernel.ftrace_enabled=1

WARNING Before launching each latency tracer, don't forget to reset the tracing_max latency echo 0 > tracing_max_latency

NOTE: The following tracers are made for measuring latency, so the latency-format option will be enabled automatically.

Irqsoff Latency Tracer
This tracer is activate with CONFIG_IRQSOFF_TRACER

The plugin irqsoff is a way to measure times in the kernel that interrupts are disabled. When a new maximum latency is hit, the tracer saves the trace leading up to that latency point so that every time a new maximum is reached, the old saved trace is discarded and the new trace is saved. An example of trace is visible above

Wakeup Tracer
This tracer is activate by enabling the CONFIG_SCHED_TRACER

This tracer tracks the latency of the highest priority task to be scheduled in, starting from the point it has woken up.

# # # dbus-dae-623     0d.h.. 31us :   623:120:R   + [000]  2237: 94:S sleep dbus-dae-623    0d.h.. 31us+: 0 dbus-dae-623    0..... 61us : schedule dbus-dae-623    0..... 61us :   623:120:R ==> [000]  2237: 94:R sleep
 * 1) echo wakeup > current_tracer
 * 2) chrt -f 5 sleep 2
 * 3) cat trace
 * 4) tracer: wakeup
 * 1) wakeup latency trace v1.1.5 on 2.6.33-dirty
 * 1) latency: 61 us, #4/4, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0)
 * 2)    | task: -2237 (uid:0 nice:0 policy:1 rt_prio:5)
 * 1)    | task: -2237 (uid:0 nice:0 policy:1 rt_prio:5)
 * 1)                  _--=> CPU#
 * 2)                 / _-=> irqs-off
 * 3)                | / _=> need-resched
 * 4)                || / _---=> hardirq/softirq
 * 5)                ||| / _--=> preempt-depth
 * 6)                |||| /_--=> lock-depth
 * 7)                |||||/     delay
 * 8)  cmd     pid   |||||| time  |   caller

Running this on an idle system, we see that it took 61 microseconds to perform the task switch.

Tracer options
Some options can be added to the tracer to change its output. The list of all options is contain in the ftrace_options file print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace trace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj noprintk-msg-only context-info nolatency-format sleep-time graph-time nofunc_stack_trace
 * 1) cat trace_options

To enable an option, just echo its name in the ftrace_options #            Xorg-15046 [000] 10769.470612: kmem_cache_free <-user_path_at Xorg-15046 [000] 10769.470612: sys_stat64 <-ret_fast_syscall Xorg-15046 [000] 10769.470612: vfs_stat <-sys_stat64 Xorg-15046 [000] 10769.470612: vfs_fstatat <-sys_stat64 Xorg-15046 [000] 10769.470612: user_path_at <-vfs_fstatat Xorg-15046 [000] 10769.470612: getname <-user_path_at In this example the sym-addr option add the adress of the symbol call. To remove an option, just echo its name preceded by no in the same ftrace_options file
 * 1) echo sym-addr > trace_options
 * 2) cat trace | head -10
 * 3) tracer: function
 * 1)           TASK-PID    CPU#    TIMESTAMP  FUNCTION
 * 1) echo nosym-addr > trace_options

<font color="#FF0000">WARNING  the func_stack_trace option is very dangerous and must be use with precaution. Its usage is detailled in the part. <font color="#FF0000">WARNING 

-

Dynamic Ftrace
It is highly recommended to enable CONFIG_DYNAMIC_FTRACE because of this performance enhancement (see How the Function Tracer work ? and Benchmark). In addition, CONFIG_DYNAMIC_FTRACE gives the ability to filter which function should be traced.

Note:  Dynamic FTrace does not work with Function_graph tracer patches.(see Function Graph Tracer)

Enabling Dynamic Ftrace for ARM
Actually Dynamic Ftrace for ARM is not officialy supported but a patch permit to activate this tracer.


 * Download this patch : (kernel patches)
 * move/copy the patch in the kernel folder
 * run command

$ patch -p1 < dynamic_ftrace.patch patching file Makefile patching file arch/arm/Kconfig patching file arch/arm/Kconfig.debug patching file arch/arm/include/asm/ftrace.h patching file arch/arm/kernel/armksyms.c patching file arch/arm/kernel/entry-common.S patching file arch/arm/kernel/ftrace.c patching file kernel/trace/Kconfig patching file scripts/Makefile.build patching file scripts/recordmcount.pl

Using Dynamic Ftrace
Running the function tracer can be overwhelming. The amount of data may be vast, and very hard to get a hold of by the human brain. Ftrace provides a way to limit what functions you see. Two files exist that let you limit what functions are traced:


 * set_ftrace_filter
 * set_ftrace_notrace

When any function is listed in the set_ftrace_filter, only those functions will be traced. This will help the performance of the system when the trace is active. Tracing every function incurs a large overhead, but when using the set_ftrace_filter, only those functions listed in that file will have the NOPs changed to call the tracer. Depending on which functions are being traced, just having a couple of hundred functions enabled is hardly noticeable.

NOTE: The functions listed in set_ftrace_notrace take precedence. That is, if a function is listed in both set_ftrace_notrace and set_ftrace_filter, that function will not be traced.

Selecting filter function
A list of functions that can be added to the filter files is shown in the available_filter_functions file. This file contain almost all functions in the kernel.

do_one_initcall run_init_process init_post name_to_dev_t elf_check_arch arm_elf_read_implies_exec elf_set_personality set_irq_flags show_interrupts machine_halt ....
 * 1) cat available_filter_functions |head

To add a filter, just echo his name in set_function_filter echo do_one_initcall > set_ftrace_filter

But adding one by one all the wanted event can be boring Fortunately, these files also take wildcards; the following glob expressions are valid:


 * value* - Select all functions that begin with value.
 * *value* - Select all functions that contain the text value.
 * *value - Select all functions that end with value.

To remove a filter or more, use the !symbol echo '!do_one_initcall' > set_function_filter

All these commands work also with set_ftrace_notrace

Stop the trace on a specific function
Enabling and disabling recording to the trace file can be done using tracing_on file from debugfs. It can also be controlled accurately through tracing_on and tracing_off functions from the kernel, which require unfortunately to recompile the kernel. In the special case where you just want to switch tracing on/off in a specific kernel module you don't need to rebuild the whole kernel. Just add tracing_on and tracing_off to your module and recompile it. A last method allows stopping the trace at a particular function. This method uses the set_ftrace_filter file.

The format of the command to have the function trace enable or disable the ring buffer is as follows:

function:command[:count]

This will execute the command at the start of the function. The command is either traceon or traceoff, and an optional count can be added to have the command only execute a given number of times. If the count is left off (including the leading colon) then the command will be executed every time the function is called.

__do_softirq:traceoff:unlimited
 * 1) echo '__do_softirq:traceoff' > set_ftrace_filter
 * 2) cat set_ftrace_filter
 * 3) all functions enabled ####

Notice that functions with commands do not affect the general filters. Even though a command has been added to __do_softirq, the filter still allowed all functions to be traced. Commands and filter functions are separate and do not affect each other.

What calls a specific function?
Sometimes it is useful to know what is calling a particular function. The immediate predecessor is helpful, but an entire backtrace is even better. The function tracer contains an option that will create a backtrace in the ring buffer for every function that is called by the tracer.

To use the function tracer backtrace feature, it is imperative that the functions being called are limited by the function filters. The option to enable the function backtracing is unique to the function tracer and activating it can only be done when the function tracer is enabled. This means you must first enable the function tracer before you have access to the option:

=> ftrace_call events/0-5    [000] 21472.572144: wakeup_softirqd <-__tasklet_schedule events/0-5    [000] 21472.572174: => ftrace_call events/0-5    [000] 21472.987244: wakeup_softirqd <-__tasklet_schedule events/0-5    [000] 21472.987244: => ftrace_call events/0-5    [000] 21473.025970: wakeup_softirqd <-__tasklet_schedule events/0-5    [000] 21473.026001: => ftrace_call
 * 1) echo wakeup_softirqd > set_ftrace_filter
 * 2) echo function > current_tracer
 * 3) echo 1 > options/func_stack_trace
 * 4) cat trace | tail
 * 1) echo 0 > options/func_stack_trace
 * 2) echo > set_ftrace_filter

<font color="#FF0000">Warning  cat the set_ftrace_filter before enabling the func_stack_trace option to ensure that the filter was enabled. Disable the options/func_stack_trace before disabling the filter.

<font color="#FF0000">Warning  Also note that the option is non-volatile, that is, even if you enable another tracer plugin in current_tracer, the option will still be enabled if you re-enable the function tracer. ---

Trace Event
Using the different previous plugin tracers is not always very practical. Some can bring a large overhead and others do not give enough information to see what happened. Luckily, there is the event tracer. The event tracing is not a plugin. When events are enabled, they will be recorded in any plugin, including the special nop plugin.

Via the 'set_event' interface
The events which are available for tracing can be found in the file "available_events". skb:kfree_skb skb:skb_copy_datagram_iovec block:block_remap block:block_split block:block_plug block:block_bio_complete block:block_bio_bounce block:block_rq_remap block:block_unplug_io block:block_unplug_timer
 * 1) cat available_events

To enable a particular event, such as 'sched_wakeup', simply echo it to set_event. For example:


 * 1) echo sched_wakeup >> /sys/kernel/debug/tracing/set_event

Note: '>>' is necessary, otherwise it will firstly disable all the events.

To disable an event, echo the event name to the set_event file prefixed with an exclamation point:


 * 1) echo '!sched_wakeup' >> set_event

To disable all events, echo an empty line to the set_event file:


 * 1) echo > /sys/kernel/debug/tracing/set_event

To enable all events, echo '*:*' or '*:' to the set_event file:


 * 1) echo *:* > /sys/kernel/debug/tracing/set_event

The events are organized into subsystems, such as ext4, irq, sched, etc., and a full event name looks like this: :. The subsystem name is optional, but it is displayed in the available_events file. All of the events in a subsystem can be specified via the syntax " :*"; for example, to enable all irq events, you can use the command:


 * 1) echo 'irq:*' > /sys/kernel/debug/tracing/set_event

Via the 'enable' toggle
The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy of directories.

To enable event 'sched_wakeup':


 * 1) echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To disable it:


 * 1) echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To enable all events in sched subsystem:


 * 1) echo 1 > /sys/kernel/debug/tracing/events/sched/enable

To enable all events:


 * 1) echo 1 > /sys/kernel/debug/tracing/events/enable

When reading one of these enable files, there are four results:


 * 0 - all events this file affects are disabled
 * 1 - all events this file affects are enabled
 * X - there is a mixture of events enabled and disabled
 * ? - this file does not affect any event

Boot option
In order to facilitate early boot debugging, use boot option:

trace_event=[event-list]

The format of this boot option is the same as the 'set_event' interface

Example
# #            bash-3871  [000] 20480.632751: sched_wakeup: comm=bash pid=3871 prio=120 success=0 target_cpu=000 bash-3871 [000] 20480.632813: sched_stat_runtime: comm=bash pid=3871 runtime=3295899 [ns] vruntime=14957505137978 [ns] bash-3871 [000] 20480.632843: sched_switch: prev_comm=bash prev_pid=3871 prev_prio=120 prev_state=S ==>  next_comm=gdm-binary next_pid=25515 next_prio=120 gdm-binary-25515 [000] 20480.636444: sched_stat_runtime: comm=gdm-binary pid=25515 runtime=3631591 [ns] vruntime=14957507973670 [ns] gdm-binary-25515 [000] 20480.636505: sched_stat_runtime: comm=gdm-binary pid=25515 runtime=61035 [ns] vruntime=14957508034705 [ns] gdm-binary-25515 [000] 20480.636536: sched_wakeup: comm=events/0 pid=5 prio=120 success=1 target_cpu=000
 * 1) echo nop > current_tracer
 * 2) cat trace
 * 3) tracer: nop
 * 1)           TASK-PID    CPU#    TIMESTAMP  FUNCTION
 * 2) echo sched:* > set_event
 * 3) cat trace | head
 * 4) tracer: nop
 * 1) tracer: nop
 * 1)           TASK-PID    CPU#    TIMESTAMP  FUNCTION

More Informations
For further detail about event tracing, read the file Documents/Trace/events.txt in the kernel directory

Using FTrace with Trace-cmd
Using echo and cat to set up Ftrace can be boring. Trace-cmd allow to setup and use Ftrace with very few command. It also provide a tiny GUI to analyse trace data.

Install Trace-cmd
To get the code of Trace-cmd, just clone the following GIT

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

You need to have "asciidoc" and "swig" packages installed before building Trace-cmd.

Note: the following instructions has to be done in this order, if you want to install trace-cmd on your Zoom2 and the GUI on your computer

Trace-cmd on the OMAP Zoom2
$ make clean $ make LDFLAGS=-static CC=$HOME/bin/arm-2009q3/bin/arm-none-linux-gnueabi-gcc trace-cmd $ cp trace-cmd /tmp/mmc2
 * you have to download and apply this patch [[Media:no_cpu_correction.tar.gz]], which correct a bug when Trace-cmd try to find the number of cpus on the Zoom2 board.
 * Once this patch apply, you can compile Trace-cmd for ARM architecture and copy it into your filesystem.

Install the Trace-cmd GUI
$ make clean $ make install_gui

The last command install the GUI in the $HOME/bin directory.

Tracing with Trace-cmd
Trace-cmd is very easy to use with simple command. For example, the following one will enable all events and trace them during a ls command.


 * 1) trace-cmd record -e all ls

If we had to use Ftrace debugfs interface, it would be a little more difficult # ls And in this case, trace-cmd is more precise because it trace only during the ls command execution.
 * 1) echo 1 > events/enable
 * 2) echo 1 > tracing_on
 * 1) echo 0 > tracing_on

At the end of the trace, Trace-cmd create a pre-formated file (trace.dat by default, can be change with the -o option) which contain all information of the trace

If you want to know all the parameter possible with trace-cmd, type trace-cmd without any parameters # trace-cmd trace-cmd version 0.7 usage: trace-cmd record [-v][-e event [-f filter]][-p plugin][-F][-d][-o file] \ [-s usecs][-O option ][-l func][-g func][-n func] [-P pid][-N host:port][-t][-b size][command ...] -e run command with event enabled -f filter for previous -e event -p run command with plugin enabled -F filter only on the given process -P trace the given pid like -F for the command -l filter function name -g set graph function -n do not trace function -v will negate all -e after it (disable those events) -d disable function tracer when running -o data output file [default trace.dat] -O option to enable (or disable) -s sleep interval between recording (in usecs) [default: 1000] -N host:port to connect to (see listen) -t used with -N, forces use of tcp in live trace -b change kernel buffersize (in kilobytes per CPU) ...........

Analyzing trace data with GUI
Note : Works only with trace generated by Trace-cmd tools

Trace-cmd provides a tiny GUI call kernelshark. This GUI helps to understand what happens during a trace. To launch kernelshark, just type is name in a terminal. If a trace.dat file is present in the current directory, kernelshark will automatically load it. Otherwise, you can load the file of your choice with "Load info" in the File menu. This GUI permits to apply different filter to the data, watch the process of each task and the process per cpu.

How to use kernelshark

 * Type kernelshark in terminal to launch it
 * Load trace data with the File/Load Info menu
 * To zoom, just click and drag from left to right on the graph
 * To unZoom, just click and drag from right to left
 * To plot a precise task, select it in the Plots/Tasks menu
 * You can apply some filter on the list of event or on the graph with the Filter menu (the filters are only apply on the CPUs graph)


 * NOTE: Presentation of trace-cmd at the Linux Foundation summit 2010 here

--

Precision of Ftrace
Actually on OMAP, Ftrace use a 32K Timer with a resolution of only 31 microseconds.This did not allow trace event times to be distinguished accurately. Luckily, there was another clock source available that had higher resolution, the MPU_TIMER. This timer provides more intra-tick resolution than the 32KHz timer, but consumes more power. Tim Bird work on patch to use this timer and will submit it soon.

Benchmark
As every measuring tool, Ftrace brings an overhead for every function call by the kernel. To get a good idea of the overhead, three benchmark was tested
 * Hackbench, this test is a benchmark for measuring the performance, overhead, and scalability of the Linux scheduler. Created by Rusty Russell, it uses client and server processes grouped to send and receive data in order to simulate the connections established for a chat room. Each client sends a message to each server in the group.
 * Interbench, It is designed to measure the effect of changes in Linux kernel design or system configuration changes such as cpu, I/O scheduler and filesystem changes and options.
 * Duration for uncompress a large tar.gz file.

Hackbench
According to this graphic, we can see that the Function tracer create a large overhead, but this graph also show up the importance of Dynamic FTrace. Thus, we can see that this option bring an improvement of almost 45% on the performance when tracing the Function Tracer. Moreover, when there is no trace, dynamic Ftrace does not bring an overhead contrary to Ftrace which has an overhead of 47%. Obviously, when using a function filter on Ftrace, the overhead is minimal and does not impact the system

Interbench
This test bench will measuring the performance of the system for different load. The results show up that for light task (playing 5ms audio intervals, which use only 5% CPU), the impact is only on the latency of the function. But for more heavy task, like gaming which use all CPU it can get,the impact is more important and particularly if an heavy run in background (here reading a file of the size of physical RAM) and major part of the function are not executed, about 60% is not executed compared to an execution without a trace, this will be traduce by bug and freeze for the user, futhermore the latency are still more important. Dynamic Function Tracer obtains the same result, and his overhead is still important for the heavy task.

Uncompressing a large .tar.gz file
This test consists to decompressing a large tar.gz file (an ubuntu minimal filesystem ~90MB) and measuring the time elapse to do it. According to the graph, we can see the large overhead that bring the function tracer (with or without the Dynamic FTrace option). The other tracers have almost the same time and the deviation are not enough important to be interpreted.

In conclusion, with this test bench, we can say that the overhead of Ftrace is quite important, particularly for the function tracer. We also see that Ftrace brings an overhead even if there is no trace but this problem can be solve by enabling the Dynamic Ftrace option. This options also bring an improvement for the function tracer overhead even if it is still important without a function filter.

--

For more information

 * A look at Ftrace
 * Ftrace Function Graph ARM
 * The state of linux tracing

patches for gcc 4.4.0
Note recent ARM toolchains instrument the code with calls to '__gnu_mcount_nc' instead of 'mcount'. If you get a compiler warning about missing the symbol '__gnu_mcount_nc', you should apply the patch mentioned in this e-mail (http://marc.info/?l=linux-arm-kernel&m=124946219616111&w=2)

Kernel configuration & Re-build
The kernel should be reconfigured and rebuilt to enable the support of FTrace

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm distclean $ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm blaze_defconfig $ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm xconfig
 * Clone the choosen kernel and point to the appropriate commit id. Choose one DB and point to its kernel commit id.
 * Run the following command from the kernel folder

Kernel Hacking -> Tracers -> FUNCTION_TRACER Kernel Hacking -> Tracers -> Trace max stack Kernel Hacking -> Tracers -> enable/disable ftrace tracepoints dynamically
 * Set the following in xconfig

Note Kernel Function Graph Tracer --> doesn't work on this Kernel tag [based on 3.0.8]. So please ensure that you have disabled it.

tree. The patches can be downloaded from [[Media:Function_tracer_support.gz]]. Apply these patches [only if you are based of kernel p-android-omap-3.0 tree] in your kernel.
 * p-android-omap-3.0 is based on Kernel 3.0.8. But ftrace doesn't work as it is on this kernel. We need to take 2 patches from the stable Kernel

Note:Follow Module incompatibility rules when you modify/build kernel locally.

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm uImage (For further information about the compilation of the kernel see Linux OMAP Kernel Project)
 * Compile the kernel

Dependent Modules Rebuild
With Android File system, the system also loads [via insmod] the WLAN and GPS modules. We should build these modules against the kernel for which ftrace has enabled. Without which the system will have continuous reboots!! Please Build WLAN/GFX modules against your kernel, per these instructions: Building Modules

git clone git://git.omapzoom.org/platform/hardware/ti/wlan.git cd wlan git checkout --track -b p-ics-mr1 origin/p-ics-mr1 cd mac80211/compat export KLIB=/path/to/kernel export KLIB_BUILD=/path/to/kernel export ARCH=arm make
 * WLAN

This results WLAN modules. Replace these modules with Android file system DB WLAN modules.
 * compat/compat.ko --> system/lib/modules/
 * net/wireless/cfg80211.ko --> system/lib/modules/
 * net/mac80211/mac80211.ko --> system/lib/modules/
 * drivers/net/wireless/ti/wl12xx/wl12xx.ko --> system/lib/modules/
 * drivers/net/wireless/ti/wl12xx/wl12xx_sdio.ko --> system/lib/modules/

git clone git://git.omapzoom.org/device/ti/proprietary-open.git cd proprietary-open git checkout --track -b p-ics-mr1 origin/p-ics-mr1
 * GFX

Untar the sgx.tgz tarball located in proprietary-open/omap4. You should now have the directory proprietary-open/omap4/sgx. Follow instructions at device/ti/proprietary-open/omap4/sgx/README. Specifically:

cd omap4/sgx cd src/eurasia_km/eurasiacon/build/linux2/omap4430_android export KERNELDIR=path_to_kernel

If you are building for OMAP4430/4460, use the make command:

make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- TARGET_PRODUCT="blaze_tablet" BUILD=release TARGET_SGX=540 PLATFORM_VERSION=4.0

If you are building for OMAP4470, use the make command:

make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- TARGET_PRODUCT="blaze_tablet" BUILD=release TARGET_SGX=544sc PLATFORM_VERSION=4.0

This results GFX modules. Replace these modules with Android file system DB GFX modules.

OMAP 4430/60
 * target/kbuild/omaplfb_sgx540_120.ko --> system/lib/modules/
 * target/kbuild/pvrsrvkm_sgx540_120.ko --> system/lib/modules/

OMAP 4470
 * target/kbuild/omaplfb_sgx544_120.ko --> system/lib/modules/
 * target/kbuild/pvrsrvkm_sgx544_112.ko --> system/lib/modules/


 * GPS

This is proprietary module. The source code is available in Clear case. Please contact DB team for location and build instructions. If one can not able to build GPS modules against your kernel, you can rename the gps module in system/lib/modules. With this GPS module will not get loaded.

Using Ftrace
If debugfs is not mounted please follow below instruction to mount it. Ftrace has its control ﬁles in the debugfs system.This is usually mounted in /sys/kernel/debug. If it is not already mounted, then you can mount it yourself with: Many Developpers prefer to have this directory at the root of the system That creates a /debug/tracing subdirectory which is used to control ftrace and for getting output from the tool.
 * 1) mount -t debugfs nodev /sys/kernel/debug
 * 1) mkdir /mnt/debug
 * 2) mount -t debugfs nodev /mnt/debugfs

Note: all the command of this article will be pass in the /debug/tracing/ directory or d/tracing directory.

Choosing a tracer
To find out which tracers are available, simply cat the available_tracers file in the tracing directory: function sched_switch nop To enable, for example, the function tracer, just echo "function" into the current_tracer file. function Note: Ftrace can handle only one tracer at the same time
 * 1) cat available_tracers
 * 1) echo function > current_tracer
 * 2) cat current_tracer

Using a tracer
At this point Ftrace is ready to trace the function, to start it just echo 1 in the tracing_on file To stop the trace, just echo 0 in the same file: The trace is contained in the trace file, here is an example of the output from a function trace. The header helps to decode the various fields in the trace. #    avahi-daemon-630   [000]  9507.400848: vfs_read <-sys_read avahi-daemon-630  [000]  9507.400848: rw_verify_area <-vfs_read avahi-daemon-630  [000]  9507.400848: do_sync_read <-vfs_read avahi-daemon-630  [000]  9507.400848: sock_aio_read <-do_sync_read avahi-daemon-630  [000]  9507.400848: alloc_sock_iocb <-sock_aio_read
 * 1) echo 1 > tracing_enabled
 * 1) echo 0 > tracing_enabled
 * 1) cat trace | head -10
 * 2) tracer: function
 * 1)           TASK-PID    CPU#    TIMESTAMP  FUNCTION

Function tracer
This tracer is activated by enabling CONFIG_FUNCTION_TRACER in the kernel configuration. To work, it needs the kernel variable ftrace_enabled to be turned on, otherwise this tracer is a nop.

# echo function > current_tracer # cat current_tracer function

For Ex: I want to trace prcm_interrupt_handler function # echo 'prcm_interrupt*' > set_ftrace_filter # cat set_ftrace_filter prcm_interrupt_handler # echo 1 > tracing_enabled run the use case # echo 0 > tracing_enabled [tracing]# cat trace | head -10 # tracer: function #    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION #             | |       |          |         |          -0     [000]   858.965424: prcm_interrupt_handler <-handle_irq_event_percpu -0    [000]   858.965759: prcm_interrupt_handler <-handle_irq_event_percpu -0    [000]   861.355255: prcm_interrupt_handler <-handle_irq_event_percpu -0    [000]   861.355591: prcm_interrupt_handler <-handle_irq_event_percpu UEventObserver-210  [000]   861.724243: prcm_interrupt_handler <-handle_irq_event_percpu -0    [000]   861.725037: prcm_interrupt_handler <-handle_irq_event_percpu <...>-931  [000]   861.730316: prcm_interrupt_handler <-handle_irq_event_percpu <...>-931  [000]   861.730682: prcm_interrupt_handler <-handle_irq_event_percpu

Note: trace file might already contain some boot up function traces. You can delete contents using below command before actual function trace.

# echo 0 > trace