Introduction to Profiling
For users interested in tuning the performance of their applications, the STLinux
distribution includes two profilers - gprof and OProfile.
The standard GNU profiler, gprof, has two key limitations for embedded Linux
development:
- It will only profile a single user mode application.
- It will only profile an entire run of that application, and requires
it to exit.
Many embedded applications are never intended to exit!
OProfile addresses both of these issues. It
is built into the kernel, and uses timer interrupts to profile the kernel and all
user mode processes. The resulting profile can show the time spent in each function,
process, thread, or binary. It can be started and stopped at any time and so does
not require applications to complete.
Using OProfile
Before OProfile can be used, it must be configured into the kernel:
Profiling support --->
Profiling support (EXPERIMENTAL) --->
OProfile system profiling (EXPERIMENTAL)
Of course, to get a symbol level profile of the kernel, it must be built with debug info:
Kernel hacking --->
Kernel debugging --->
Compile the kernel with debug info
The OProfile support can either be built as a kernel module, or statically linked into
the kernel. The rest of this document assumes that it has been statically linked.
To profile the time spent in individual kernel functions, a copy of
the kernel must be placed in the target filesystem. This can be done with
scp.
host% scp vmlinux root@targetboard:/root2
OProfile is configured and driven using command-line utilities on the target
system. The two most important of these are:
- opcontrol - to configure, start and stop profiling.
- opreport - to output the current profile in a variety of forms.
To start profiling the system, use the following commands. They could be put in
a script for convenience.
target% rm -rf /var/lib/oprofile/target% opcontrol --inittarget% opcontrol --setup --separate=all --vmlinux=/root/vmlinuxtarget% opcontrol --start
The first command deletes any samples from previous runs. If the run was stopped
unexpectedly (for example by resetting the board), then these samples will prevent
any new ones being collected.
opcontrol --init creates the directory structure under
/var and loads the OProfile kernel module where appropriate.
In the opcontrol --setup line, the kernel image is specified (this
can be omitted if you can do not want to profile the kernel at symbol level), and
we specify that samples should be separated out by process, thread, binary image
and cpu. By default samples are only separated by binary image.
opcontrol --start begins collecting samples.
When you have run the application(s) to be profiled, two commands are available
to write out the samples to /var:
opcontrol --dump
opcontrol --stop
The only difference is that after a dump, OProfile will continue to collect samples.
opcontrol --stop stops profiling altogether.
Displaying the results
Once the samples have been collected, they can be viewed with opreport:
target% opreport --merge=all
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
TIMER:0|
samples| %|
------------------
96694.7988 vmlinux
232.2571 bash
181.7664 libc-2.3.3.so60.5888 ld-2.3.3.so20.1963 ls
20.1963 sshd
10.0981 gawk
10.0981 grep
This shows that over the profiled period, 94.8% of the time was spent in the kernel,
2.3% of the time in bash, 1.8% of the time in glibc...
The option --merge is used to undo the separation of samples by process
and thread. To break the profile down by symbol, use the option -l:
target% opreport --merge=all -l
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % app name symbol name
1566050.3764 vmlinux cpu_idle
1495448.1053 vmlinux default_idle
210.0676 vmlinux __raw_readsl
110.0354 libstdc++.so.6.0.3 anonymous symbol from section .plt90.0290 ld-2.3.3.so do_lookup_x
80.0257 sshd rijndael_encrypt
<snip>
This shows that just over 50% of all samples taken were in the kernel function
cpu_idle.
The output can be restricted to a single binary image like this:
target% opreport --merge=all -l image:/`which sshd`
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % app name symbol name
137.2626 sshd rijndael_encrypt
73.9106 sshd anonymous symbol from section .plt52.7933 libcrypto.so.0.9.6 md5_block_host_order
52.7933 sshd rijndael_decrypt
42.2346 libc-2.3.3.so memcpy
<snip>
This shows how long sshd spent in each function it executed, including
shared library and kernel functions.
If the --merge option is removed, then the output is broken down by
process and thread:
target% opreport -l image:/root/thread_test
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
Processes with a thread ID of 1112
Processes with a thread ID of 1139
Processes with a thread ID of 1154
samples % samples % samples % symbol name
1100.0000000 _IO_new_file_overflow
001100.00000 __udivsi3_i4
00001100.000 clone
<snip>
In this display, each thread has two columns - the number of samples, and the
percentage this represents of total samples. In the example above, one only sample
was taken
in each thread - thread 1112 was sampled once in _IO_new_file_overflow,
thread 1139 was sampled once in __udivsi3_i4 and so on.
Many more permutations of options are possible. As the last example shows, once
the profile is broken down by thread the output can quickly become very complex
and difficult to read! Care must be taken to specify the most appropriate
report format for the information you are trying to extract.
One final option that may be of interest:
target% opgprof `which sshd`
This writes out the profile of a single application in gprof format.
It can then be viewed with gprof, or more importantly with a GUI that
recognises that format such as kprof.
Much more information on OProfile can be found on its
project homepage, and in the manpages
for the various command-line tools.
Other useful tools
OProfile is sample-based. To get an totally accurate breakdown of the time spent in each process, thread, or interrupt handler, consider using KPTrace. KPTrace also provides detailed figures on the number of times interrupts fired, the number of context switches, and so on.
Both gprof and OProfile can be used from within STWorkbench, which provides a sophisticated graphical interface on the profile results. KPTrace also provides an excellent graphical trace viewer inside STWorkbench.