perf began as a tool for using the performance counters subsystem in Linux, and has had various enhancements to add tracing capabilities.
Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. perf provides rich generalized abstractions over hardware specific capabilities. Among others, it provides per task, per CPU and per-workload counters, sampling on top of these and source code event annotation.
Tracepoints are instrumentation points placed at logical locations in code, such as for system calls, TCP/IP events, file system operations, etc. These have negligible overhead when not in use, and can be enabled by the perf command to collect information including timestamps and stack traces. perf can also dynamically create tracepoints using the kprobes and uprobes frameworks, for kernel and userspace dynamic tracing. The possibilities with these are endless.
The userspace perf command present a simple to use interface with commands like:
- perf stat: obtain event counts
- perf record: record events for later reporting
- perf report: break down events by process, function, etc.
- perf annotate: annotate assembly or source code with event counts
- perf top: see live event count
- perf bench: run different kernel microbenchmarks
Install Perf
dnf install perf -y
[root@localhost ~]# perf --help
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
The most commonly used perf commands are:
annotate Read perf.data (created by perf record) and display annotated code
archive Create archive with object files with build-ids found in perf.data file
bench General framework for benchmark suites
buildid-cache Manage build-id cache.
buildid-list List the buildids in a perf.data file
c2c Shared Data C2C/HITM Analyzer.
config Get and set variables in a configuration file.
data Data file related processing
diff Read perf.data files and display the differential profile
evlist List the event names in a perf.data file
ftrace simple wrapper for kernel's ftrace functionality
inject Filter to augment the events stream with additional information
kallsyms Searches running kernel for symbols
kmem Tool to trace/measure kernel memory properties
kvm Tool to trace/measure kvm guest os
list List all symbolic event types
lock Analyze lock events
mem Profile memory accesses
record Run a command and record its profile into perf.data
report Read perf.data (created by perf record) and display the profile
sched Tool to trace/measure scheduler properties (latencies)
script Read perf.data (created by perf record) and display trace output
stat Run a command and gather performance counter statistics
test Runs sanity tests.
timechart Tool to visualize total system behavior during a workload
top System profiling tool.
version display the version of perf binary
probe Define new dynamic tracepoints
trace strace inspired tool
See 'perf help COMMAND' for more information on a specific command.
Perf Stat
perf list stat
[root@localhost ~]# perf list stat
List of pre-defined events (to be used in -e):
jbd2:jbd2_checkpoint_stats [Tracepoint event]
jbd2:jbd2_handle_stats [Tracepoint event]
jbd2:jbd2_run_stats [Tracepoint event]
power:pstate_sample [Tracepoint event]
rtc:rtc_irq_set_state [Tracepoint event]
sched:sched_stat_blocked [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]
sched:sched_stat_runtime [Tracepoint event]
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_wait [Tracepoint event]
sock:inet_sock_set_state [Tracepoint event]
sunrpc:rpc_bind_status [Tracepoint event]
sunrpc:rpc_call_status [Tracepoint event]
sunrpc:rpc_connect_status [Tracepoint event]
sunrpc:rpc_socket_state_change [Tracepoint event]
sunrpc:rpc_stats_latency [Tracepoint event]
sunrpc:svc_stats_latency [Tracepoint event]
syscalls:sys_enter_fstatfs [Tracepoint event]
syscalls:sys_enter_newfstat [Tracepoint event]
syscalls:sys_enter_newfstatat [Tracepoint event]
syscalls:sys_enter_newlstat [Tracepoint event]
syscalls:sys_enter_newstat [Tracepoint event]
syscalls:sys_enter_statfs [Tracepoint event]
syscalls:sys_enter_statx [Tracepoint event]
syscalls:sys_enter_ustat [Tracepoint event]
syscalls:sys_exit_fstatfs [Tracepoint event]
syscalls:sys_exit_newfstat [Tracepoint event]
syscalls:sys_exit_newfstatat [Tracepoint event]
syscalls:sys_exit_newlstat [Tracepoint event]
syscalls:sys_exit_newstat [Tracepoint event]
syscalls:sys_exit_statfs [Tracepoint event]
syscalls:sys_exit_statx [Tracepoint event]
syscalls:sys_exit_ustat [Tracepoint event]
timer:itimer_state [Tracepoint event]
wbt:wbt_stat [Tracepoint event]
writeback:global_dirty_state [Tracepoint event]
x86_fpu:x86_fpu_activate_state [Tracepoint event]
x86_fpu:x86_fpu_init_state [Tracepoint event]
x86_fpu:x86_fpu_xstate_check_failed [Tracepoint event]
xfs:xfs_attr_fillstate [Tracepoint event]
xfs:xfs_attr_refillstate [Tracepoint event]
xfs:xfs_rmap_convert_state [Tracepoint event]
xhci-hcd:xhci_get_port_status [Tracepoint event]
xhci-hcd:xhci_handle_port_status [Tracepoint event]
xhci-hcd:xhci_hub_status_data [Tracepoint event]
Metric Groups:
DSB:
DSB_Coverage
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
FLOPS:
GFLOPs
[Giga Floating Point Operations Per Second]
Frontend:
If you want to know what happens on system when we run free -m command, use the following command:
perf stat free -m
[root@localhost ~]# perf stat free -m
total used free shared buff/cache available
Mem: 821 362 93 1 365 323
Swap: 2219 11 2208
Performance counter stats for 'free -m':
1.82 msec task-clock # 0.544 CPUs utilized
1 context-switches # 0.550 K/sec
0 cpu-migrations # 0.000 K/sec
187 page-faults # 0.103 M/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
0.003339558 seconds time elapsed
0.000981000 seconds user
0.001039000 seconds sys
Perf Top
perf-top is a system profiling tool used to generates and displays a performance counter profile in real-time.
Samples: 186 of event 'cpu-clock', 4000 Hz, Event count (approx.): 46500000 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
9.14% [kernel] [k] memcpy
6.99% [kernel] [k] module_get_kallsym
6.99% perf [.] rb_next
6.45% [kernel] [k] number
5.38% libc-2.28.so [.] __strcmp_avx2
5.38% perf [.] __symbols__insert
4.84% libc-2.28.so [.] __GI_____strtoull_l_internal
4.30% [kernel] [k] kallsyms_expand_symbol.constprop.1
4.30% [kernel] [k] vsnprintf
3.23% libc-2.28.so [.] _IO_feof
2.15% [kernel] [k] format_decode
2.15% [kernel] [k] s_next
2.15% libc-2.28.so [.] _IO_getdelim
2.15% perf [.] kallsyms__parse
1.61% [kernel] [k] __do_page_fault
1.61% [kernel] [k] pointer_string
1.61% [kernel] [k] s_show
1.61% [kernel] [k] string
1.61% libc-2.28.so [.] __mprotect
1.61% libc-2.28.so [.] __strchr_avx2
1.08% [kernel] [k] __handle_mm_fault
1.08% [kernel] [k] update_iter
1.08% libpthread-2.28.so [.] __pthread_mutex_init
1.08% perf [.] __dso__load_kallsyms
1.08% perf [.] rb_erase
1.08% perf [.] 0x0000000000299b01
0.54% [kernel] [k] __audit_syscall_entry
0.54% [kernel] [k] __audit_syscall_exit
0.54% [kernel] [k] __fsnotify_parent
0.54% [kernel] [k] __x86_indirect_thunk_rax
0.54% [kernel] [k] do_anonymous_page
0.54% [kernel] [k] do_idle
0.54% [kernel] [k] down_write_killable
0.54% [kernel] [k] e1000_watchdog
0.54% [kernel] [k] e1000_xmit_frame
0.54% [kernel] [k] mutex_unlock
0.54% [kernel] [k] perf_event_mmap
0.54% [kernel] [k] pointer
0.54% [kernel] [k] proc_reg_read
0.54% [kernel] [k] seq_printf
0.54% [kernel] [k] seq_read
0.54% [kernel] [k] strlcpy
0.54% libc-2.28.so [.] __GI_strtoul
0.54% libc-2.28.so [.] __ctype_tolower_loc
0.54% libc-2.28.so [.] __memchr_avx2
0.54% libc-2.28.so [.] __memmove_avx_unaligned
0.54% libc-2.28.so [.] __strspn_sse42
0.54% libc-2.28.so [.] read
0.54% libc-2.28.so [.] sysmalloc
0.54% perf [.] eprintf
0.54% perf [.] kallsyms__is_function
0.54% perf [.] maps__find
0.54% perf [.] rb_insert_color
0.54% perf [.] 0x000000000029b6d5
0.54% perf [.] 0x000000000029b6e5
0.54% perf [.] 0x000000000029b730
0.54% perf [.] 0x000000000042a890
0.54% perf [.] 0x000000000042b979
0.54% perf [.] 0x000000000042c162
0.54% perf [.] 0x0000000000436125
You can also use -e option to display cpu-clock related events:
perf top -e cpu-clock
Samples: 192 of event 'cpu-clock', 4000 Hz, Event count (approx.): 48000000 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
9.38% [kernel] [k] memcpy
6.25% [kernel] [k] vsnprintf
5.73% perf [.] rb_next
5.21% [kernel] [k] kallsyms_expand_symbol.constprop.1
5.21% [kernel] [k] module_get_kallsym
5.21% libc-2.28.so [.] __strcmp_avx2
4.69% perf [.] __symbols__insert
3.65% [kernel] [k] format_decode
3.65% [kernel] [k] number
3.65% libc-2.28.so [.] _IO_feof
2.60% [kernel] [k] string
2.60% libc-2.28.so [.] __GI_____strtoull_l_internal
2.08% libc-2.28.so [.] __strchr_avx2
1.56% [kernel] [k] update_iter
1.56% libc-2.28.so [.] _IO_getdelim
1.56% libc-2.28.so [.] __memmove_avx_unaligned
1.56% perf [.] hex2u64
1.04% [kernel] [k] clear_page_rep
1.04% [kernel] [k] copy_user_generic_string
1.04% [kernel] [k] s_next
1.04% [kernel] [k] seq_read
1.04% libc-2.28.so [.] __memchr_avx2
1.04% libc-2.28.so [.] __mprotect
1.04% libc-2.28.so [.] cfree@GLIBC_2.2.5
1.04% perf [.] rb_erase
1.04% perf [.] symbol__new
1.04% perf [.] 0x00000000002715d5
1.04% perf [.] 0x0000000000299b01
0.52% [e1000] [k] e1000_xmit_frame
0.52% [kernel] [k] __audit_syscall_exit
0.52% [kernel] [k] __virt_addr_valid
0.52% [kernel] [k] __x86_indirect_thunk_rax
0.52% [kernel] [k] alloc_pages_vma
0.52% [kernel] [k] cmp_name
0.52% [kernel] [k] e1000_watchdog
0.52% [kernel] [k] pointer
0.52% [kernel] [k] pointer_string
0.52% [kernel] [k] proc_reg_read
0.52% [kernel] [k] s_show
0.52% [kernel] [k] seq_printf
0.52% [kernel] [k] strlen
0.52% [kernel] [k] up_write
0.52% [kernel] [k] widen_string
0.52% libc-2.28.so [.] __ctype_toupper_loc
0.52% libc-2.28.so [.] __libc_calloc
0.52% libc-2.28.so [.] __memcmp_avx2_movbe
0.52% libc-2.28.so [.] __memmove_avx_unaligned_erms
0.52% libc-2.28.so [.] __memset_avx2_unaligned
0.52% libc-2.28.so [.] __memset_avx2_unaligned_erms
0.52% libc-2.28.so [.] __strrchr_avx2
0.52% libc-2.28.so [.] _int_malloc
0.52% libc-2.28.so [.] malloc
0.52% libc-2.28.so [.] sysmalloc
0.52% liblzma.so.5.2.4 [.] 0x0000000000014b9b
0.52% liblzma.so.5.2.4 [.] 0x0000000000014bc2
0.52% liblzma.so.5.2.4 [.] 0x0000000000014dff
0.52% liblzma.so.5.2.4 [.] 0x000000000001523a
0.52% liblzma.so.5.2.4 [.] 0x00000000000158cd
0.52% liblzma.so.5.2.4 [.] 0x000000000001613d
0.52% libpthread-2.28.so [.] __pthread_mutex_init
Perf Record
perf-record run a command and record its profile into perf.data inside the current working directory.
For example, run a command “df -h” and record its profile into perf.data using the below command:
perf record df -h
[root@localhost ~]# perf record df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 395M 0 395M 0% /dev
tmpfs 411M 0 411M 0% /dev/shm
tmpfs 411M 6.0M 405M 2% /run
tmpfs 411M 0 411M 0% /sys/fs/cgroup
/dev/mapper/cl-root 27G 4.4G 23G 17% /
/dev/sda1 976M 191M 719M 21% /boot
tmpfs 83M 4.0K 83M 1% /run/user/0
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (1 samples) ]
Perf Report
This command will analyze file generated by perf record and generate flat or graph profile.
perf report
Samples: 1 of event 'cpu-clock', Event count (approx.): 250000
Overhead Command Shared Object Symbol
100.00% df [kernel.kallsyms] [k] finish_task_switch