Installing and Using perf on CentOS 8

0
2874

perf began as a tool for using the performance counters subsystem in Linux, and has had various enhancements to add tracing capabilities.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. perf provides rich generalized abstractions over hardware specific capabilities. Among others, it provides per task, per CPU and per-workload counters, sampling on top of these and source code event annotation.

Tracepoints are instrumentation points placed at logical locations in code, such as for system calls, TCP/IP events, file system operations, etc. These have negligible overhead when not in use, and can be enabled by the perf command to collect information including timestamps and stack traces. perf can also dynamically create tracepoints using the kprobes and uprobes frameworks, for kernel and userspace dynamic tracing. The possibilities with these are endless.

The userspace perf command present a simple to use interface with commands like:

  • perf stat: obtain event counts
  • perf record: record events for later reporting
  • perf report: break down events by process, function, etc.
  • perf annotate: annotate assembly or source code with event counts
  • perf top: see live event count
  • perf bench: run different kernel microbenchmarks

Install Perf

dnf install perf -y
[root@localhost ~]# perf --help

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   c2c             Shared Data C2C/HITM Analyzer.
   config          Get and set variables in a configuration file.
   data            Data file related processing
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   ftrace          simple wrapper for kernel's ftrace functionality
   inject          Filter to augment the events stream with additional information
   kallsyms        Searches running kernel for symbols
   kmem            Tool to trace/measure kernel memory properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   version         display the version of perf binary
   probe           Define new dynamic tracepoints
   trace           strace inspired tool

 See 'perf help COMMAND' for more information on a specific command.

Perf Stat

perf list stat
[root@localhost ~]# perf list stat

List of pre-defined events (to be used in -e):

  jbd2:jbd2_checkpoint_stats                         [Tracepoint event]
  jbd2:jbd2_handle_stats                             [Tracepoint event]
  jbd2:jbd2_run_stats                                [Tracepoint event]
  power:pstate_sample                                [Tracepoint event]
  rtc:rtc_irq_set_state                              [Tracepoint event]
  sched:sched_stat_blocked                           [Tracepoint event]
  sched:sched_stat_iowait                            [Tracepoint event]
  sched:sched_stat_runtime                           [Tracepoint event]
  sched:sched_stat_sleep                             [Tracepoint event]
  sched:sched_stat_wait                              [Tracepoint event]
  sock:inet_sock_set_state                           [Tracepoint event]
  sunrpc:rpc_bind_status                             [Tracepoint event]
  sunrpc:rpc_call_status                             [Tracepoint event]
  sunrpc:rpc_connect_status                          [Tracepoint event]
  sunrpc:rpc_socket_state_change                     [Tracepoint event]
  sunrpc:rpc_stats_latency                           [Tracepoint event]
  sunrpc:svc_stats_latency                           [Tracepoint event]
  syscalls:sys_enter_fstatfs                         [Tracepoint event]
  syscalls:sys_enter_newfstat                        [Tracepoint event]
  syscalls:sys_enter_newfstatat                      [Tracepoint event]
  syscalls:sys_enter_newlstat                        [Tracepoint event]
  syscalls:sys_enter_newstat                         [Tracepoint event]
  syscalls:sys_enter_statfs                          [Tracepoint event]
  syscalls:sys_enter_statx                           [Tracepoint event]
  syscalls:sys_enter_ustat                           [Tracepoint event]
  syscalls:sys_exit_fstatfs                          [Tracepoint event]
  syscalls:sys_exit_newfstat                         [Tracepoint event]
  syscalls:sys_exit_newfstatat                       [Tracepoint event]
  syscalls:sys_exit_newlstat                         [Tracepoint event]
  syscalls:sys_exit_newstat                          [Tracepoint event]
  syscalls:sys_exit_statfs                           [Tracepoint event]
  syscalls:sys_exit_statx                            [Tracepoint event]
  syscalls:sys_exit_ustat                            [Tracepoint event]
  timer:itimer_state                                 [Tracepoint event]
  wbt:wbt_stat                                       [Tracepoint event]
  writeback:global_dirty_state                       [Tracepoint event]
  x86_fpu:x86_fpu_activate_state                     [Tracepoint event]
  x86_fpu:x86_fpu_init_state                         [Tracepoint event]
  x86_fpu:x86_fpu_xstate_check_failed                [Tracepoint event]
  xfs:xfs_attr_fillstate                             [Tracepoint event]
  xfs:xfs_attr_refillstate                           [Tracepoint event]
  xfs:xfs_rmap_convert_state                         [Tracepoint event]
  xhci-hcd:xhci_get_port_status                      [Tracepoint event]
  xhci-hcd:xhci_handle_port_status                   [Tracepoint event]
  xhci-hcd:xhci_hub_status_data                      [Tracepoint event]


Metric Groups:

DSB:
  DSB_Coverage
       [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
FLOPS:
  GFLOPs
       [Giga Floating Point Operations Per Second]
Frontend:

If you want to know what happens on system when we run free -m command, use the following command:

perf stat free -m
[root@localhost ~]# perf stat free -m
              total        used        free      shared  buff/cache   available
Mem:            821         362          93           1         365         323
Swap:          2219          11        2208

 Performance counter stats for 'free -m':

              1.82 msec task-clock                #    0.544 CPUs utilized
                 1      context-switches          #    0.550 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               187      page-faults               #    0.103 M/sec
   <not supported>      cycles
   <not supported>      instructions
   <not supported>      branches
   <not supported>      branch-misses

       0.003339558 seconds time elapsed

       0.000981000 seconds user
       0.001039000 seconds sys

Perf Top

perf-top is a system profiling tool used to generates and displays a performance counter profile in real-time.

Samples: 186  of event 'cpu-clock', 4000 Hz, Event count (approx.): 46500000 lost: 0/0 drop: 0/0
Overhead  Shared Object       Symbol
   9.14%  [kernel]            [k] memcpy
   6.99%  [kernel]            [k] module_get_kallsym
   6.99%  perf                [.] rb_next
   6.45%  [kernel]            [k] number
   5.38%  libc-2.28.so        [.] __strcmp_avx2
   5.38%  perf                [.] __symbols__insert
   4.84%  libc-2.28.so        [.] __GI_____strtoull_l_internal
   4.30%  [kernel]            [k] kallsyms_expand_symbol.constprop.1
   4.30%  [kernel]            [k] vsnprintf
   3.23%  libc-2.28.so        [.] _IO_feof
   2.15%  [kernel]            [k] format_decode
   2.15%  [kernel]            [k] s_next
   2.15%  libc-2.28.so        [.] _IO_getdelim
   2.15%  perf                [.] kallsyms__parse
   1.61%  [kernel]            [k] __do_page_fault
   1.61%  [kernel]            [k] pointer_string
   1.61%  [kernel]            [k] s_show
   1.61%  [kernel]            [k] string
   1.61%  libc-2.28.so        [.] __mprotect
   1.61%  libc-2.28.so        [.] __strchr_avx2
   1.08%  [kernel]            [k] __handle_mm_fault
   1.08%  [kernel]            [k] update_iter
   1.08%  libpthread-2.28.so  [.] __pthread_mutex_init
   1.08%  perf                [.] __dso__load_kallsyms
   1.08%  perf                [.] rb_erase
   1.08%  perf                [.] 0x0000000000299b01
   0.54%  [kernel]            [k] __audit_syscall_entry
   0.54%  [kernel]            [k] __audit_syscall_exit
   0.54%  [kernel]            [k] __fsnotify_parent
   0.54%  [kernel]            [k] __x86_indirect_thunk_rax
   0.54%  [kernel]            [k] do_anonymous_page
   0.54%  [kernel]            [k] do_idle
   0.54%  [kernel]            [k] down_write_killable
   0.54%  [kernel]            [k] e1000_watchdog
   0.54%  [kernel]            [k] e1000_xmit_frame
   0.54%  [kernel]            [k] mutex_unlock
   0.54%  [kernel]            [k] perf_event_mmap
   0.54%  [kernel]            [k] pointer
   0.54%  [kernel]            [k] proc_reg_read
   0.54%  [kernel]            [k] seq_printf
   0.54%  [kernel]            [k] seq_read
   0.54%  [kernel]            [k] strlcpy
   0.54%  libc-2.28.so        [.] __GI_strtoul
   0.54%  libc-2.28.so        [.] __ctype_tolower_loc
   0.54%  libc-2.28.so        [.] __memchr_avx2
   0.54%  libc-2.28.so        [.] __memmove_avx_unaligned
   0.54%  libc-2.28.so        [.] __strspn_sse42
   0.54%  libc-2.28.so        [.] read
   0.54%  libc-2.28.so        [.] sysmalloc
   0.54%  perf                [.] eprintf
   0.54%  perf                [.] kallsyms__is_function
   0.54%  perf                [.] maps__find
   0.54%  perf                [.] rb_insert_color
   0.54%  perf                [.] 0x000000000029b6d5
   0.54%  perf                [.] 0x000000000029b6e5
   0.54%  perf                [.] 0x000000000029b730
   0.54%  perf                [.] 0x000000000042a890
   0.54%  perf                [.] 0x000000000042b979
   0.54%  perf                [.] 0x000000000042c162
   0.54%  perf                [.] 0x0000000000436125

You can also use -e option to display cpu-clock related events:

perf top -e cpu-clock
Samples: 192  of event 'cpu-clock', 4000 Hz, Event count (approx.): 48000000 lost: 0/0 drop: 0/0
Overhead  Shared Object       Symbol
   9.38%  [kernel]            [k] memcpy
   6.25%  [kernel]            [k] vsnprintf
   5.73%  perf                [.] rb_next
   5.21%  [kernel]            [k] kallsyms_expand_symbol.constprop.1
   5.21%  [kernel]            [k] module_get_kallsym
   5.21%  libc-2.28.so        [.] __strcmp_avx2
   4.69%  perf                [.] __symbols__insert
   3.65%  [kernel]            [k] format_decode
   3.65%  [kernel]            [k] number
   3.65%  libc-2.28.so        [.] _IO_feof
   2.60%  [kernel]            [k] string
   2.60%  libc-2.28.so        [.] __GI_____strtoull_l_internal
   2.08%  libc-2.28.so        [.] __strchr_avx2
   1.56%  [kernel]            [k] update_iter
   1.56%  libc-2.28.so        [.] _IO_getdelim
   1.56%  libc-2.28.so        [.] __memmove_avx_unaligned
   1.56%  perf                [.] hex2u64
   1.04%  [kernel]            [k] clear_page_rep
   1.04%  [kernel]            [k] copy_user_generic_string
   1.04%  [kernel]            [k] s_next
   1.04%  [kernel]            [k] seq_read
   1.04%  libc-2.28.so        [.] __memchr_avx2
   1.04%  libc-2.28.so        [.] __mprotect
   1.04%  libc-2.28.so        [.] cfree@GLIBC_2.2.5
   1.04%  perf                [.] rb_erase
   1.04%  perf                [.] symbol__new
   1.04%  perf                [.] 0x00000000002715d5
   1.04%  perf                [.] 0x0000000000299b01
   0.52%  [e1000]             [k] e1000_xmit_frame
   0.52%  [kernel]            [k] __audit_syscall_exit
   0.52%  [kernel]            [k] __virt_addr_valid
   0.52%  [kernel]            [k] __x86_indirect_thunk_rax
   0.52%  [kernel]            [k] alloc_pages_vma
   0.52%  [kernel]            [k] cmp_name
   0.52%  [kernel]            [k] e1000_watchdog
   0.52%  [kernel]            [k] pointer
   0.52%  [kernel]            [k] pointer_string
   0.52%  [kernel]            [k] proc_reg_read
   0.52%  [kernel]            [k] s_show
   0.52%  [kernel]            [k] seq_printf
   0.52%  [kernel]            [k] strlen
   0.52%  [kernel]            [k] up_write
   0.52%  [kernel]            [k] widen_string
   0.52%  libc-2.28.so        [.] __ctype_toupper_loc
   0.52%  libc-2.28.so        [.] __libc_calloc
   0.52%  libc-2.28.so        [.] __memcmp_avx2_movbe
   0.52%  libc-2.28.so        [.] __memmove_avx_unaligned_erms
   0.52%  libc-2.28.so        [.] __memset_avx2_unaligned
   0.52%  libc-2.28.so        [.] __memset_avx2_unaligned_erms
   0.52%  libc-2.28.so        [.] __strrchr_avx2
   0.52%  libc-2.28.so        [.] _int_malloc
   0.52%  libc-2.28.so        [.] malloc
   0.52%  libc-2.28.so        [.] sysmalloc
   0.52%  liblzma.so.5.2.4    [.] 0x0000000000014b9b
   0.52%  liblzma.so.5.2.4    [.] 0x0000000000014bc2
   0.52%  liblzma.so.5.2.4    [.] 0x0000000000014dff
   0.52%  liblzma.so.5.2.4    [.] 0x000000000001523a
   0.52%  liblzma.so.5.2.4    [.] 0x00000000000158cd
   0.52%  liblzma.so.5.2.4    [.] 0x000000000001613d
   0.52%  libpthread-2.28.so  [.] __pthread_mutex_init

Perf Record

perf-record run a command and record its profile into perf.data inside the current working directory.

For example, run a command “df -h” and record its profile into perf.data using the below command:

perf record df -h
[root@localhost ~]# perf record df -h
Filesystem           Size  Used Avail Use% Mounted on
devtmpfs             395M     0  395M   0% /dev
tmpfs                411M     0  411M   0% /dev/shm
tmpfs                411M  6.0M  405M   2% /run
tmpfs                411M     0  411M   0% /sys/fs/cgroup
/dev/mapper/cl-root   27G  4.4G   23G  17% /
/dev/sda1            976M  191M  719M  21% /boot
tmpfs                 83M  4.0K   83M   1% /run/user/0
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (1 samples) ]

Perf Report

This command will analyze file generated by perf record and generate flat or graph profile.

perf report
Samples: 1  of event 'cpu-clock', Event count (approx.): 250000
Overhead  Command  Shared Object      Symbol
 100.00%  df       [kernel.kallsyms]  [k] finish_task_switch