examples.txt 8.1 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226
  1. ------------------------------
  2. ****** perf by examples ******
  3. ------------------------------
  4. [ From an e-mail by Ingo Molnar, http://lkml.org/lkml/2009/8/4/346 ]
  5. First, discovery/enumeration of available counters can be done via
  6. 'perf list':
  7. titan:~> perf list
  8. [...]
  9. kmem:kmalloc [Tracepoint event]
  10. kmem:kmem_cache_alloc [Tracepoint event]
  11. kmem:kmalloc_node [Tracepoint event]
  12. kmem:kmem_cache_alloc_node [Tracepoint event]
  13. kmem:kfree [Tracepoint event]
  14. kmem:kmem_cache_free [Tracepoint event]
  15. kmem:mm_page_free [Tracepoint event]
  16. kmem:mm_page_free_batched [Tracepoint event]
  17. kmem:mm_page_alloc [Tracepoint event]
  18. kmem:mm_page_alloc_zone_locked [Tracepoint event]
  19. kmem:mm_page_pcpu_drain [Tracepoint event]
  20. kmem:mm_page_alloc_extfrag [Tracepoint event]
  21. Then any (or all) of the above event sources can be activated and
  22. measured. For example the page alloc/free properties of a 'hackbench
  23. run' are:
  24. titan:~> perf stat -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc
  25. -e kmem:mm_page_free_batched -e kmem:mm_page_free ./hackbench 10
  26. Time: 0.575
  27. Performance counter stats for './hackbench 10':
  28. 13857 kmem:mm_page_pcpu_drain
  29. 27576 kmem:mm_page_alloc
  30. 6025 kmem:mm_page_free_batched
  31. 20934 kmem:mm_page_free
  32. 0.613972165 seconds time elapsed
  33. You can observe the statistical properties as well, by using the
  34. 'repeat the workload N times' feature of perf stat:
  35. titan:~> perf stat --repeat 5 -e kmem:mm_page_pcpu_drain -e
  36. kmem:mm_page_alloc -e kmem:mm_page_free_batched -e
  37. kmem:mm_page_free ./hackbench 10
  38. Time: 0.627
  39. Time: 0.644
  40. Time: 0.564
  41. Time: 0.559
  42. Time: 0.626
  43. Performance counter stats for './hackbench 10' (5 runs):
  44. 12920 kmem:mm_page_pcpu_drain ( +- 3.359% )
  45. 25035 kmem:mm_page_alloc ( +- 3.783% )
  46. 6104 kmem:mm_page_free_batched ( +- 0.934% )
  47. 18376 kmem:mm_page_free ( +- 4.941% )
  48. 0.643954516 seconds time elapsed ( +- 2.363% )
  49. Furthermore, these tracepoints can be used to sample the workload as
  50. well. For example the page allocations done by a 'git gc' can be
  51. captured the following way:
  52. titan:~/git> perf record -f -e kmem:mm_page_alloc -c 1 ./git gc
  53. Counting objects: 1148, done.
  54. Delta compression using up to 2 threads.
  55. Compressing objects: 100% (450/450), done.
  56. Writing objects: 100% (1148/1148), done.
  57. Total 1148 (delta 690), reused 1148 (delta 690)
  58. [ perf record: Captured and wrote 0.267 MB perf.data (~11679 samples) ]
  59. To check which functions generated page allocations:
  60. titan:~/git> perf report
  61. # Samples: 10646
  62. #
  63. # Overhead Command Shared Object
  64. # ........ ............... ..........................
  65. #
  66. 23.57% git-repack /lib64/libc-2.5.so
  67. 21.81% git /lib64/libc-2.5.so
  68. 14.59% git ./git
  69. 11.79% git-repack ./git
  70. 7.12% git /lib64/ld-2.5.so
  71. 3.16% git-repack /lib64/libpthread-2.5.so
  72. 2.09% git-repack /bin/bash
  73. 1.97% rm /lib64/libc-2.5.so
  74. 1.39% mv /lib64/ld-2.5.so
  75. 1.37% mv /lib64/libc-2.5.so
  76. 1.12% git-repack /lib64/ld-2.5.so
  77. 0.95% rm /lib64/ld-2.5.so
  78. 0.90% git-update-serv /lib64/libc-2.5.so
  79. 0.73% git-update-serv /lib64/ld-2.5.so
  80. 0.68% perf /lib64/libpthread-2.5.so
  81. 0.64% git-repack /usr/lib64/libz.so.1.2.3
  82. Or to see it on a more finegrained level:
  83. titan:~/git> perf report --sort comm,dso,symbol
  84. # Samples: 10646
  85. #
  86. # Overhead Command Shared Object Symbol
  87. # ........ ............... .......................... ......
  88. #
  89. 9.35% git-repack ./git [.] insert_obj_hash
  90. 9.12% git ./git [.] insert_obj_hash
  91. 7.31% git /lib64/libc-2.5.so [.] memcpy
  92. 6.34% git-repack /lib64/libc-2.5.so [.] _int_malloc
  93. 6.24% git-repack /lib64/libc-2.5.so [.] memcpy
  94. 5.82% git-repack /lib64/libc-2.5.so [.] __GI___fork
  95. 5.47% git /lib64/libc-2.5.so [.] _int_malloc
  96. 2.99% git /lib64/libc-2.5.so [.] memset
  97. Furthermore, call-graph sampling can be done too, of page
  98. allocations - to see precisely what kind of page allocations there
  99. are:
  100. titan:~/git> perf record -f -g -e kmem:mm_page_alloc -c 1 ./git gc
  101. Counting objects: 1148, done.
  102. Delta compression using up to 2 threads.
  103. Compressing objects: 100% (450/450), done.
  104. Writing objects: 100% (1148/1148), done.
  105. Total 1148 (delta 690), reused 1148 (delta 690)
  106. [ perf record: Captured and wrote 0.963 MB perf.data (~42069 samples) ]
  107. titan:~/git> perf report -g
  108. # Samples: 10686
  109. #
  110. # Overhead Command Shared Object
  111. # ........ ............... ..........................
  112. #
  113. 23.25% git-repack /lib64/libc-2.5.so
  114. |
  115. |--50.00%-- _int_free
  116. |
  117. |--37.50%-- __GI___fork
  118. | make_child
  119. |
  120. |--12.50%-- ptmalloc_unlock_all2
  121. | make_child
  122. |
  123. --6.25%-- __GI_strcpy
  124. 21.61% git /lib64/libc-2.5.so
  125. |
  126. |--30.00%-- __GI_read
  127. | |
  128. | --83.33%-- git_config_from_file
  129. | git_config
  130. | |
  131. [...]
  132. Or you can observe the whole system's page allocations for 10
  133. seconds:
  134. titan:~/git> perf stat -a -e kmem:mm_page_pcpu_drain -e
  135. kmem:mm_page_alloc -e kmem:mm_page_free_batched -e
  136. kmem:mm_page_free sleep 10
  137. Performance counter stats for 'sleep 10':
  138. 171585 kmem:mm_page_pcpu_drain
  139. 322114 kmem:mm_page_alloc
  140. 73623 kmem:mm_page_free_batched
  141. 254115 kmem:mm_page_free
  142. 10.000591410 seconds time elapsed
  143. Or observe how fluctuating the page allocations are, via statistical
  144. analysis done over ten 1-second intervals:
  145. titan:~/git> perf stat --repeat 10 -a -e kmem:mm_page_pcpu_drain -e
  146. kmem:mm_page_alloc -e kmem:mm_page_free_batched -e
  147. kmem:mm_page_free sleep 1
  148. Performance counter stats for 'sleep 1' (10 runs):
  149. 17254 kmem:mm_page_pcpu_drain ( +- 3.709% )
  150. 34394 kmem:mm_page_alloc ( +- 4.617% )
  151. 7509 kmem:mm_page_free_batched ( +- 4.820% )
  152. 25653 kmem:mm_page_free ( +- 3.672% )
  153. 1.058135029 seconds time elapsed ( +- 3.089% )
  154. Or you can annotate the recorded 'git gc' run on a per symbol basis
  155. and check which instructions/source-code generated page allocations:
  156. titan:~/git> perf annotate __GI___fork
  157. ------------------------------------------------
  158. Percent | Source code & Disassembly of libc-2.5.so
  159. ------------------------------------------------
  160. :
  161. :
  162. : Disassembly of section .plt:
  163. : Disassembly of section .text:
  164. :
  165. : 00000031a2e95560 <__fork>:
  166. [...]
  167. 0.00 : 31a2e95602: b8 38 00 00 00 mov $0x38,%eax
  168. 0.00 : 31a2e95607: 0f 05 syscall
  169. 83.42 : 31a2e95609: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
  170. 0.00 : 31a2e9560f: 0f 87 4d 01 00 00 ja 31a2e95762 <__fork+0x202>
  171. 0.00 : 31a2e95615: 85 c0 test %eax,%eax
  172. ( this shows that 83.42% of __GI___fork's page allocations come from
  173. the 0x38 system call it performs. )
  174. etc. etc. - a lot more is possible. I could list a dozen of
  175. other different usecases straight away - neither of which is
  176. possible via /proc/vmstat.
  177. /proc/vmstat is not in the same league really, in terms of
  178. expressive power of system analysis and performance
  179. analysis.
  180. All that the above results needed were those new tracepoints
  181. in include/tracing/events/kmem.h.
  182. Ingo