Performance
===================

The main goals for |kitty| performance are low user-perceived latency while
typing, "smoothness" while scrolling, and low CPU usage. |kitty| tries hard to
find an optimum balance between these. To that end it keeps a cache of each
rendered glyph in video RAM so that font rendering is not a bottleneck.
Interaction with child programs takes place in a separate thread from
rendering, to improve smoothness. Parsing of the byte stream is done using
`vector CPU instructions
<https://en.wikipedia.org/wiki/Single_instruction,_multiple_data>`__ for
maximum performance. Updates to the screen typically require sending just a few
bytes to the GPU.

There are two config options you can tune to adjust the performance,
:opt:`repaint_delay` and :opt:`input_delay`. These control the artificial delays
introduced into the render loop to reduce CPU usage. See
:ref:`conf-kitty-performance` for details. See also the :opt:`sync_to_monitor`
option to further decrease latency at the cost of some `screen tearing
<https://en.wikipedia.org/wiki/Screen_tearing>`__ while scrolling.

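
The trade-off behind such delays can be illustrated with a toy model. The
sketch below is not kitty's render loop: ``count_repaints`` and its parameters
are hypothetical names, and it assumes that a repaint is scheduled a fixed
delay after the first pending update and draws everything that arrived in the
meantime.

.. code-block:: python

    def count_repaints(update_times_ms, repaint_delay_ms):
        """Count repaints in a toy model of a delayed render loop.

        A repaint is scheduled repaint_delay_ms after the first update that
        is not yet covered by a pending repaint, and it draws every update
        that has arrived by the time it fires.
        """
        repaints = 0
        deadline = None  # time at which the currently pending repaint fires
        for t in sorted(update_times_ms):
            if deadline is None or t > deadline:
                # No pending repaint covers this update, so schedule one.
                repaints += 1
                deadline = t + repaint_delay_ms
        return repaints


    updates = [0, 1, 2, 3, 20, 21, 40]  # arrival times in milliseconds
    print(count_repaints(updates, 0))   # 7: every update is drawn on its own
    print(count_repaints(updates, 10))  # 3: nearby updates share a repaint

In this model a longer delay lets nearby updates share a single repaint, which
is the sense in which a delay lowers CPU usage at the cost of some latency.
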
Benchmarks
-------------

Measuring terminal emulator performance is fairly subtle. There are three main
axes on which performance is measured: energy usage for typical tasks,
keyboard to screen latency, and throughput (processing large amounts of data).

Keyboard to screen latency
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is measured either with dedicated hardware, or software such as `Typometer
<https://pavelfatin.com/typometer/>`__. Third-party measurements comparing
kitty with other terminal emulators on various systems show kitty has
best-in-class keyboard to screen latency.

To minimize latency at the expense of more energy usage, use the
following settings in kitty.conf::

    input_delay 0
    repaint_delay 2
    sync_to_monitor no
    wayland_enable_ime no

`Hardware based measurements on macOS
<https://thume.ca/2020/05/20/making-a-latency-tester/>`__ show that kitty and
Apple's Terminal.app share the crown for best latency. These
measurements were done with :opt:`input_delay` at its default value of ``3 ms``,
which means kitty's numbers with the low latency settings above would be even
lower.

`Typometer based measurements on Linux
<https://github.com/kovidgoyal/kitty/issues/2701#issuecomment-911089374>`__
show that kitty has far and away the best latency of the terminals tested.

.. _throughput:

Throughput
^^^^^^^^^^^^^^^^

kitty has a builtin kitten to measure throughput. It works by dumping large
amounts of data of different types into the tty device and measuring how fast
the terminal parses and responds to it. The measurements below were taken with
the same font, font size and window size for all terminals, and default
settings, on the same computer. They clearly show kitty has the fastest
throughput. To run the tests yourself, run ``kitten __benchmark__`` in the
terminal emulator you want to test, where the kitten binary is part of the
kitty install.

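
The general idea of such a measurement can be sketched in a few lines of
Python. This is only an illustration, not the kitten's implementation:
``measure_throughput`` is a hypothetical helper that times how long it takes to
write a payload to a file descriptor, relying on the tty's back-pressure, and
unlike the kitten it does not wait for the terminal to respond.

.. code-block:: python

    import os
    import time


    def measure_throughput(fd, payload, repeats=100):
        """Write payload to fd `repeats` times and return the rate in MB/s."""
        total_bytes = len(payload) * repeats
        start = time.perf_counter()
        for _ in range(repeats):
            view = memoryview(payload)
            while view:
                # os.write may accept fewer bytes than offered, so loop
                # until the whole payload has been handed to the device.
                written = os.write(fd, view)
                view = view[written:]
        # Guard against a zero interval on very fast sinks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        return total_bytes / elapsed / 1e6

Calling it with the file descriptor of standard output from inside the
terminal under test, for example ``measure_throughput(1, b"hello world\n" *
4096)``, gives a rough number for ASCII-only data.
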
The numbers are megabytes per second of data that the terminal
processes. Measurements were taken under Linux/X11 with an ``AMD Ryzen 7 PRO
5850U``. Entries are in order of decreasing performance. On average, kitty is
roughly twice as fast as the next best.

================ ====== ======= ===== ====== =======
Terminal         ASCII  Unicode CSI   Images Average
================ ====== ======= ===== ====== =======
kitty 0.33       121.8  105.0   59.8  251.6  134.55
gnometerm 3.50.1 33.4   55.0    16.1  142.8  61.83
alacritty 0.13.1 43.1   46.5    32.5  94.1   54.05
wezterm 20230712 16.4   26.0    11.1  140.5  48.5
xterm 389        47.7   18.3    0.6   56.3   30.72
konsole 23.08.04 25.2   37.7    23.6  23.4   27.48
alacritty+tmux   30.3   7.8     14.7  46.1   24.73
================ ====== ======= ===== ====== =======

In this table, each column represents a different type of data. The CSI column
is for data consisting of a mix of typical formatting escape codes and some
ASCII only text.

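
The Average column agrees, to within rounding, with the arithmetic mean of the
four data columns. The short sketch below recomputes it from the numbers in the
table and derives the ratio between the two fastest entries; the variable names
are only illustrative.

.. code-block:: python

    from statistics import mean

    # MB/s per data type (ASCII, Unicode, CSI, Images), copied from the table.
    results = {
        "kitty 0.33": (121.8, 105.0, 59.8, 251.6),
        "gnometerm 3.50.1": (33.4, 55.0, 16.1, 142.8),
        "alacritty 0.13.1": (43.1, 46.5, 32.5, 94.1),
        "wezterm 20230712": (16.4, 26.0, 11.1, 140.5),
        "xterm 389": (47.7, 18.3, 0.6, 56.3),
        "konsole 23.08.04": (25.2, 37.7, 23.6, 23.4),
        "alacritty+tmux": (30.3, 7.8, 14.7, 46.1),
    }

    # Mean throughput per terminal, then the terminals ranked fastest first.
    averages = {name: mean(cols) for name, cols in results.items()}
    ranked = sorted(averages, key=averages.get, reverse=True)

    best, runner_up = ranked[0], ranked[1]
    speedup = averages[best] / averages[runner_up]
    print(f"{best}: {averages[best]:.2f} MB/s, {speedup:.2f}x {runner_up}")
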
.. note::

    By default, the benchmark kitten suppresses actual rendering, to better
    focus on parser speed. You can pass it the ``--render`` flag to not suppress
    rendering. However, modern terminals typically render asynchronously,
    therefore the numbers are not really useful for comparison, as it is just a
    game about how much input to *batch* before rendering the next frame.
    Even with rendering enabled, kitty is still faster than all the
    rest. For brevity those numbers are not included.

.. note::

    foot, iTerm2 and Terminal.app are left out as they do not run under X11.
    Alacritty+tmux is included just to show the effect of putting a terminal
    multiplexer into the mix (halving throughput) and because alacritty isn't
    remotely comparable to any of the other terminals feature-wise without tmux.

.. note::

    konsole, gnome-terminal and xterm do not support the `Synchronized update
    <https://gitlab.com/gnachman/iterm2/-/wikis/synchronized-updates-spec>`__
    escape code used to suppress rendering. If and when they gain support for it,
    their numbers are likely to improve by ``20 - 50%``, depending on how well they
    implement it.

Energy usage
^^^^^^^^^^^^^^^^^

Sadly, I do not have the infrastructure to measure actual energy usage so CPU
usage will have to stand in for it. Here are some CPU usage numbers for the
task of scrolling a file continuously in :program:`less`. The CPU usage is for
the terminal process and X together and is measured using :program:`htop`. The
measurements are taken at the same font and window size for all terminals on an
``Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz`` CPU with an ``Advanced Micro
Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X]`` GPU.

============== =========================
Terminal       CPU usage (X + terminal)
============== =========================
|kitty|        6 - 8%
xterm          5 - 7% (but scrolling was extremely janky)
termite        10 - 13%
urxvt          12 - 14%
gnome-terminal 15 - 17%
konsole        29 - 31%
============== =========================

As you can see, |kitty| uses much less CPU than all the other terminals except
xterm, and its scrolling "smoothness" is much better than that of xterm (at
least to my, admittedly biased, eyes).

Instrumenting kitty
-----------------------

You can generate detailed per-function performance data using
`gperftools <https://github.com/gperftools/gperftools>`__. Build |kitty| with
``make profile``. Run kitty and perform the task you want to analyse, for
example, scrolling a large file with :program:`less`. After you quit, function
call statistics will be displayed in *KCachegrind*. Hence, profiling is best done
on Linux, which has these tools easily available.