general_optimization.rst 12 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304
  1. .. _doc_general_optimization:
  2. General optimization tips
  3. =========================
  4. Introduction
  5. ------------
  6. In an ideal world, computers would run at infinite speed. The only limit to
  7. what we could achieve would be our imagination. However, in the real world, it's
  8. all too easy to produce software that will bring even the fastest computer to
  9. its knees.
  10. Thus, designing games and other software is a compromise between what we would
  11. like to be possible, and what we can realistically achieve while maintaining
  12. good performance.
  13. To achieve the best results, we have two approaches:
  14. - Work faster.
  15. - Work smarter.
  16. And preferably, we will use a blend of the two.
  17. Smoke and mirrors
  18. ^^^^^^^^^^^^^^^^^
  19. Part of working smarter is recognizing that, in games, we can often get the
  20. player to believe they're in a world that is far more complex, interactive, and
  21. graphically exciting than it really is. A good programmer is a magician, and
  22. should strive to learn the tricks of the trade while trying to invent new ones.
  23. The nature of slowness
  24. ^^^^^^^^^^^^^^^^^^^^^^
  25. To the outside observer, performance problems are often lumped together.
  26. But in reality, there are several different kinds of performance problems:
  27. - A slow process that occurs every frame, leading to a continuously low frame
  28. rate.
  29. - An intermittent process that causes "spikes" of slowness, leading to
  30. stalls.
  31. - A slow process that occurs outside of normal gameplay, for instance,
  32. when loading a level.
  33. Each of these are annoying to the user, but in different ways.
  34. Measuring performance
  35. ---------------------
  36. Probably the most important tool for optimization is the ability to measure
  37. performance - to identify where bottlenecks are, and to measure the success of
  38. our attempts to speed them up.
  39. There are several methods of measuring performance, including:
  40. - Putting a start/stop timer around code of interest.
  41. - Using the :ref:`Godot profiler <doc_the_profiler>`.
  42. - Using :ref:`external CPU profilers <doc_using_cpp_profilers>`.
  43. - Using external GPU profilers/debuggers such as
  44. `NVIDIA Nsight Graphics <https://developer.nvidia.com/nsight-graphics>`__,
  45. `Radeon GPU Profiler <https://gpuopen.com/rgp/>`__,
  46. `Intel Graphics Performance Analyzers <https://www.intel.com/content/www/us/en/developer/tools/graphics-performance-analyzers/overview.html>`__, or
  47. `Arm Performance Studio <https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio>`__.
  48. - Checking the frame rate (with V-Sync disabled). Third-party utilities such as
  49. `RivaTuner Statistics Server <https://www.guru3d.com/files-details/rtss-rivatuner-statistics-server-download.html>`__
  50. (Windows) or `MangoHud <https://github.com/flightlessmango/MangoHud>`__
  51. (Linux) can also be useful here.
  52. - Using an unofficial `debug menu add-on <https://github.com/godot-extended-libraries/godot-debug-menu>`__.
  53. Be very aware that the relative performance of different areas can vary on
  54. different hardware. It's often a good idea to measure timings on more than one
  55. device. This is especially the case if you're targeting mobile devices.
  56. Limitations
  57. ^^^^^^^^^^^
  58. CPU profilers are often the go-to method for measuring performance. However,
  59. they don't always tell the whole story.
  60. - Bottlenecks are often on the GPU, "as a result" of instructions given by the
  61. CPU.
  62. - Spikes can occur in the operating system processes (outside of Godot) "as a
  63. result" of instructions used in Godot (for example, dynamic memory allocation).
  64. - You may not always be able to profile specific devices like a mobile phone
  65. due to the initial setup required.
  66. - You may have to solve performance problems that occur on hardware you don't
  67. have access to.
  68. As a result of these limitations, you often need to use detective work to find
  69. out where bottlenecks are.
  70. Detective work
  71. --------------
  72. Detective work is a crucial skill for developers (both in terms of performance,
  73. and also in terms of bug fixing). This can include hypothesis testing, and
  74. binary search.
  75. Hypothesis testing
  76. ^^^^^^^^^^^^^^^^^^
  77. Say, for example, that you believe sprites are slowing down your game.
  78. You can test this hypothesis by:
  79. - Measuring the performance when you add more sprites, or take some away.
  80. This may lead to a further hypothesis: does the size of the sprite determine
  81. the performance drop?
  82. - You can test this by keeping everything the same, but changing the sprite
  83. size, and measuring performance.
  84. Binary search
  85. ^^^^^^^^^^^^^
  86. If you know that frames are taking much longer than they should, but you're
  87. not sure where the bottleneck lies. You could begin by commenting out
  88. approximately half the routines that occur on a normal frame. Has the
  89. performance improved more or less than expected?
  90. Once you know which of the two halves contains the bottleneck, you can
  91. repeat this process until you've pinned down the problematic area.
  92. Profilers
  93. ---------
  94. Profilers allow you to time your program while running it. Profilers then
  95. provide results telling you what percentage of time was spent in different
  96. functions and areas, and how often functions were called.
  97. This can be very useful both to identify bottlenecks and to measure the results
  98. of your improvements. Sometimes, attempts to improve performance can backfire
  99. and lead to slower performance.
  100. **Always use profiling and timing to guide your efforts.**
  101. For more info about using Godot's built-in profiler, see :ref:`doc_the_profiler`.
  102. Principles
  103. ----------
  104. `Donald Knuth <https://en.wikipedia.org/wiki/Donald_Knuth>`__ said:
  105. *Programmers waste enormous amounts of time thinking about, or worrying
  106. about, the speed of noncritical parts of their programs, and these attempts
  107. at efficiency actually have a strong negative impact when debugging and
  108. maintenance are considered. We should forget about small efficiencies, say
  109. about 97% of the time: premature optimization is the root of all evil. Yet
  110. we should not pass up our opportunities in that critical 3%.*
  111. The messages are very important:
  112. - Developer time is limited. Instead of blindly trying to speed up
  113. all aspects of a program, we should concentrate our efforts on the aspects
  114. that really matter.
  115. - Efforts at optimization often end up with code that is harder to read and
  116. debug than non-optimized code. It is in our interests to limit this to areas
  117. that will really benefit.
  118. Just because we *can* optimize a particular bit of code, it doesn't necessarily
  119. mean that we *should*. Knowing when and when not to optimize is a great skill to
  120. develop.
  121. One misleading aspect of the quote is that people tend to focus on the subquote
  122. *"premature optimization is the root of all evil"*. While *premature*
  123. optimization is (by definition) undesirable, performant software is the result
  124. of performant design.
  125. Performant design
  126. ^^^^^^^^^^^^^^^^^
  127. The danger with encouraging people to ignore optimization until necessary, is
  128. that it conveniently ignores that the most important time to consider
  129. performance is at the design stage, before a key has even hit a keyboard. If the
  130. design or algorithms of a program are inefficient, then no amount of polishing
  131. the details later will make it run fast. It may run *faster*, but it will never
  132. run as fast as a program designed for performance.
  133. This tends to be far more important in game or graphics programming than in
  134. general programming. A performant design, even without low-level optimization,
  135. will often run many times faster than a mediocre design with low-level
  136. optimization.
  137. Incremental design
  138. ^^^^^^^^^^^^^^^^^^
  139. Of course, in practice, unless you have prior knowledge, you are unlikely to
  140. come up with the best design the first time. Instead, you'll often make a series
  141. of versions of a particular area of code, each taking a different approach to
  142. the problem, until you come to a satisfactory solution. It's important not to
  143. spend too much time on the details at this stage until you have finalized the
  144. overall design. Otherwise, much of your work will be thrown out.
  145. It's difficult to give general guidelines for performant design because this is
  146. so dependent on the problem. One point worth mentioning though, on the CPU side,
  147. is that modern CPUs are nearly always limited by memory bandwidth. This has led
  148. to a resurgence in data-oriented design, which involves designing data
  149. structures and algorithms for *cache locality* of data and linear access, rather
  150. than jumping around in memory.
  151. The optimization process
  152. ^^^^^^^^^^^^^^^^^^^^^^^^
  153. Assuming we have a reasonable design, and taking our lessons from Knuth, our
  154. first step in optimization should be to identify the biggest bottlenecks - the
  155. slowest functions, the low-hanging fruit.
  156. Once we've successfully improved the speed of the slowest area, it may no
  157. longer be the bottleneck. So we should test/profile again and find the next
  158. bottleneck on which to focus.
  159. The process is thus:
  160. 1. Profile / Identify bottleneck.
  161. 2. Optimize bottleneck.
  162. 3. Return to step 1.
  163. Optimizing bottlenecks
  164. ^^^^^^^^^^^^^^^^^^^^^^
  165. Some profilers will even tell you which part of a function (which data accesses,
  166. calculations) are slowing things down.
  167. As with design, you should concentrate your efforts first on making sure the
  168. algorithms and data structures are the best they can be. Data access should be
  169. local (to make best use of CPU cache), and it can often be better to use compact
  170. storage of data (again, always profile to test results). Often, you precalculate
  171. heavy computations ahead of time. This can be done by performing the computation
  172. when loading a level, by loading a file containing precalculated data, or
  173. by storing the results of complex calculations into a script constant and
  174. reading its value.
  175. Once algorithms and data are good, you can often make small changes in routines
  176. which improve performance. For instance, you can move some calculations outside
  177. of loops or transform nested ``for`` loops into non-nested loops.
  178. (This should be feasible if you know a 2D array's width or height in advance.)
  179. Always retest your timing/bottlenecks after making each change. Some changes
  180. will increase speed, others may have a negative effect. Sometimes, a small
  181. positive effect will be outweighed by the negatives of more complex code, and
  182. you may choose to leave out that optimization.
  183. Appendix
  184. --------
  185. Bottleneck math
  186. ^^^^^^^^^^^^^^^
  187. The proverb *"a chain is only as strong as its weakest link"* applies directly to
  188. performance optimization. If your project is spending 90% of the time in
  189. function ``A``, then optimizing ``A`` can have a massive effect on performance.
  190. .. code-block:: none
  191. A: 9 ms
  192. Everything else: 1 ms
  193. Total frame time: 10 ms
  194. .. code-block:: none
  195. A: 1 ms
  196. Everything else: 1ms
  197. Total frame time: 2 ms
  198. In this example, improving this bottleneck ``A`` by a factor of 9× decreases
  199. overall frame time by 5× while increasing frames per second by 5×.
  200. However, if something else is running slowly and also bottlenecking your
  201. project, then the same improvement can lead to less dramatic gains:
  202. .. code-block:: none
  203. A: 9 ms
  204. Everything else: 50 ms
  205. Total frame time: 59 ms
  206. .. code-block:: none
  207. A: 1 ms
  208. Everything else: 50 ms
  209. Total frame time: 51 ms
  210. In this example, even though we have hugely optimized function ``A``,
  211. the actual gain in terms of frame rate is quite small.
  212. In games, things become even more complicated because the CPU and GPU run
  213. independently of one another. Your total frame time is determined by the slower
  214. of the two.
  215. .. code-block:: none
  216. CPU: 9 ms
  217. GPU: 50 ms
  218. Total frame time: 50 ms
  219. .. code-block:: none
  220. CPU: 1 ms
  221. GPU: 50 ms
  222. Total frame time: 50 ms
  223. In this example, we optimized the CPU hugely again, but the frame time didn't
  224. improve because we are GPU-bottlenecked.