123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303 |
- .. _doc_general_optimization:
- General optimization tips
- =========================
- Introduction
- ------------
- In an ideal world, computers would run at infinite speed. The only limit to
- what we could achieve would be our imagination. However, in the real world, it's
- all too easy to produce software that will bring even the fastest computer to
- its knees.
- Thus, designing games and other software is a compromise between what we would
- like to be possible, and what we can realistically achieve while maintaining
- good performance.
- To achieve the best results, we have two approaches:
- - Work faster.
- - Work smarter.
- And preferably, we will use a blend of the two.
- Smoke and mirrors
- ^^^^^^^^^^^^^^^^^
- Part of working smarter is recognizing that, in games, we can often get the
- player to believe they're in a world that is far more complex, interactive, and
- graphically exciting than it really is. A good programmer is a magician, and
- should strive to learn the tricks of the trade while trying to invent new ones.
- The nature of slowness
- ^^^^^^^^^^^^^^^^^^^^^^
- To the outside observer, performance problems are often lumped together.
- But in reality, there are several different kinds of performance problems:
- - A slow process that occurs every frame, leading to a continuously low frame
- rate.
- - An intermittent process that causes "spikes" of slowness, leading to
- stalls.
- - A slow process that occurs outside of normal gameplay, for instance,
- when loading a level.
- Each of these are annoying to the user, but in different ways.
- Measuring performance
- ---------------------
- Probably the most important tool for optimization is the ability to measure
- performance - to identify where bottlenecks are, and to measure the success of
- our attempts to speed them up.
- There are several methods of measuring performance, including:
- - Putting a start/stop timer around code of interest.
- - Using the :ref:`Godot profiler <doc_the_profiler>`.
- - Using :ref:`external CPU profilers <doc_using_cpp_profilers>`.
- - Using external GPU profilers/debuggers such as
- `NVIDIA Nsight Graphics <https://developer.nvidia.com/nsight-graphics>`__,
- `Radeon GPU Profiler <https://gpuopen.com/rgp/>`__ or
- `Intel Graphics Performance Analyzers <https://www.intel.com/content/www/us/en/developer/tools/graphics-performance-analyzers/overview.html>`__.
- - Checking the frame rate (with V-Sync disabled). Third-party utilities such as
- `RivaTuner Statistics Server <https://www.guru3d.com/files-details/rtss-rivatuner-statistics-server-download.html>`__
- (Windows) or `MangoHud <https://github.com/flightlessmango/MangoHud>`__
- (Linux) can also be useful here.
- - Using an unofficial `debug menu add-on <https://github.com/godot-extended-libraries/godot-debug-menu>`.
- Be very aware that the relative performance of different areas can vary on
- different hardware. It's often a good idea to measure timings on more than one
- device. This is especially the case if you're targeting mobile devices.
- Limitations
- ^^^^^^^^^^^
- CPU profilers are often the go-to method for measuring performance. However,
- they don't always tell the whole story.
- - Bottlenecks are often on the GPU, "as a result" of instructions given by the
- CPU.
- - Spikes can occur in the operating system processes (outside of Godot) "as a
- result" of instructions used in Godot (for example, dynamic memory allocation).
- - You may not always be able to profile specific devices like a mobile phone
- due to the initial setup required.
- - You may have to solve performance problems that occur on hardware you don't
- have access to.
- As a result of these limitations, you often need to use detective work to find
- out where bottlenecks are.
- Detective work
- --------------
- Detective work is a crucial skill for developers (both in terms of performance,
- and also in terms of bug fixing). This can include hypothesis testing, and
- binary search.
- Hypothesis testing
- ^^^^^^^^^^^^^^^^^^
- Say, for example, that you believe sprites are slowing down your game.
- You can test this hypothesis by:
- - Measuring the performance when you add more sprites, or take some away.
- This may lead to a further hypothesis: does the size of the sprite determine
- the performance drop?
- - You can test this by keeping everything the same, but changing the sprite
- size, and measuring performance.
- Binary search
- ^^^^^^^^^^^^^
- If you know that frames are taking much longer than they should, but you're
- not sure where the bottleneck lies. You could begin by commenting out
- approximately half the routines that occur on a normal frame. Has the
- performance improved more or less than expected?
- Once you know which of the two halves contains the bottleneck, you can
- repeat this process until you've pinned down the problematic area.
- Profilers
- ---------
- Profilers allow you to time your program while running it. Profilers then
- provide results telling you what percentage of time was spent in different
- functions and areas, and how often functions were called.
- This can be very useful both to identify bottlenecks and to measure the results
- of your improvements. Sometimes, attempts to improve performance can backfire
- and lead to slower performance.
- **Always use profiling and timing to guide your efforts.**
- For more info about using Godot's built-in profiler, see :ref:`doc_the_profiler`.
- Principles
- ----------
- `Donald Knuth <https://en.wikipedia.org/wiki/Donald_Knuth>`__ said:
- *Programmers waste enormous amounts of time thinking about, or worrying
- about, the speed of noncritical parts of their programs, and these attempts
- at efficiency actually have a strong negative impact when debugging and
- maintenance are considered. We should forget about small efficiencies, say
- about 97% of the time: premature optimization is the root of all evil. Yet
- we should not pass up our opportunities in that critical 3%.*
- The messages are very important:
- - Developer time is limited. Instead of blindly trying to speed up
- all aspects of a program, we should concentrate our efforts on the aspects
- that really matter.
- - Efforts at optimization often end up with code that is harder to read and
- debug than non-optimized code. It is in our interests to limit this to areas
- that will really benefit.
- Just because we *can* optimize a particular bit of code, it doesn't necessarily
- mean that we *should*. Knowing when and when not to optimize is a great skill to
- develop.
- One misleading aspect of the quote is that people tend to focus on the subquote
- *"premature optimization is the root of all evil"*. While *premature*
- optimization is (by definition) undesirable, performant software is the result
- of performant design.
- Performant design
- ^^^^^^^^^^^^^^^^^
- The danger with encouraging people to ignore optimization until necessary, is
- that it conveniently ignores that the most important time to consider
- performance is at the design stage, before a key has even hit a keyboard. If the
- design or algorithms of a program are inefficient, then no amount of polishing
- the details later will make it run fast. It may run *faster*, but it will never
- run as fast as a program designed for performance.
- This tends to be far more important in game or graphics programming than in
- general programming. A performant design, even without low-level optimization,
- will often run many times faster than a mediocre design with low-level
- optimization.
- Incremental design
- ^^^^^^^^^^^^^^^^^^
- Of course, in practice, unless you have prior knowledge, you are unlikely to
- come up with the best design the first time. Instead, you'll often make a series
- of versions of a particular area of code, each taking a different approach to
- the problem, until you come to a satisfactory solution. It's important not to
- spend too much time on the details at this stage until you have finalized the
- overall design. Otherwise, much of your work will be thrown out.
- It's difficult to give general guidelines for performant design because this is
- so dependent on the problem. One point worth mentioning though, on the CPU side,
- is that modern CPUs are nearly always limited by memory bandwidth. This has led
- to a resurgence in data-oriented design, which involves designing data
- structures and algorithms for *cache locality* of data and linear access, rather
- than jumping around in memory.
- The optimization process
- ^^^^^^^^^^^^^^^^^^^^^^^^
- Assuming we have a reasonable design, and taking our lessons from Knuth, our
- first step in optimization should be to identify the biggest bottlenecks - the
- slowest functions, the low-hanging fruit.
- Once we've successfully improved the speed of the slowest area, it may no
- longer be the bottleneck. So we should test/profile again and find the next
- bottleneck on which to focus.
- The process is thus:
- 1. Profile / Identify bottleneck.
- 2. Optimize bottleneck.
- 3. Return to step 1.
- Optimizing bottlenecks
- ^^^^^^^^^^^^^^^^^^^^^^
- Some profilers will even tell you which part of a function (which data accesses,
- calculations) are slowing things down.
- As with design, you should concentrate your efforts first on making sure the
- algorithms and data structures are the best they can be. Data access should be
- local (to make best use of CPU cache), and it can often be better to use compact
- storage of data (again, always profile to test results). Often, you precalculate
- heavy computations ahead of time. This can be done by performing the computation
- when loading a level, by loading a file containing precalculated data or simply
- by storing the results of complex calculations into a script constant and
- reading its value.
- Once algorithms and data are good, you can often make small changes in routines
- which improve performance. For instance, you can move some calculations outside
- of loops or transform nested ``for`` loops into non-nested loops.
- (This should be feasible if you know a 2D array's width or height in advance.)
- Always retest your timing/bottlenecks after making each change. Some changes
- will increase speed, others may have a negative effect. Sometimes, a small
- positive effect will be outweighed by the negatives of more complex code, and
- you may choose to leave out that optimization.
- Appendix
- --------
- Bottleneck math
- ^^^^^^^^^^^^^^^
- The proverb *"a chain is only as strong as its weakest link"* applies directly to
- performance optimization. If your project is spending 90% of the time in
- function ``A``, then optimizing ``A`` can have a massive effect on performance.
- .. code-block:: none
- A: 9 ms
- Everything else: 1 ms
- Total frame time: 10 ms
- .. code-block:: none
- A: 1 ms
- Everything else: 1ms
- Total frame time: 2 ms
- In this example, improving this bottleneck ``A`` by a factor of 9× decreases
- overall frame time by 5× while increasing frames per second by 5×.
- However, if something else is running slowly and also bottlenecking your
- project, then the same improvement can lead to less dramatic gains:
- .. code-block:: none
- A: 9 ms
- Everything else: 50 ms
- Total frame time: 59 ms
- .. code-block:: none
- A: 1 ms
- Everything else: 50 ms
- Total frame time: 51 ms
- In this example, even though we have hugely optimized function ``A``,
- the actual gain in terms of frame rate is quite small.
- In games, things become even more complicated because the CPU and GPU run
- independently of one another. Your total frame time is determined by the slower
- of the two.
- .. code-block:: none
- CPU: 9 ms
- GPU: 50 ms
- Total frame time: 50 ms
- .. code-block:: none
- CPU: 1 ms
- GPU: 50 ms
- Total frame time: 50 ms
- In this example, we optimized the CPU hugely again, but the frame time didn't
- improve because we are GPU-bottlenecked.
|