.. _doc_batching:

Optimization using batching
===========================

Introduction
~~~~~~~~~~~~

Game engines have to send a set of instructions to the GPU to tell the GPU what
and where to draw. These instructions are sent through standardized interfaces
called :abbr:`APIs (Application Programming Interfaces)`. Examples of graphics
APIs are OpenGL, OpenGL ES, and Vulkan.

Different APIs incur different costs when drawing objects. OpenGL handles a lot
of work for the user in the GPU driver at the cost of more expensive draw calls.
As a result, applications can often be sped up by reducing the number of draw
calls.

Draw calls
^^^^^^^^^^

In 2D, we need to tell the GPU to render a series of primitives (rectangles,
lines, polygons, etc.). The most obvious technique is to tell the GPU to render
one primitive at a time, telling it some information such as the texture used,
the material, the position, size, etc., then saying "Draw!" (this is called a
draw call).

While this is conceptually simple from the engine side, GPUs operate very slowly
when used in this manner. GPUs work much more efficiently if you tell them to
draw a number of similar primitives all in one draw call, which we will call a
"batch".

It turns out that they don't just work a bit faster when used in this manner;
they work a *lot* faster.

As Godot is designed to be a general-purpose engine, the primitives coming into
the Godot renderer can be in any order, sometimes similar, and sometimes
dissimilar. To match Godot's general-purpose nature with the batching
preferences of GPUs, Godot features an intermediate layer which can
automatically group together primitives wherever possible and send these batches
on to the GPU. This can give an increase in rendering performance while
requiring few (if any) changes to your Godot project.

How it works
~~~~~~~~~~~~

Instructions come into the renderer from your game in the form of a series of
items, each of which can contain one or more commands. The items correspond to
Nodes in the scene tree, and the commands correspond to primitives such as
rectangles or polygons. Some items such as TileMaps and text can contain a
large number of commands (tiles and glyphs respectively). Others, such as
sprites, may only contain a single command (a rectangle).

The batcher uses two main techniques to group together primitives:

- Consecutive items can be joined together.
- Consecutive commands within an item can be joined to form a batch.

Breaking batching
^^^^^^^^^^^^^^^^^

Batching can only take place if the items or commands are similar enough to be
rendered in one draw call. Certain changes (or techniques), by necessity, prevent
the formation of a contiguous batch; this is referred to as "breaking batching".

Batching will be broken by (amongst other things):

- Change of texture.
- Change of material.
- Change of primitive type (say, going from rectangles to lines).

.. note::

    For example, if you draw a series of sprites each with a different texture,
    there is no way they can be batched.

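As a rough illustration of the rule above, the following Python sketch counts
draw calls when only consecutive commands with identical state can be joined.
It is not Godot's implementation: the ``Command`` type and ``count_batches``
function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical stand-in for one primitive sent to the renderer."""
    texture: str
    material: str
    primitive: str  # e.g. "rect" or "line"


def count_batches(commands):
    """Count draw calls when only consecutive, identical-state commands join."""
    batches = 0
    previous = None
    for command in commands:
        # Any change of texture, material or primitive type breaks the batch.
        if command != previous:
            batches += 1
        previous = command
    return batches


same = [Command("grass", "default", "rect")] * 3
mixed = [
    Command("wood", "default", "rect"),
    Command("grass", "default", "rect"),
    Command("wood", "default", "rect"),
]
print(count_batches(same))   # 1
print(count_batches(mixed))  # 3
```

Three sprites sharing one texture collapse into a single draw call, while
alternating textures force one draw call per sprite.
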
Determining the rendering order
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The question arises, if only similar items can be drawn together in a batch, why
don't we look through all the items in a scene, group together all the similar
items, and draw them together?

In 3D, this is often exactly how engines work. However, in Godot's 2D renderer,
items are drawn in "painter's order", from back to front. This ensures that
items at the front are drawn on top of earlier items when they overlap.

This also means that if we try and draw objects on a per-texture basis, then
this painter's order may break and objects will be drawn in the wrong order.

In Godot, this back-to-front order is determined by:

- The order of objects in the scene tree.
- The Z index of objects.
- The canvas layer.
- :ref:`class_YSort` nodes.

.. note::

    You can group similar objects together for easier batching. While doing so
    is not a requirement on your part, think of it as an optional approach that
    can improve performance in some cases. See the
    :ref:`doc_batching_diagnostics` section to help you make this decision.

A trick
^^^^^^^

And now, a sleight of hand. Even though the idea of painter's order is that
objects are rendered from back to front, consider 3 objects ``A``, ``B`` and
``C``, that contain 2 different textures: grass and wood.

.. image:: img/overlap1.png

In painter's order they are ordered::

    A - wood
    B - grass
    C - wood

Because of the texture changes, they can't be batched and will be rendered in 3
draw calls.

However, painter's order is only needed on the assumption that they will be
drawn *on top* of each other. If we relax that assumption, i.e. if none of these
3 objects are overlapping, there is *no need* to preserve painter's order. The
rendered result will be the same. What if we could take advantage of this?

Item reordering
^^^^^^^^^^^^^^^

.. image:: img/overlap2.png

It turns out that we can reorder items. However, we can only do this if the
items satisfy the conditions of an overlap test, to ensure that the end result
will be the same as if they were not reordered. The overlap test is very cheap
in performance terms, but not absolutely free, so there is a slight cost to
looking ahead to decide whether items can be reordered. The number of items to
look ahead for reordering can be set in project settings (see below), in order
to balance the costs and benefits in your project.

::

    A - wood
    C - wood
    B - grass

Since the texture only changes once, we can render the above in only 2 draw
calls.

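The reordering idea can be sketched in a few lines of Python. This is a
simplified illustration under stated assumptions, not the engine's algorithm:
items are reduced to a texture name and an axis-aligned rectangle, and the
``Item``, ``overlaps``, ``reorder`` and ``draw_calls`` names are hypothetical.
An item is only moved forward if it overlaps none of the items it jumps over.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    texture: str
    rect: tuple  # (x, y, width, height)


def overlaps(a, b):
    """Axis-aligned rectangle overlap test."""
    ax, ay, aw, ah = a.rect
    bx, by, bw, bh = b.rect
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def reorder(items, lookahead):
    """Pull same-texture items forward when doing so cannot change the image."""
    items = list(items)
    i = 0
    while i < len(items) - 1:
        current = items[i]
        if items[i + 1].texture != current.texture:
            # Only search a limited window, since every test has a cost.
            end = min(len(items), i + 2 + lookahead)
            for j in range(i + 2, end):
                candidate = items[j]
                if candidate.texture != current.texture:
                    continue
                skipped = items[i + 1:j]
                # Safe only if the candidate overlaps none of the skipped items.
                if not any(overlaps(candidate, s) for s in skipped):
                    items.insert(i + 1, items.pop(j))
                    break
        i += 1
    return items


def draw_calls(items):
    """One draw call per run of consecutive items sharing a texture."""
    calls = 0
    previous = None
    for item in items:
        if item.texture != previous:
            calls += 1
        previous = item.texture
    return calls


a = Item("A", "wood", (0, 0, 10, 10))
b = Item("B", "grass", (20, 0, 10, 10))
c = Item("C", "wood", (40, 0, 10, 10))
result = reorder([a, b, c], lookahead=8)
print([item.name for item in result])  # ['A', 'C', 'B']
print(draw_calls([a, b, c]), draw_calls(result))  # 3 2
```

If ``C`` were moved so that it overlapped ``B``, the test would fail and the
original order (and its 3 draw calls) would be kept.
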
Lights
~~~~~~

Although the batching system's job is normally quite straightforward, it becomes
considerably more complex when 2D lights are used. This is because lights are
drawn using additional passes, one for each light affecting the primitive.
Consider 2 sprites ``A`` and ``B``, with identical texture and material. Without
lights, they would be batched together and drawn in one draw call. But with 3
lights, they would be drawn as follows, each line being a draw call:

.. image:: img/lights_overlap.png

::

    A
    A - light 1
    A - light 2
    A - light 3
    B
    B - light 1
    B - light 2
    B - light 3

That is a lot of draw calls: 8 for only 2 sprites. Now, consider we are drawing
1,000 sprites. The number of draw calls quickly becomes astronomical and
performance suffers. This is partly why lights have the potential to drastically
slow down 2D rendering.

However, if you remember our magician's trick from item reordering, it turns out
we can use the same trick to get around painter's order for lights!

If ``A`` and ``B`` are not overlapping, we can render them together in a batch,
so the drawing process is as follows:

.. image:: img/lights_separate.png

::

    AB
    AB - light 1
    AB - light 2
    AB - light 3

That is only 4 draw calls. Not bad, as that is a 2× reduction. However, consider
that in a real game, you might be drawing closer to 1,000 sprites.

- **Before:** 1000 × 4 = 4,000 draw calls.
- **After:** 1 × 4 = 4 draw calls.

That is a 1000× decrease in draw calls, and should give a huge increase in
performance.

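The arithmetic above follows one simple rule, sketched here in Python. The
``lit_draw_calls`` function is a hypothetical helper, and it assumes the best
case described in the text: every light affects every batch.

```python
def lit_draw_calls(batches, lights):
    """Each batch is drawn once unlit, then once more per light affecting it."""
    return batches * (1 + lights)


print(lit_draw_calls(2, 3))     # 8: sprites A and B drawn separately
print(lit_draw_calls(1, 3))     # 4: A and B joined into one batch
print(lit_draw_calls(1000, 3))  # 4000: 1,000 unjoined sprites
```
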
Overlap test
^^^^^^^^^^^^

However, as with the item reordering, things are not that simple. We must first
perform the overlap test to determine whether we can join these primitives. This
overlap test has a small cost. Again, you can choose the number of primitives to
look ahead in the overlap test to balance the benefits against the cost. With
lights, the benefits usually far outweigh the costs.

Also consider that depending on the arrangement of primitives in the viewport,
the overlap test will sometimes fail (because the primitives overlap and
therefore shouldn't be joined). In practice, the decrease in draw calls may be
less dramatic than in a perfect situation with no overlapping at all. However,
performance is usually far higher than without this lighting optimization.

Light scissoring
~~~~~~~~~~~~~~~~

Batching can make it more difficult to cull out objects that are not affected or
partially affected by a light. This can increase the fill rate requirements
quite a bit and slow down rendering. *Fill rate* is the rate at which pixels are
colored. It is another potential bottleneck unrelated to draw calls.

In order to counter this problem (and speed up lighting in general), batching
introduces light scissoring. This enables the use of the OpenGL command
``glScissor()``, which identifies an area outside of which the GPU won't render
any pixels. We can greatly optimize fill rate by identifying the intersection
area between a light and a primitive, and limit rendering the light to
*that area only*.

Light scissoring is controlled with the :ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
project setting. This value is between 1.0 and 0.0, with 1.0 being off (no
scissoring), and 0.0 being scissoring in every circumstance. The reason for the
setting is that there may be some small cost to scissoring on some hardware.
That said, scissoring should usually result in performance gains when you're
using 2D lighting.

The relationship between the threshold and whether a scissor operation takes
place is not always straightforward. Generally, it represents the pixel area
that is potentially "saved" by a scissor operation (i.e. the fill rate saved).
At 1.0, the entire screen's pixels would need to be saved, which rarely (if
ever) happens, so it is switched off. In practice, the useful values are close
to 0.0, as only a small percentage of pixels need to be saved for the operation
to be useful.

The exact relationship is probably not necessary for users to worry about, but
is included in the appendix out of interest:
:ref:`doc_batching_light_scissoring_threshold_calculation`

.. figure:: img/scissoring.png
   :alt: Light scissoring example diagram

   Bottom right is a light; the red area shows the pixels saved by the
   scissoring operation. Only the intersection needs to be rendered.

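To make the "pixels saved" idea concrete, here is a minimal Python sketch. It
assumes both the primitive and the light can be approximated by axis-aligned
screen rectangles given as ``(x, y, width, height)``, and that the saving is
the part of the primitive lying outside the light. The ``intersection`` and
``pixels_saved`` names are hypothetical, not engine functions.

```python
def intersection(a, b):
    """Return the overlap of two (x, y, w, h) rects, or None if disjoint."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2 - x1, y2 - y1)


def pixels_saved(primitive, light):
    """Primitive pixels that fall outside the light and need no light pass."""
    clip = intersection(primitive, light)
    primitive_area = primitive[2] * primitive[3]
    clip_area = clip[2] * clip[3] if clip else 0
    return primitive_area - clip_area


primitive = (0, 0, 400, 300)
light = (300, 200, 400, 400)
print(intersection(primitive, light))  # (300, 200, 100, 100)
print(pixels_saved(primitive, light))  # 110000
```

Only the 100×100 intersection would be rendered for the light pass; the
remaining 110,000 pixels of the primitive are the potential saving that the
threshold is compared against.
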
Vertex baking
~~~~~~~~~~~~~

The GPU shader receives instructions on what to draw in 2 main ways:

- Shader uniforms (e.g. modulate color, item transform).
- Vertex attributes (vertex color, local transform).

However, within a single draw call (batch), we cannot change uniforms. This
means that naively, we would not be able to batch together items or commands
that change ``final_modulate`` or an item's transform. Unfortunately, that
happens in an awful lot of cases. For instance, sprites are typically
individual nodes with their own item transform, and they may have their own
color modulate as well.

To get around this problem, the batching can "bake" some of the uniforms into
the vertex attributes.

- The item transform can be combined with the local transform and sent in a
  vertex attribute.
- The final modulate color can be combined with the vertex colors, and sent in a
  vertex attribute.

In most cases, this works fine, but this shortcut breaks down if a shader expects
these values to be available individually rather than combined. This can happen
in custom shaders.

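The baking step can be pictured with a small Python sketch. It assumes the item
transform is a 2D affine transform stored as two rows ``(a, b, tx)`` and
``(c, d, ty)``, and that colors are RGBA tuples in the 0.0 to 1.0 range. The
``bake_vertex`` function and this matrix layout are assumptions made for the
example, not the engine's data structures.

```python
def bake_vertex(position, color, item_transform, modulate):
    """Fold per-item uniforms into one vertex's attributes."""
    (a, b, tx), (c, d, ty) = item_transform  # rows of a 2x3 affine matrix
    x, y = position
    # The item transform is applied to the local position ahead of time.
    baked_position = (a * x + b * y + tx, c * x + d * y + ty)
    # The modulate color is multiplied into the vertex color.
    baked_color = tuple(v * m for v, m in zip(color, modulate))
    return baked_position, baked_color


translate = ((1.0, 0.0, 100.0), (0.0, 1.0, 50.0))
position, color = bake_vertex(
    (10.0, 0.0), (1.0, 1.0, 1.0, 1.0), translate, (1.0, 0.5, 0.5, 1.0)
)
print(position)  # (110.0, 50.0)
print(color)     # (1.0, 0.5, 0.5, 1.0)
```

After baking, the shader only sees the combined values, which is why a custom
shader that needs the original position or color separately cannot use this
shortcut.
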
Custom shaders
^^^^^^^^^^^^^^

As a result of the limitation described above, certain operations in custom
shaders will prevent vertex baking and therefore decrease the potential for
batching. While we are working to decrease these cases, the following caveats
currently apply:

- Reading or writing ``COLOR`` or ``MODULATE`` disables vertex color baking.
- Reading ``VERTEX`` disables vertex position baking.

Project Settings
~~~~~~~~~~~~~~~~

To fine-tune batching, a number of project settings are available. You can
usually leave these at default during development, but it's a good idea to
experiment to ensure you are getting maximum performance. Spending a little time
tweaking parameters can often give considerable performance gains for very
little effort. See the on-hover tooltips in the Project Settings for more
information.

rendering/batching/options
^^^^^^^^^^^^^^^^^^^^^^^^^^

- :ref:`use_batching
  <class_ProjectSettings_property_rendering/batching/options/use_batching>` -
  Turns batching on or off.

- :ref:`use_batching_in_editor
  <class_ProjectSettings_property_rendering/batching/options/use_batching_in_editor>` -
  Turns batching on or off in the Godot editor.
  This setting doesn't affect the running project in any way.

- :ref:`single_rect_fallback
  <class_ProjectSettings_property_rendering/batching/options/single_rect_fallback>` -
  This is a faster way of drawing unbatchable rectangles. However, it may lead
  to flicker on some hardware, so it's not recommended.

rendering/batching/parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :ref:`max_join_item_commands <class_ProjectSettings_property_rendering/batching/parameters/max_join_item_commands>` -
  One of the most important ways of achieving batching is to join suitable
  adjacent items (nodes) together. However, they can only be joined if the
  commands they contain are compatible. The system must therefore do a lookahead
  through the commands in an item to determine whether it can be joined. This
  has a small cost per command, and items with a large number of commands are
  not worth joining, so the best value may be project dependent.

- :ref:`colored_vertex_format_threshold
  <class_ProjectSettings_property_rendering/batching/parameters/colored_vertex_format_threshold>` -
  Baking colors into vertices results in a larger vertex format. This is not
  necessarily worth doing unless there are a lot of color changes going on
  within a joined item. This parameter represents the proportion of commands
  containing color changes / the total commands, above which it switches to
  baked colors.

- :ref:`batch_buffer_size
  <class_ProjectSettings_property_rendering/batching/parameters/batch_buffer_size>` -
  This determines the maximum size of a batch. It doesn't have a huge effect
  on performance, but can be worth decreasing for mobile if RAM is at a premium.

- :ref:`item_reordering_lookahead
  <class_ProjectSettings_property_rendering/batching/parameters/item_reordering_lookahead>` -
  Item reordering can help especially with interleaved sprites using different
  textures. The lookahead for the overlap test has a small cost, so the best
  value may change per project.

rendering/batching/lights
^^^^^^^^^^^^^^^^^^^^^^^^^

- :ref:`scissor_area_threshold
  <class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>` -
  See the light scissoring section.

- :ref:`max_join_items
  <class_ProjectSettings_property_rendering/batching/lights/max_join_items>` -
  Joining items before lighting can significantly increase
  performance. This requires an overlap test, which has a small cost, so the
  costs and benefits, and hence the best value to use here, may be project
  dependent.

rendering/batching/debug
^^^^^^^^^^^^^^^^^^^^^^^^

- :ref:`flash_batching
  <class_ProjectSettings_property_rendering/batching/debug/flash_batching>` -
  This is purely a debugging feature to identify regressions between the
  batching and legacy renderer. When it is switched on, the batching and legacy
  renderer are used alternately on each frame. This will decrease performance,
  and should not be used for your final export, only for testing.

- :ref:`diagnose_frame
  <class_ProjectSettings_property_rendering/batching/debug/diagnose_frame>` -
  This will periodically print a diagnostic batching log to
  the Godot IDE / console.

rendering/batching/precision
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :ref:`uv_contract
  <class_ProjectSettings_property_rendering/batching/precision/uv_contract>` -
  On some hardware (notably some Android devices) there have been reports of
  tilemap tiles drawing slightly outside their UV range, leading to edge
  artifacts such as lines around tiles. If you see this problem, try enabling
  ``uv_contract``. This makes a small contraction in the UV coordinates to
  compensate for precision errors on devices.

- :ref:`uv_contract_amount
  <class_ProjectSettings_property_rendering/batching/precision/uv_contract_amount>` -
  Hopefully, the default amount should cure artifacts on most devices,
  but this value remains adjustable just in case.

.. _doc_batching_diagnostics:

Diagnostics
~~~~~~~~~~~

Although you can change parameters and examine the effect on frame rate, this
can feel like working blindly, with no idea of what is going on under the hood.
To help with this, batching offers a diagnostic mode, which will periodically
print out (to the IDE or console) a list of the batches that are being
processed. This can help pinpoint situations where batching isn't occurring
as intended, and help you fix these situations to get the best possible performance.

Reading a diagnostic
^^^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

    canvas_begin FRAME 2604
    items
        joined_item 1 refs
            batch D 0-0
            batch D 0-2 n n
            batch R 0-1 [0 - 0] {255 255 255 255 }
        joined_item 1 refs
            batch D 0-0
            batch R 0-1 [0 - 146] {255 255 255 255 }
            batch D 0-0
            batch R 0-1 [0 - 146] {255 255 255 255 }
        joined_item 1 refs
            batch D 0-0
            batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
            batch D 0-0
            batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
            batch D 0-0
            batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
    canvas_end

This is a typical diagnostic.

- **joined_item:** A joined item can contain 1 or
  more references to items (nodes). Generally, joined_items containing many
  references are preferable to many joined_items each containing a single
  reference. Whether items can be joined will be determined by their contents
  and compatibility with the previous item.
- **batch R:** A batch containing rectangles. The second number is the number of
  rects. The second number in square brackets is the Godot texture ID, and the
  numbers in curly braces are the color. If the batch contains more than one
  rect, ``MULTI`` is added to the line to make it easy to identify.
  Seeing ``MULTI`` is good as it indicates successful batching.
- **batch D:** A default batch, containing everything else that is not currently
  batched.

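If you capture the diagnostic output to a file, a short script can summarize
how much batching is happening. The Python sketch below assumes the log lines
look exactly like the sample above; the ``summarize`` function and its regular
expression are hypothetical helpers, not part of Godot.

```python
import re

# Matches "batch R <first>-<rect count> [" as seen in the sample log.
RECT_BATCH = re.compile(r"batch R \d+-(\d+) \[")


def summarize(log_text):
    """Count rect batches, MULTI batches and total rects in a diagnostic log."""
    rect_batches = 0
    multi_batches = 0
    rects = 0
    for line in log_text.splitlines():
        match = RECT_BATCH.search(line)
        if match is None:
            continue  # Not a rect batch (for example a default batch).
        rect_batches += 1
        rects += int(match.group(1))
        if line.rstrip().endswith("MULTI"):
            multi_batches += 1
    return {
        "rect_batches": rect_batches,
        "multi_batches": multi_batches,
        "rects": rects,
    }


SAMPLE = """\
joined_item 1 refs
batch D 0-0
batch R 0-1 [0 - 146] {255 255 255 255 }
batch D 0-0
batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
"""
print(summarize(SAMPLE))
```

A high ratio of ``multi_batches`` to ``rect_batches`` suggests that rectangles
are being joined successfully.
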
Default batches
^^^^^^^^^^^^^^^

The second number following default batches is the number of commands in the
batch, and it is followed by a brief summary of the contents::

    l - line
    PL - polyline
    r - rect
    n - ninepatch
    PR - primitive
    p - polygon
    m - mesh
    MM - multimesh
    PA - particles
    c - circle
    t - transform
    CI - clip_ignore

You may see "dummy" default batches containing no commands; you can ignore those.

Frequently asked questions
~~~~~~~~~~~~~~~~~~~~~~~~~~

I don't get a large performance increase when enabling batching.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Try the diagnostics, see how much batching is occurring, and whether it can be
  improved.
- Try changing batching parameters in the Project Settings.
- Consider that batching may not be your bottleneck (see bottlenecks).

I get a decrease in performance with batching.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Try the steps described above to increase the number of batching opportunities.
- Try enabling :ref:`single_rect_fallback
  <class_ProjectSettings_property_rendering/batching/options/single_rect_fallback>`.
- The single rect fallback method is the default used without batching, and it
  is approximately twice as fast. However, it can result in flickering on some
  hardware, so its use is discouraged.
- After trying the above, if your scene is still performing worse, consider
  turning off batching.

I use custom shaders and the items are not batching.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Custom shaders can be problematic for batching. See the custom shaders
  section.

I am seeing line artifacts appear on certain hardware.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- See the :ref:`uv_contract
  <class_ProjectSettings_property_rendering/batching/precision/uv_contract>`
  project setting which can be used to solve this problem.

I use a large number of textures, so few items are being batched.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Consider using texture atlases. As well as allowing batching, these
  reduce the need for state changes associated with changing textures.

Appendix
~~~~~~~~

Batched primitives
^^^^^^^^^^^^^^^^^^

Not all primitives can be batched. Batching is not guaranteed either,
especially with primitives using an antialiased border. The following
primitive types are currently available:

- RECT
- NINEPATCH (depending on wrapping mode)
- POLY
- LINE

With non-batched primitives, you may be able to get better performance by
drawing them manually with polys in a ``_draw()`` function.
See :ref:`doc_custom_drawing_in_2d` for more information.

.. _doc_batching_light_scissoring_threshold_calculation:

Light scissoring threshold calculation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The actual proportion of screen pixel area used as the threshold is the
:ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
value to the power of 4.

For example, on a screen size of 1920×1080, there are 2,073,600 pixels.

At a threshold of 1,000 pixels, the proportion would be::

    1000 / 2073600 = 0.00048225
    0.00048225 ^ (1/4) = 0.14819

So a :ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
of ``0.15`` would be a reasonable value to try.

Going the other way, for instance with a :ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
of ``0.5``::

    0.5 ^ 4 = 0.0625
    0.0625 * 2073600 = 129600 pixels

If the number of pixels saved is greater than this threshold, the scissor is
activated.

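The two conversions above can be wrapped in a pair of small Python helpers for
experimenting with other screen sizes. The function names are hypothetical;
only the power-of-4 relationship comes from the text.

```python
def threshold_to_pixels(threshold, width, height):
    """Pixel area that must be saved before the scissor is activated."""
    return (threshold ** 4) * (width * height)


def pixels_to_threshold(pixels, width, height):
    """Setting value that corresponds to a desired pixel-area threshold."""
    return (pixels / (width * height)) ** 0.25


print(threshold_to_pixels(0.5, 1920, 1080))             # 129600.0
print(round(pixels_to_threshold(1000, 1920, 1080), 5))  # 0.14819
```
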