.. _doc_text_to_speech:

Text to speech
==============

Basic Usage
-----------

Basic usage of text-to-speech involves the following one-time steps:

- Enable TTS in the Godot editor for your project
- Query the system for a list of usable voices
- Store the ID of the voice you want to use
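
As a sketch of the second and third steps, the function below lists every voice the system reports, assuming text-to-speech is already enabled in the project settings. It relies on ``DisplayServer.tts_get_voices()``, which returns an array of dictionaries with ``name``, ``id``, and ``language`` keys; the function name ``print_available_voices`` is hypothetical. Showing such a list in an options menu is one way to let players pick the voice whose ``id`` you store.

.. code-block:: gdscript

    # Hypothetical helper: prints every voice the system reports.
    func print_available_voices() -> void:
        # Each entry is a Dictionary with "name", "id", and "language" keys.
        var voices = DisplayServer.tts_get_voices()
        if voices.is_empty():
            print("No text-to-speech voices are available.")
            return
        for voice in voices:
            # The "id" value is what tts_speak() expects as its voice argument.
            print("%s [%s]: %s" % [voice["name"], voice["language"], voice["id"]])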

By default, the Godot project-level setting for text-to-speech is disabled to avoid unnecessary overhead. To enable it:

- Go to **Project > Project Settings**.
- Make sure the **Advanced Settings** toggle is enabled.
- Click on **Audio > General**.
- Ensure the **Text to Speech** option is checked.
- Restart Godot if prompted to do so.
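
To confirm at runtime that text-to-speech is usable, you can read the setting back and ask the display server whether it supports the feature. This is a sketch under two assumptions: that the option above is stored as ``audio/general/text_to_speech``, and that ``DisplayServer.has_feature()`` with ``FEATURE_TEXT_TO_SPEECH`` reports platform support. The function name ``is_tts_available`` is hypothetical.

.. code-block:: gdscript

    # Hypothetical helper: returns true if TTS looks usable in this build.
    func is_tts_available() -> bool:
        # Assumed setting path for Audio > General > Text to Speech.
        var enabled = ProjectSettings.get_setting("audio/general/text_to_speech", false)
        # Whether the current display server implements the tts_* methods at all.
        var supported = DisplayServer.has_feature(DisplayServer.FEATURE_TEXT_TO_SPEECH)
        return enabled and supported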

Text-to-speech uses a specific voice. Depending on the user's system, they might have multiple voices installed. Once you have the voice ID, you can use it to speak some text:

.. tabs::
 .. code-tab:: gdscript GDScript

    # One-time steps.
    # Pick a voice. Here, we arbitrarily pick the first English voice.
    var voices = DisplayServer.tts_get_voices_for_language("en")
    var voice_id = voices[0]

    # Say "Hello, world!".
    DisplayServer.tts_speak("Hello, world!", voice_id)

    # Say a longer sentence, and then interrupt it.
    # Note that this method is asynchronous: execution proceeds to the next line immediately,
    # before the voice finishes speaking.
    var long_message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur"
    DisplayServer.tts_speak(long_message, voice_id)

    # Immediately stop the current text mid-sentence and say goodbye instead.
    DisplayServer.tts_stop()
    DisplayServer.tts_speak("Goodbye!", voice_id)

 .. code-tab:: csharp

    // One-time steps.
    // Pick a voice. Here, we arbitrarily pick the first English voice.
    string[] voices = DisplayServer.TtsGetVoicesForLanguage("en");
    string voiceId = voices[0];

    // Say "Hello, world!".
    DisplayServer.TtsSpeak("Hello, world!", voiceId);

    // Say a longer sentence, and then interrupt it.
    // Note that this method is asynchronous: execution proceeds to the next line immediately,
    // before the voice finishes speaking.
    string longMessage = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur";
    DisplayServer.TtsSpeak(longMessage, voiceId);

    // Immediately stop the current text mid-sentence and say goodbye instead.
    DisplayServer.TtsStop();
    DisplayServer.TtsSpeak("Goodbye!", voiceId);
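
Because ``tts_speak()`` returns before the voice finishes, you may want to react when a particular utterance ends. The following sketch assumes ``DisplayServer.tts_set_utterance_callback()`` with ``TTS_UTTERANCE_ENDED``, whose callback receives the ID of the finished utterance, and uses the optional ``utterance_id`` and ``interrupt`` parameters of ``tts_speak()``. ``GREETING_ID``, ``_on_utterance_ended``, and ``speak_now`` are hypothetical names.

.. code-block:: gdscript

    extends Node

    # Hypothetical ID for recognizing our own utterance in the callback.
    const GREETING_ID = 1

    func _ready() -> void:
        var voices = DisplayServer.tts_get_voices_for_language("en")
        if voices.is_empty():
            return
        var voice_id = voices[0]
        # The ENDED callback receives the ID of the utterance that finished.
        DisplayServer.tts_set_utterance_callback(DisplayServer.TTS_UTTERANCE_ENDED, _on_utterance_ended)
        # Arguments: text, voice, volume, pitch, rate, utterance_id.
        DisplayServer.tts_speak("Hello, world!", voice_id, 50, 1.0, 1.0, GREETING_ID)

    func _on_utterance_ended(utterance_id: int) -> void:
        if utterance_id == GREETING_ID:
            print("Finished saying hello.")

    # Hypothetical helper: speak right away, dropping anything still queued.
    func speak_now(text: String, voice_id: String) -> void:
        # The last argument is interrupt = true.
        DisplayServer.tts_speak(text, voice_id, 50, 1.0, 1.0, 0, true)

Passing ``interrupt = true`` clears queued speech before the new message starts, which serves the same purpose as calling ``tts_stop()`` first.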

Requirements for functionality
------------------------------

Godot's text-to-speech functionality is exposed through the ``tts_*`` methods of the :ref:`DisplayServer class <class_DisplayServer>`.

Godot depends on system libraries for text-to-speech. These libraries are installed by default on Windows and macOS, but not on all Linux distributions. If they are missing, text-to-speech will not work: specifically, the ``tts_get_voices()`` method returns an empty list, indicating that there are no usable voices.

On Linux, both Godot developers and players running Godot games need to make sure their system includes these libraries. Consult the table below or your distribution's documentation to determine which libraries to install.
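
A game can detect the empty voice list at startup and point Linux players toward the missing libraries instead of failing silently. This is a sketch: the function name and warning text are hypothetical, and it assumes ``OS.get_name()`` returns ``"Linux"`` on Linux.

.. code-block:: gdscript

    # Hypothetical startup check for missing TTS system libraries.
    func check_tts_voices() -> bool:
        if not DisplayServer.tts_get_voices().is_empty():
            return true
        if OS.get_name() == "Linux":
            # On Linux, an empty list often means the speech libraries are missing
            # (or TTS is disabled in the project settings).
            push_warning("No text-to-speech voices found. Install speech-dispatcher and a voice engine.")
        else:
            push_warning("No text-to-speech voices found.")
        return false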

Distro-specific one-liners
~~~~~~~~~~~~~~~~~~~~~~~~~~

+------------------+----------------------------------------------------+
| **Arch Linux**   | ::                                                 |
|                  |                                                    |
|                  |     pacman -S speech-dispatcher festival espeakup  |
+------------------+----------------------------------------------------+

Troubleshooting
---------------

If you get the error ``Invalid get index '0' (on base: 'PackedStringArray').`` on the line ``var voice_id = voices[0]``, check whether ``voices`` contains any items. If it is empty:

- All users: make sure you enabled **Text to Speech** in the project settings.
- Linux users: make sure you installed the system libraries for text-to-speech.
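
To avoid this error altogether, check the array before indexing it. The sketch below (``find_voice_id`` is a hypothetical name) prefers a voice for the requested language, falls back to any installed voice, and returns an empty ``String`` when no voice exists at all.

.. code-block:: gdscript

    # Hypothetical helper: returns a usable voice ID, or "" if none exist.
    func find_voice_id(language: String) -> String:
        var voices = DisplayServer.tts_get_voices_for_language(language)
        if not voices.is_empty():
            return voices[0]
        # No voice for that language: fall back to any installed voice.
        var all_voices = DisplayServer.tts_get_voices()
        if not all_voices.is_empty():
            return all_voices[0]["id"]
        return ""

Callers can then check ``voice_id.is_empty()`` and skip speaking, rather than crashing on a missing voice.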

Best practices
--------------

The best practice for text-to-speech, in terms of the ideal experience for blind players, is to send output to the player's screen reader. This preserves the language, speed, pitch, and other settings the user has chosen, and enables advanced features such as letting players scroll backward and forward through text. Currently, Godot doesn't provide this level of integration.

With the current Godot text-to-speech APIs, best practices include:

- Develop the game with text-to-speech enabled, and make sure everything sounds correct.
- Let players choose which voice to use, and persist that selection across game sessions.
- Let players control the speech rate, and persist that selection across game sessions.

This gives blind players the most flexibility and comfort available without a screen reader, and minimizes the chance of frustrating or alienating them.
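
The last two recommendations can be sketched with ``ConfigFile``, which saves key/value pairs to a file under ``user://``. The file path, section, and key names below are hypothetical. The rate is passed as the fifth argument of ``tts_speak()``, where ``1.0`` is the normal speaking rate; the clamp assumes the documented range of ``0.1`` to ``10.0``. Because a saved voice may have been uninstalled since the last session, the loader only reuses it if ``tts_get_voices()`` still lists it.

.. code-block:: gdscript

    extends Node

    # Hypothetical location and keys for the player's speech preferences.
    const SETTINGS_PATH = "user://tts_settings.cfg"

    var voice_id := ""
    var speech_rate := 1.0

    func load_tts_settings() -> void:
        var config = ConfigFile.new()
        if config.load(SETTINGS_PATH) != OK:
            return  # Nothing saved yet; keep the defaults.
        var saved_voice: String = config.get_value("tts", "voice_id", "")
        # Only reuse the saved voice if it is still installed.
        if DisplayServer.tts_get_voices().any(func(v): return v["id"] == saved_voice):
            voice_id = saved_voice
        speech_rate = clampf(config.get_value("tts", "rate", 1.0), 0.1, 10.0)

    func save_tts_settings() -> void:
        var config = ConfigFile.new()
        config.set_value("tts", "voice_id", voice_id)
        config.set_value("tts", "rate", speech_rate)
        config.save(SETTINGS_PATH)

    func speak(text: String) -> void:
        if voice_id.is_empty():
            return
        # Arguments: text, voice, volume, pitch, rate.
        DisplayServer.tts_speak(text, voice_id, 50, 1.0, speech_rate)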

Caveats and Other Information
-----------------------------

- Expect delays when you call ``tts_speak`` and ``tts_stop``. The actual delay depends on both the OS and your machine's specifications. This is especially noticeable on Android and Web, where some voices depend on web services, so playback time also depends on server load, network latency, and other factors.
- Non-English text works if the correct voices are installed and used. On Windows, you can follow the instructions in `this article`_ to enable voices for additional languages.
- Non-ASCII characters, such as umlauts, are pronounced correctly if you select the correct voice.
- Blind players use a number of screen readers, including JAWS, NVDA, VoiceOver, Narrator, and more.
- Windows text-to-speech APIs generally perform better than their equivalents on other systems (e.g. ``tts_stop`` followed by ``tts_speak`` immediately speaks the new message).
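
For example, speaking German text that contains umlauts calls for a German voice. This sketch assumes that ``"de"`` matches the installed German voices; the function name is hypothetical. It warns instead of falling back to an English voice, which would likely mispronounce the text.

.. code-block:: gdscript

    # Hypothetical example: speak German text with a German voice.
    func speak_german_greeting() -> void:
        var voices = DisplayServer.tts_get_voices_for_language("de")
        if voices.is_empty():
            # An English voice would likely mispronounce the text, so don't fall back.
            push_warning("No German text-to-speech voice is installed.")
            return
        # The selected voice handles pronunciation of non-ASCII characters like "ü".
        DisplayServer.tts_speak("Guten Tag, schöne Grüße!", voices[0])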

.. _this article: https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/