  1. <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  2. <!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:pls="http://www.w3.org/2005/01/pronunciation-lexicon" xmlns:ssml="http://www.w3.org/2001/10/synthesis" xmlns:svg="http://www.w3.org/2000/svg">
  3. <head>
  4. <title>Unicode character classes and conversions</title>
  5. <link rel="stylesheet" type="text/css" href="docbook-epub.css"/>
  6. <link rel="stylesheet" type="text/css" href="kawa.css"/>
  7. <script src="kawa-ebook.js" type="text/javascript"/>
  8. <meta name="generator" content="DocBook XSL-NS Stylesheets V1.79.1"/>
  9. <link rel="prev" href="Overall-Index.xhtml" title="Index"/>
  10. <link rel="next" href="Regular-expressions.xhtml" title="Regular expressions"/>
  11. </head>
  12. <body>
  13. <header/>
  14. <section class="sect1" title="Unicode character classes and conversions" epub:type="subchapter" id="Unicode">
  15. <div class="titlepage">
  16. <div>
  17. <div>
  18. <h2 class="title" style="clear: both">Unicode character classes and conversions</h2>
  19. </div>
  20. </div>
  21. </div>
  22. <p>Some of the procedures that operate on characters or strings ignore the
  23. difference between upper case and lower case. These procedures have
  24. <code class="literal">-ci</code> (for “case insensitive”) embedded in their names.
  25. </p>
  26. <section class="sect2" title="Characters" epub:type="division" id="idm139667874766096">
  27. <div class="titlepage">
  28. <div>
  29. <div>
  30. <h3 class="title">Characters</h3>
  31. </div>
  32. </div>
  33. </div>
  34. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874765024" class="indexterm"/> <code class="function">char-upcase</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  35. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874762064" class="indexterm"/> <code class="function">char-downcase</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  36. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874759104" class="indexterm"/> <code class="function">char-titlecase</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  37. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874756144" class="indexterm"/> <code class="function">char-foldcase</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  38. <div class="blockquote">
  39. <blockquote class="blockquote">
  40. <p>These procedures take a character argument and return a character
  41. result.
  42. </p>
  43. <p>If the argument is an upper-case or title-case character, and if there
  44. is a single character that is its lower-case form, then
  45. <code class="literal">char-downcase</code> returns that character.
  46. </p>
  47. <p>If the argument is a lower-case or title-case character, and there is
  48. a single character that is its upper-case form, then <code class="literal">char-upcase</code>
  49. returns that character.
  50. </p>
  51. <p>If the argument is a lower-case or upper-case character, and there is
  52. a single character that is its title-case form, then
  53. <code class="literal">char-titlecase</code> returns that character.
  54. </p>
  55. <p>If the argument is not a title-case character and there is no single
  56. character that is its title-case form, then <code class="literal">char-titlecase</code>
  57. returns the upper-case form of the argument.
  58. </p>
  59. <p>Finally, if the character has a case-folded character, then
  60. <code class="literal">char-foldcase</code> returns that character. Otherwise the character
  61. returned is the same as the argument.
  62. </p>
  63. <p>For Turkic characters <code class="literal">#\x130</code> and <code class="literal">#\x131</code>,
  64. <code class="literal">char-foldcase</code> behaves as the identity function; otherwise
  65. <code class="literal">char-foldcase</code> is the same as <code class="literal">char-downcase</code> composed with
  66. <code class="literal">char-upcase</code>.
  67. </p>
  68. <pre class="screen">(char-upcase #\i) ⇒ #\I
  69. (char-downcase #\i) ⇒ #\i
  70. (char-titlecase #\i) ⇒ #\I
  71. (char-foldcase #\i) ⇒ #\i
  72. (char-upcase #\ß) ⇒ #\ß
  73. (char-downcase #\ß) ⇒ #\ß
  74. (char-titlecase #\ß) ⇒ #\ß
  75. (char-foldcase #\ß) ⇒ #\ß
  76. (char-upcase #\Σ) ⇒ #\Σ
  77. (char-downcase #\Σ) ⇒ #\σ
  78. (char-titlecase #\Σ) ⇒ #\Σ
  79. (char-foldcase #\Σ) ⇒ #\σ
  80. (char-upcase #\ς) ⇒ #\Σ
  81. (char-downcase #\ς) ⇒ #\ς
  82. (char-titlecase #\ς) ⇒ #\Σ
  83. (char-foldcase #\ς) ⇒ #\σ
  84. </pre>
  85. <div class="blockquote">
  86. <blockquote class="blockquote">
  87. <p><span class="emphasis"><em>Note:</em></span> <code class="literal">char-titlecase</code> does not always return a title-case
  88. character.
  89. </p>
  90. </blockquote>
  91. </div>
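<p>A minimal sketch of the note above, assuming only the mappings and
predicates documented in this section: the result of
<code class="literal">char-titlecase</code> can be checked with <code class="literal">char-title-case?</code>.
For <code class="literal">#\i</code> the result is the upper-case letter <code class="literal">#\I</code>, which is
not in the Lt category, whereas the digraph <code class="literal">#\x01C4</code> has a dedicated
title-case form, <code class="literal">#\x01C5</code>.
</p>
<pre class="screen">;; #\i has no distinct title-case character, so the result is #\I (category Lu).
(char-title-case? (char-titlecase #\i))      ; ⇒ #f
;; U+01C4 maps to the title-case digraph U+01C5 (category Lt).
(char-title-case? (char-titlecase #\x01C4))  ; ⇒ #t
</pre>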
  92. <div class="blockquote">
  93. <blockquote class="blockquote">
  94. <p><span class="emphasis"><em>Note:</em></span> These procedures are consistent with Unicode’s
  95. locale-independent mappings from scalar values to scalar values for
  96. upcase, downcase, titlecase, and case-folding operations. These
  97. mappings can be extracted from <code class="filename">UnicodeData.txt</code> and
  98. <code class="filename">CaseFolding.txt</code> from the Unicode Consortium, ignoring Turkic
  99. mappings in the latter.
  100. </p>
  101. <p>Note that these character-based procedures are an incomplete
  102. approximation to case conversion, even ignoring the user’s locale. In
  103. general, case mappings require the context of a string, both in
  104. arguments and in result. The <code class="literal">string-upcase</code>,
  105. <code class="literal">string-downcase</code>, <code class="literal">string-titlecase</code>, and
  106. <code class="literal">string-foldcase</code> procedures perform more general case conversion.
  107. </p>
  108. </blockquote>
  109. </div>
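<p>To illustrate this limitation, here is a small sketch that compares a
per-character conversion with <code class="literal">string-upcase</code>. The helper
<code class="literal">upcase-by-char</code> is hypothetical and not part of Kawa. Because
<code class="literal">char-upcase</code> maps <code class="literal">#\ß</code> to itself, the helper cannot
produce the two-letter form that R6RS specifies for
<code class="literal">string-upcase</code>.
</p>
<pre class="screen">;; Hypothetical helper: upcase a string one character at a time.
(define (upcase-by-char str)
  (list-&gt;string (map char-upcase (string-&gt;list str))))

(upcase-by-char "Straße")  ; ⇒ "STRAßE" (ß is left unchanged)
(string-upcase "Straße")   ; R6RS specifies "STRASSE", one character longer
</pre>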
  110. </blockquote>
  111. </div>
  112. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874736480" class="indexterm"/> <code class="function">char-ci=?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
  113. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874730864" class="indexterm"/> <code class="function">char-ci&lt;?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
  114. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874725248" class="indexterm"/> <code class="function">char-ci&gt;?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
  115. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874719632" class="indexterm"/> <code class="function">char-ci&lt;=?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
  116. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874714016" class="indexterm"/> <code class="function">char-ci&gt;=?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
  117. <div class="blockquote">
  118. <blockquote class="blockquote">
  119. <p>These procedures are similar to <code class="literal">char=?</code>, etc., but operate on the
  120. case-folded versions of the characters.
  121. </p>
  122. <pre class="screen">(char-ci&lt;? #\z #\Z) ⇒ #f
  123. (char-ci=? #\z #\Z) ⇒ #t
  124. (char-ci=? #\ς #\σ) ⇒ #t
  125. </pre>
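<p>The relationship to case folding can be sketched for the two-argument
case as follows. The name <code class="literal">folded-char=?</code> is hypothetical; it is
meant only to show what comparing case-folded versions means, not how Kawa
implements <code class="literal">char-ci=?</code>.
</p>
<pre class="screen">;; Hypothetical two-argument equivalent of char-ci=?.
(define (folded-char=? a b)
  (char=? (char-foldcase a) (char-foldcase b)))

(folded-char=? #\z #\Z)  ; ⇒ #t, both fold to #\z
(folded-char=? #\ς #\σ)  ; ⇒ #t, both fold to #\σ
</pre>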
  126. </blockquote>
  127. </div>
  128. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874706736" class="indexterm"/> <code class="function">char-alphabetic?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  129. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874703776" class="indexterm"/> <code class="function">char-numeric?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  130. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874700816" class="indexterm"/> <code class="function">char-whitespace?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  131. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874697856" class="indexterm"/> <code class="function">char-upper-case?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  132. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874694896" class="indexterm"/> <code class="function">char-lower-case?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  133. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874691936" class="indexterm"/> <code class="function">char-title-case?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  134. <div class="blockquote">
  135. <blockquote class="blockquote">
  136. <p>These procedures return <code class="literal">#t</code> if their arguments are alphabetic,
  137. numeric, whitespace, upper-case, lower-case, or title-case
  138. characters, respectively; otherwise they return <code class="literal">#f</code>.
  139. </p>
  140. <p>A character is alphabetic if it has the Unicode “Alphabetic” property.
  141. A character is numeric if it has the Unicode “Numeric” property. A
  142. character is whitespace if it has the Unicode “White_Space” property. A
  143. character is upper case if it has the Unicode “Uppercase” property,
  144. lower case if it has the “Lowercase” property, and title case if it is
  145. in the Lt general category.
  146. </p>
  147. <pre class="screen">(char-alphabetic? #\a) ⇒ #t
  148. (char-numeric? #\1) ⇒ #t
  149. (char-whitespace? #\space) ⇒ #t
  150. (char-whitespace? #\x00A0) ⇒ #t
  151. (char-upper-case? #\Σ) ⇒ #t
  152. (char-lower-case? #\σ) ⇒ #t
  153. (char-lower-case? #\x00AA) ⇒ #t
  154. (char-title-case? #\I) ⇒ #f
  155. (char-title-case? #\x01C5) ⇒ #t
  156. </pre>
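<p>As a usage sketch, the predicates can be passed to a generic counting
procedure. The helper <code class="literal">count-matching</code> below is hypothetical and
relies only on standard list operations.
</p>
<pre class="screen">;; Hypothetical helper: count the characters of str that satisfy pred.
(define (count-matching pred str)
  (let loop ((chars (string-&gt;list str)) (n 0))
    (cond ((null? chars) n)
          ((pred (car chars)) (loop (cdr chars) (+ n 1)))
          (else (loop (cdr chars) n)))))

(count-matching char-numeric? "R7RS, 2013")     ; ⇒ 5
(count-matching char-upper-case? "R7RS, 2013")  ; ⇒ 3
(count-matching char-whitespace? "R7RS, 2013")  ; ⇒ 1
</pre>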
  157. </blockquote>
  158. </div>
  159. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874685344" class="indexterm"/> <code class="function">char-general-category</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
  160. <div class="blockquote">
  161. <blockquote class="blockquote">
  162. <p>Return a symbol representing the Unicode general category of
  163. <em class="replaceable"><code>char</code></em>, one of <code class="literal">Lu</code>, <code class="literal">Ll</code>, <code class="literal">Lt</code>, <code class="literal">Lm</code>,
  164. <code class="literal">Lo</code>, <code class="literal">Mn</code>, <code class="literal">Mc</code>, <code class="literal">Me</code>, <code class="literal">Nd</code>, <code class="literal">Nl</code>,
  165. <code class="literal">No</code>, <code class="literal">Ps</code>, <code class="literal">Pe</code>, <code class="literal">Pi</code>, <code class="literal">Pf</code>, <code class="literal">Pd</code>,
  166. <code class="literal">Pc</code>, <code class="literal">Po</code>, <code class="literal">Sc</code>, <code class="literal">Sm</code>, <code class="literal">Sk</code>, <code class="literal">So</code>,
  167. <code class="literal">Zs</code>, <code class="literal">Zp</code>, <code class="literal">Zl</code>, <code class="literal">Cc</code>, <code class="literal">Cf</code>, <code class="literal">Cs</code>,
  168. <code class="literal">Co</code>, or <code class="literal">Cn</code>.
  169. </p>
  170. <pre class="screen">(char-general-category #\a) ⇒ Ll
  171. (char-general-category #\space) ⇒ Zs
  172. (char-general-category #\x10FFFF) ⇒ Cn
  173. </pre>
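<p>Because the result is an ordinary symbol, it can be dispatched on with
<code class="literal">case</code>. The following sketch groups the categories by their first
letter. The procedure name <code class="literal">category-class</code> and the class names it
returns are illustrative choices, not part of Kawa.
</p>
<pre class="screen">;; Hypothetical helper: map a general category to a coarse class.
(define (category-class ch)
  (case (char-general-category ch)
    ((Lu Ll Lt Lm Lo) 'letter)
    ((Mn Mc Me) 'mark)
    ((Nd Nl No) 'number)
    ((Ps Pe Pi Pf Pd Pc Po) 'punctuation)
    ((Sc Sm Sk So) 'symbol)
    ((Zs Zp Zl) 'separator)
    (else 'other)))          ; Cc, Cf, Cs, Co, Cn

(category-class #\a)        ; ⇒ letter
(category-class #\space)    ; ⇒ separator
(category-class #\x10FFFF)  ; ⇒ other
</pre>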
  174. </blockquote>
  175. </div>
  176. </section>
  177. <section class="sect2" title="Deprecated in-place case modification" epub:type="division" id="idm139667874668368">
  178. <div class="titlepage">
  179. <div>
  180. <div>
  181. <h3 class="title">Deprecated in-place case modification</h3>
  182. </div>
  183. </div>
  184. </div>
  185. <p>The following procedures are deprecated. They cannot do the right
  186. thing in general, because in some languages the upper-case and lower-case
  187. forms of a text use different numbers of characters.
  188. </p>
  189. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874666400" class="indexterm"/> <code class="function">string-upcase!</code> <em class="replaceable"><code>str</code></em></p>
  190. <div class="blockquote">
  191. <blockquote class="blockquote">
  192. <p><span class="emphasis"><em>Deprecated:</em></span> Destructively modify <em class="replaceable"><code>str</code></em>, replacing the letters
  193. by their upper-case equivalents.
  194. </p>
  195. </blockquote>
  196. </div>
  197. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874662304" class="indexterm"/> <code class="function">string-downcase!</code> <em class="replaceable"><code>str</code></em></p>
  198. <div class="blockquote">
  199. <blockquote class="blockquote">
  200. <p><span class="emphasis"><em>Deprecated:</em></span> Destructively modify <em class="replaceable"><code>str</code></em>, replacing the letters
  201. by their lower-case equivalents.
  202. </p>
  203. </blockquote>
  204. </div>
  205. <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667874658208" class="indexterm"/> <code class="function">string-capitalize!</code> <em class="replaceable"><code>str</code></em></p>
  206. <div class="blockquote">
  207. <blockquote class="blockquote">
  208. <p><span class="emphasis"><em>Deprecated:</em></span> Destructively modify <em class="replaceable"><code>str</code></em>, such that the letters that start a new word
  209. are replaced by their title-case equivalents, while non-initial letters
  210. are replaced by their lower-case equivalents.
  211. </p>
  212. </blockquote>
  213. </div>
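<p>One possible migration, sketched under the assumption that the caller can
rebind the variable: call the non-destructive <code class="literal">string-upcase</code>
(or <code class="literal">string-downcase</code>, <code class="literal">string-titlecase</code>) and keep the
new string, whose length may differ from that of the original.
</p>
<pre class="screen">;; Instead of the deprecated (string-upcase! title):
(define title "kawa scheme")
(set! title (string-upcase title))  ; rebind to the freshly built string
title                               ; ⇒ "KAWA SCHEME"
</pre>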
  214. </section>
  215. </section>
  216. <footer>
  217. <div class="navfooter">
  218. <ul>
  219. <li>
  220. <b class="toc">
  221. <a href="Unicode.xhtml#idm139667874766096">Characters</a>
  222. </b>
  223. </li>
  224. <li>
  225. <b class="toc">
  226. <a href="Unicode.xhtml#idm139667874668368">Deprecated in-place case modification</a>
  227. </b>
  228. </li>
  229. </ul>
  230. <p>
  231. Up: <a accesskey="u" href="Characters-and-text.xhtml">Characters and text</a></p>
  232. <p>
  233. Previous: <a accesskey="p" href="String-literals.xhtml">String literals</a></p>
  234. <p>
  235. Next: <a accesskey="n" href="Regular-expressions.xhtml">Regular expressions</a></p>
  236. </div>
  237. </footer>
  238. </body>
  239. </html>