*mbyte.txt*     For Vim version 9.0.  Last change: 2022 Apr 03


                  VIM REFERENCE MANUAL    by Bram Moolenaar et al.


Multi-byte support                              *multibyte* *multi-byte*
                                                *Chinese* *Japanese* *Korean*

This is about editing text in languages which have many characters that
cannot be represented using one byte (one octet). Examples are Chinese,
Japanese and Korean. Unicode is also covered here.

For an introduction to the most common features, see |usr_45.txt| in the user
manual.
For changing the language of messages and menus see |mlang.txt|.

1.  Getting started                     |mbyte-first|
2.  Locale                              |mbyte-locale|
3.  Encoding                            |mbyte-encoding|
4.  Using a terminal                    |mbyte-terminal|
5.  Fonts on X11                        |mbyte-fonts-X11|
6.  Fonts on MS-Windows                 |mbyte-fonts-MSwin|
7.  Input on X11                        |mbyte-XIM|
8.  Input on MS-Windows                 |mbyte-IME|
9.  Input with a keymap                 |mbyte-keymap|
10. Input with imactivatefunc()         |mbyte-func|
11. Using UTF-8                         |mbyte-utf8|
12. Overview of options                 |mbyte-options|

NOTE: This file contains UTF-8 characters. These may show up as strange
characters or boxes when using another encoding.

==============================================================================
1. Getting started                                      *mbyte-first*

This is a summary of the multibyte features in Vim. If you are lucky it works
as described and you can start using Vim without much trouble. If something
doesn't work, you will have to read the rest. Don't be surprised if it takes
quite a bit of work and experimenting to make Vim use all the multibyte
features. Unfortunately, every system has its own way to deal with multibyte
languages and it is quite complicated.


LOCALE

First of all, you must make sure your current locale is set correctly. If
your system has been installed to use the language, it probably works right
away. If not, you can often make it work by setting the $LANG environment
variable in your shell (csh syntax shown; see |mbyte-locale| for sh): >

        setenv LANG ja_JP.EUC

Unfortunately, the name of the locale depends on your system. Japanese might
also be called "ja_JP.EUCjp" or just "ja". To see what is currently used: >

        :language

To change the locale inside Vim use: >

        :language ja_JP.EUC

Vim will give an error message if this doesn't work. This is a good way to
experiment and find the locale name you want to use. But it's always better
to set the locale in the shell, so that it is used right from the start.

See |mbyte-locale| for details.


ENCODING

If your locale works properly, Vim will try to set the 'encoding' option
accordingly. If this doesn't work you can overrule its value: >

        :set encoding=utf-8

See |encoding-values| for a list of acceptable values.

The result is that all the text that is used inside Vim will be in this
encoding: not only the text in the buffers, but also in registers, variables,
etc. This also means that changing the value of 'encoding' makes the existing
text invalid! The text doesn't change, but it will be displayed incorrectly.
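
The following minimal Python sketch shows why the text is displayed incorrectly: the bytes stay the same and only their interpretation changes. Python's codecs stand in for Vim's internal handling, and the helper name `reinterpret` is hypothetical.

```python
def reinterpret(text: str, old_enc: str, new_enc: str) -> str:
    """Hypothetical helper: store text as old_enc bytes, then read those
    same bytes back as if they were new_enc."""
    raw = text.encode(old_enc)  # the bytes in the buffer stay as they are
    # Only the interpretation changes; undecodable bytes become U+FFFD.
    return raw.decode(new_enc, errors="replace")


# "é" stored as UTF-8 is the two bytes 0xC3 0xA9; read as latin1 they
# become two unrelated characters.
garbled = reinterpret("\u00e9", "utf-8", "latin-1")   # "Ã©"
# Plain ASCII looks the same under both interpretations.
unchanged = reinterpret("abc", "utf-8", "latin-1")    # "abc"
```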

You can edit files in an encoding other than the one 'encoding' is set to.
Vim will convert the file when you read it and convert it back when you write
it. See 'fileencoding', 'fileencodings' and |++enc|.


DISPLAY AND FONTS

If you are working in a terminal (emulator) you must make sure it accepts the
same encoding that Vim is working with. If this is not the case, you can use
the 'termencoding' option to make Vim convert text automatically.

For the GUI you must select fonts that work with the current 'encoding'. This
is the difficult part. It depends on the system you are using, the locale and
a few other things. See the chapters on fonts: |mbyte-fonts-X11| for
X-Windows and |mbyte-fonts-MSwin| for MS-Windows.

For GTK+ 2, you can skip most of this section. The option 'guifontset' no
longer exists. You only need to set 'guifont' and everything should "just
work". If your system comes with Xft2 and fontconfig and the current font
does not contain a certain glyph, a different font will be used automatically
if available. The 'guifontwide' option is still supported but usually you do
not need to set it. It is only necessary if the automatic font selection does
not suit your needs.

For X11 you can set the 'guifontset' option to a list of fonts that together
cover the characters that are used. Example for Korean: >

        :set guifontset=k12,r12

Alternatively, you can set 'guifont' and 'guifontwide'. 'guifont' is used for
the single-width characters, 'guifontwide' for the double-width characters.
Thus the 'guifontwide' font must be exactly twice as wide as 'guifont'.
Example for UTF-8: >

        :set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1
        :set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1

You can also set 'guifont' alone; Vim will try to find a matching
'guifontwide' for you.
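
Because 'guifontwide' must be exactly twice as wide as 'guifont', the two names above can be checked by comparing their AVERAGE_WIDTH fields (see |XLFD| for the field layout). This is a minimal Python sketch. The helper names are hypothetical, and it assumes fully specified names whose AVERAGE_WIDTH is a number rather than a wildcard.

```python
def average_width(xlfd: str) -> int:
    """Return the AVERAGE_WIDTH field (tenths of a pixel) of an XLFD name."""
    fields = xlfd.split("-")
    # A full XLFD starts with "-", so fields[0] is empty and the 14 real
    # fields follow; AVERAGE_WIDTH is the 12th of them (index 12).
    if len(fields) != 15:
        raise ValueError(f"not a full XLFD name: {xlfd!r}")
    return int(fields[12])


def wide_font_matches(guifont: str, guifontwide: str) -> bool:
    """True when the wide font is exactly twice as wide as the normal one."""
    return average_width(guifontwide) == 2 * average_width(guifont)


NORMAL = "-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1"
WIDE = "-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1"
ok = wide_font_matches(NORMAL, WIDE)  # True: 180 == 2 * 90
```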


INPUT

There are several ways to enter multibyte characters:
- For X11, XIM can be used. See |XIM|.
- For MS-Windows, IME can be used. See |IME|.
- For all systems, keymaps can be used. See |mbyte-keymap|.

The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
the different input methods or disable them temporarily.

==============================================================================
2. Locale                                               *mbyte-locale*

The easiest setup is when your whole system uses the locale you want to work
in. But it's also possible to set the locale for one shell you are working
in, or just use a certain locale inside Vim.


WHAT IS A LOCALE?                                       *locale*

There are many languages in the world, and at least as many cultures and
environments. A linguistic environment corresponding to an area is called a
"locale". It includes information about the language used, the charset, the
collating order for sorting, the date format, the currency format and so on.
For Vim only the language and charset really matter.

You can only use a locale if your system has support for it. Some systems
have only a few locales, especially in the USA. The language which you want
to use may not be on your system. In that case you might be able to install
it as an extra package. Check your system documentation for how to do that.

The location in which the locales are installed varies from system to system.
For example, "/usr/share/locale" or "/usr/lib/locale". See your system's
setlocale() man page.

Looking in these directories will show you the exact name of each locale.
Case usually matters, so "ja_JP.EUC" and "ja_jp.euc" are different. Some
systems have a locale.alias file, which allows translation from a short name
like "nl" to the full name "nl_NL.ISO_8859-1".

Note that X-Windows has its own locale handling and, unfortunately, uses
locale names different from the ones used elsewhere. This is confusing! What
matters for Vim is what the setlocale() function uses, which is generally NOT
the X-Windows locale. You might have to do some experiments to find out what
really works.

                                                        *locale-name*
The (simplified) format of a |locale| name is:

        language
or      language_territory
or      language_territory.codeset

Territory means the country (or part of it), codeset means the |charset|. For
example, the locale name "ja_JP.eucJP" means:
        ja      the language is Japanese
        JP      the country is Japan
        eucJP   the codeset is EUC-JP
But it could also be "ja", "ja_JP.EUC", "ja_JP.ujis", etc. And unfortunately,
the locale name for a specific language, territory and codeset is not unified
and depends on your system.

Examples of locale names:
    charset         language                  locale name ~
    GB2312          Chinese (simplified)      zh_CN.EUC, zh_CN.GB2312
    Big5            Chinese (traditional)     zh_TW.BIG5, zh_TW.Big5
    CNS-11643       Chinese (traditional)     zh_TW
    EUC-JP          Japanese                  ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP
    Shift_JIS       Japanese                  ja_JP.SJIS, ja_JP.Shift_JIS
    EUC-KR          Korean                    ko, ko_KR.EUC
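
The simplified format above can be split mechanically. This is a minimal Python sketch with a hypothetical `parse_locale_name` helper. It follows only the simplified form given here (real systems may add parts such as an "@modifier" suffix) and does not check whether the locale is installed.

```python
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]  # None when there is no "_territory" part
    codeset: Optional[str]    # None when there is no ".codeset" part


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name of the simplified form
    language[_territory][.codeset]."""
    # Split off the codeset first, since a codeset such as "Shift_JIS"
    # may itself contain an underscore.
    base, dot, codeset = name.partition(".")
    language, underscore, territory = base.partition("_")
    return LocaleName(
        language=language,
        territory=territory if underscore else None,
        codeset=codeset if dot else None,
    )


# Names taken from the table above.
for example in ("ja_JP.eucJP", "ja_JP.Shift_JIS", "zh_TW", "ko"):
    print(example, "->", parse_locale_name(example))
```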


USING A LOCALE

To start using a locale for the whole system, see the documentation of your
system. Usually you need to set it in a configuration file in "/etc".

To use a locale in a shell, set the $LANG environment variable. When you want
to use Korean and the |locale| name is "ko", do this:

        sh:    export LANG=ko
        csh:   setenv LANG ko

You can put this in your ~/.profile or ~/.cshrc file to always use it.

To use a locale in Vim only, use the |:language| command: >

        :language ko

Put this in your ~/.vimrc file to use it always.

Or specify $LANG when starting Vim:

        sh:    LANG=ko vim {vim-arguments}
        csh:   env LANG=ko vim {vim-arguments}

You could make a small shell script for this.

==============================================================================
3. Encoding                                             *mbyte-encoding*

Vim uses the 'encoding' option to specify how characters are identified and
encoded when they are used inside Vim. This applies to all the places where
text is used, including buffers (files loaded into memory), registers and
variables.

                                                        *charset* *codeset*
Charset is another name for encoding. There are subtle differences, but these
don't matter when using Vim. "codeset" is another similar name.

Each character is encoded as one or more bytes. When all characters are
encoded with one byte, we call this a single-byte encoding. The most commonly
used one is called "latin1". This limits the number of characters to 256.
Some of these are control characters, so even fewer can be used for text.

When some characters use two or more bytes, we call this a multibyte
encoding. This allows using many more than 256 characters, which is required
for most East Asian languages.

Most multibyte encodings use one byte for the first 128 characters (0x00 to
0x7F). These are equal to ASCII, which makes it easy to exchange plain-ASCII
text, no matter what language is used. Thus you might see the right text even
when the encoding was set wrong.
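
To make the difference between single-byte and multibyte encodings concrete, this Python sketch counts the bytes each character takes. It assumes that Python's "latin-1", "euc-jp" and "utf-8" codecs match the Vim encodings of the same names. The helper name is hypothetical.

```python
from typing import List


def byte_lengths(text: str, encoding: str) -> List[int]:
    """Number of bytes each character of text occupies in encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# latin1 is single-byte: "é" takes one byte, just like "a".
latin1_sizes = byte_lengths("a\u00e9", "latin-1")   # [1, 1]
# euc-jp and utf-8 are multibyte, yet ASCII still takes one byte.
eucjp_sizes = byte_lengths("a\u3042", "euc-jp")     # [1, 2]
utf8_sizes = byte_lengths("a\u3042", "utf-8")       # [1, 3]

# Plain ASCII produces identical bytes in all three encodings.
ascii_text = "plain ASCII text"
ascii_same = len({ascii_text.encode(e)
                  for e in ("latin-1", "euc-jp", "utf-8")}) == 1  # True
```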

                                                        *encoding-names*
Vim can use many different character encodings. There are three major groups:

1   8bit    Single-byte encodings, 256 different characters. Mostly used
            in USA and Europe. Example: ISO-8859-1 (Latin1). All
            characters occupy one screen cell only.

2   2byte   Double-byte encodings, over 10000 different characters.
            Mostly used in Asian countries. Example: euc-kr (Korean).
            The number of screen cells is equal to the number of bytes
            (except for euc-jp when the first byte is 0x8e).

u   Unicode Universal encoding, can replace all others. ISO 10646.
            Millions of different characters. Example: UTF-8. The
            relation between bytes and screen cells is complex.

Other encodings cannot be used by Vim internally. But files in other
encodings can be edited by using conversion, see 'fileencoding'.
Note that all encodings must use ASCII for the characters below 128 (except
when compiled for EBCDIC).
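
The cell rules above can be sketched in Python. The euc-jp function follows the stated rule directly. The Unicode function is only an approximation based on the Unicode East Asian Width property, not Vim's own width tables. Vim also has the 'ambiwidth' option, which this sketch ignores by treating ambiguous-width characters as single width. Both helper names are hypothetical.

```python
import unicodedata


def euc_jp_cells(text: str) -> int:
    """Screen cells for text in euc-jp using the rule stated above:
    one cell per byte, except that a sequence whose first byte is
    0x8E (half-width katakana) takes a single cell."""
    cells = 0
    for ch in text:
        raw = ch.encode("euc-jp")
        cells += 1 if raw[0] == 0x8E else len(raw)
    return cells


def unicode_cells(ch: str) -> int:
    """Simplified width of one Unicode character (an assumption, not
    Vim's tables): combining marks 0, East Asian Wide/Fullwidth 2,
    everything else 1 (ambiguous width counted as single)."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


# Bytes versus cells in UTF-8 for "a", "é", "あ" and a combining acute accent:
pairs = [(len(c.encode("utf-8")), unicode_cells(c))
         for c in ("a", "\u00e9", "\u3042", "\u0301")]
# pairs == [(1, 1), (2, 1), (3, 2), (2, 0)]
```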

Supported 'encoding' values are:                        *encoding-values*
1   latin1       8-bit characters (ISO 8859-1, also used for cp1252)
1   iso-8859-n   ISO_8859 variant (n = 2 to 15)
1   koi8-r       Russian
1   koi8-u       Ukrainian
1   macroman     MacRoman (Macintosh encoding)
1   8bit-{name}  any 8-bit encoding (Vim specific name)
1   cp437        similar to iso-8859-1
1   cp737        similar to iso-8859-7
1   cp775        Baltic
1   cp850        similar to iso-8859-1
1   cp852        similar to iso-8859-2
1   cp855        similar to iso-8859-5
1   cp857        similar to iso-8859-9
1   cp860        similar to iso-8859-1
1   cp861        similar to iso-8859-1
1   cp862        similar to iso-8859-8
1   cp863        similar to iso-8859-1
1   cp865        similar to iso-8859-1
1   cp866        similar to iso-8859-5
1   cp869        similar to iso-8859-7
1   cp874        Thai
1   cp1250       Czech, Polish, etc.
1   cp1251       Cyrillic
1   cp1253       Greek
1   cp1254       Turkish
1   cp1255       Hebrew
1   cp1256       Arabic
1   cp1257       Baltic
1   cp1258       Vietnamese
1   cp{number}   MS-Windows: any installed single-byte codepage
2   cp932        Japanese (Windows only)
2   euc-jp       Japanese (Unix only)
2   sjis         Japanese (Unix only)
2   cp949        Korean (Unix and Windows)
2   euc-kr       Korean (Unix only)
2   cp936        simplified Chinese (Windows only)
2   euc-cn       simplified Chinese (Unix only)
2   cp950        traditional Chinese (on Unix alias for big5)
2   big5         traditional Chinese (on Windows alias for cp950)
2   euc-tw       traditional Chinese (Unix only)
2   2byte-{name} Unix: any double-byte encoding (Vim specific name)
2   cp{number}   MS-Windows: any installed double-byte codepage
u   utf-8        32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2        16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le      like ucs-2, little endian
u   utf-16       ucs-2 extended with surrogate pairs (two 16-bit words)
                 for more characters
u   utf-16le     like utf-16, little endian
u   ucs-4        32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le      like ucs-4, little endian

The {name} can be any encoding name that your system supports. It is passed
to iconv() to convert between the encoding of the file and the current locale.
For MS-Windows "cp{number}" means using codepage {number}.
Examples: >
        :set encoding=8bit-cp1252
        :set encoding=2byte-cp932

The MS-Windows codepage 1252 is very similar to latin1. For practical reasons
the same encoding is used and it's called latin1. 'isprint' can be used to
display the characters 0x80 - 0xA0 or not.
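
Where latin1 and cp1252 differ can be checked with Python's codec tables. These tables are assumed to stand in for the system's own, and the helper name is hypothetical. Only the bytes 0x80 - 0x9F differ, which lies within the 0x80 - 0xA0 range mentioned above.

```python
from typing import List


def latin1_cp1252_differences() -> List[int]:
    """Byte values that decode differently in latin1 and cp1252, using
    Python's codec tables (cp1252 leaves a few bytes undefined)."""
    diff = []
    for value in range(256):
        raw = bytes([value])
        try:
            same = raw.decode("cp1252") == raw.decode("latin-1")
        except UnicodeDecodeError:
            same = False  # undefined in cp1252, e.g. 0x81
        if not same:
            diff.append(value)
    return diff


differences = latin1_cp1252_differences()
print(hex(differences[0]), hex(differences[-1]), len(differences))
# 0x80 0x9f 32: e.g. byte 0x80 is a control character in latin1 but the
# euro sign in cp1252.
```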

Several aliases can be used; they are translated to one of the names above.
An incomplete list:

1   ansi        same as latin1 (obsolete, for backward compatibility)
2   japan       Japanese: on Unix "euc-jp", on MS-Windows cp932
2   korea       Korean: on Unix "euc-kr", on MS-Windows cp949
2   prc         simplified Chinese: on Unix "euc-cn", on MS-Windows cp936
2   chinese     same as "prc"
2   taiwan      traditional Chinese: on Unix "euc-tw", on MS-Windows cp950
u   utf8        same as utf-8
u   unicode     same as ucs-2
u   ucs2be      same as ucs-2 (big endian)
u   ucs-2be     same as ucs-2 (big endian)
u   ucs-4be     same as ucs-4 (big endian)
u   utf-32      same as ucs-4
u   utf-32le    same as ucs-4le
    default     stands for the default value of 'encoding', depends on the
                environment
  264. u ucs-4be same as ucs-4 (big endian)
  265. u utf-32 same as ucs-4
  266. u utf-32le same as ucs-4le
  267. default stands for the default value of 'encoding', depends on the
  268. environment
  269. For the UCS codes the byte order matters. This is tricky, use UTF-8 whenever
  270. you can. The default is to use big-endian (most significant byte comes
  271. first):
  272. name bytes char ~
  273. ucs-2 11 22 1122
  274. ucs-2le 22 11 1122
  275. ucs-4 11 22 33 44 11223344
  276. ucs-4le 44 33 22 11 11223344
  277. On MS-Windows systems you often want to use "ucs-2le", because it uses little
  278. endian UCS-2.
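
The byte-order table above can be reproduced in Python. The values 1122 and 11223344 are illustrative (0x11223344 lies outside Unicode's range), so the sketch formats plain integers with `int.to_bytes`. It then cross-checks a real character against Python's "utf-16-be" and "utf-16-le" codecs, which agree with UCS-2 for characters below U+10000. The helper name is hypothetical.

```python
def ucs_bytes(code: int, width: int, little_endian: bool = False) -> str:
    """Hex bytes of a code value stored in `width` bytes: 2 for ucs-2,
    4 for ucs-4; big endian (most significant byte first) by default."""
    order = "little" if little_endian else "big"
    return code.to_bytes(width, order).hex(" ")


# The rows of the table above.
print(ucs_bytes(0x1122, 2))                          # 11 22
print(ucs_bytes(0x1122, 2, little_endian=True))      # 22 11
print(ucs_bytes(0x11223344, 4))                      # 11 22 33 44
print(ucs_bytes(0x11223344, 4, little_endian=True))  # 44 33 22 11

# Cross-check with a real character, U+3042, using Python's codecs.
assert "\u3042".encode("utf-16-be").hex(" ") == ucs_bytes(0x3042, 2)
assert "\u3042".encode("utf-16-le").hex(" ") == ucs_bytes(0x3042, 2, True)
```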

There are a few encodings which are similar, but not exactly the same. Vim
treats them as if they were different encodings, so that conversion will be
done when needed. You might want to use the similar name to avoid conversion
or when conversion is not possible:
        cp932, shift-jis, sjis
        cp936, euc-cn
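
The difference between such similar encodings shows up as characters that exist in one but not in the other. This Python sketch assumes that Python's "cp932", "shift-jis", "cp936" (GBK) and "gb2312" codecs behave like the corresponding system tables. The helper name is hypothetical.

```python
def encodable(text: str, encoding: str) -> bool:
    """True when every character of text exists in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+2460 (circled digit one) is an extension character found in cp932 only.
print(encodable("\u2460", "cp932"), encodable("\u2460", "shift-jis"))  # True False
# U+4E02 is in cp936 (GBK) but not in the GB 2312 set behind euc-cn.
print(encodable("\u4e02", "cp936"), encodable("\u4e02", "gb2312"))    # True False
```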

                                                        *encoding-table*
Normally 'encoding' is equal to your current locale and 'termencoding' is
empty. This means that your keyboard and display work with characters encoded
in your current locale, and Vim uses the same characters internally.

You can make Vim use characters in a different encoding by setting the
'encoding' option to a different value. Since the keyboard and display still
use the current locale, conversion needs to be done. The 'termencoding' then
takes over the value of the current locale, so Vim converts between 'encoding'
and 'termencoding'. Example: >
        :let &termencoding = &encoding
        :set encoding=utf-8
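
The conversion between 'termencoding' and 'encoding' can be sketched as a decode/encode pair in each direction. This is a minimal Python illustration of the idea, not Vim's implementation, and the helper names are hypothetical.

```python
def typed_to_internal(typed: bytes, termencoding: str, encoding: str) -> bytes:
    """Convert bytes typed in the terminal's encoding to 'encoding'."""
    return typed.decode(termencoding).encode(encoding)


def internal_to_display(text: bytes, encoding: str, termencoding: str) -> bytes:
    """Convert text held in 'encoding' back to the terminal's encoding."""
    return text.decode(encoding).encode(termencoding)


# A latin1 terminal sends "é" as the single byte 0xE9.
internal = typed_to_internal(b"\xe9", "latin-1", "utf-8")   # b'\xc3\xa9'
shown = internal_to_display(internal, "utf-8", "latin-1")   # b'\xe9'
```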

However, not all combinations of values are possible. The table below tells
you how each of the nine combinations works. This is further restricted by
not all conversions being possible, iconv() being present, etc. Since this
depends on the system used, no detailed list can be given.

('tenc' is the short name for 'termencoding' and 'enc' short for 'encoding')

'tenc'      'enc'       remark ~

8bit        8bit        Works. When 'termencoding' is different from
                        'encoding', typing and displaying may be wrong for
                        some characters; Vim does NOT perform conversion (set
                        'encoding' to "utf-8" to get this).
8bit        2byte       MS-Windows: works for all codepages installed on your
                        system; you can only type 8bit characters.
                        Other systems: does NOT work.
8bit        Unicode     Works, but only 8bit characters can be typed directly
                        (others through digraphs, keymaps, etc.); in a
                        terminal you can only see 8bit characters; the GUI can
                        show all characters that the 'guifont' supports.
2byte       8bit        Works, but typing non-ASCII characters might
                        be a problem.
2byte       2byte       MS-Windows: works for all codepages installed on your
                        system; typing characters might be a problem when
                        the locale is different from 'encoding'.
                        Other systems: only works when 'termencoding' is equal
                        to 'encoding'; you might as well leave it empty.
2byte       Unicode     Works; Vim will translate typed characters.
Unicode     8bit        Works (unusual).
Unicode     2byte       Does NOT work.
Unicode     Unicode     Works very well (leaving 'termencoding' empty works
                        the same way, because all Unicode is handled
                        internally as UTF-8).


CONVERSION                                              *charset-conversion*

Vim will automatically convert from one encoding to another in several places:
- When reading a file and 'fileencoding' is different from 'encoding'
- When writing a file and 'fileencoding' is different from 'encoding'
- When displaying characters and 'termencoding' is different from 'encoding'
- When reading input and 'termencoding' is different from 'encoding'
- When displaying messages and the encoding used for LC_MESSAGES differs from
  'encoding' (requires a gettext version that supports this).
- When reading a Vim script where |:scriptencoding| is different from
  'encoding'.
- When reading or writing a |viminfo| file.
Most of these require the |+iconv| feature. Conversion for reading and
writing files may also be specified with the 'charconvert' option.
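
The conversion in the first two cases amounts to decoding from 'fileencoding' on read and encoding back on write. This is a minimal Python sketch in which a Python string plays the role of the text held in 'encoding'. The helper names and the temporary file are hypothetical.

```python
import os
import tempfile


def read_converted(path: str, fileencoding: str) -> str:
    """Read a file stored in fileencoding and return it as text."""
    with open(path, "rb") as f:
        return f.read().decode(fileencoding)


def write_converted(path: str, text: str, fileencoding: str) -> None:
    """Write text back to disk in fileencoding."""
    with open(path, "wb") as f:
        f.write(text.encode(fileencoding))


# Round trip: an euc-jp file is read, edited as text and written back.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_converted(path, "\u65e5\u672c\u8a9e", "euc-jp")  # create the file
    text = read_converted(path, "euc-jp")                 # read: euc-jp -> text
    text += "!"                                           # edit the text
    write_converted(path, text, "euc-jp")                 # write: text -> euc-jp
    with open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)
```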

Useful utilities for converting the charset:
All:            iconv
        GNU iconv can convert most encodings. Unicode is used as the
        intermediate encoding, which allows conversion from and to all other
        encodings. See http://www.gnu.org/directory/libiconv.html.

Japanese:       nkf
        Nkf is "Network Kanji code conversion Filter". One of nkf's most
        distinctive features is that it guesses the Kanji code of its input,
        so you don't need to know the |charset| of the input file. To convert
        to EUC-JP from ISO-2022-JP or Shift_JIS, run the following command
        in Vim:
                :%!nkf -e
        Nkf can be found at:
        http://www.sfc.wide.ad.jp/~max/FreeBSD/ports/distfiles/nkf-1.62.tar.gz

Chinese:        hc
        Hc is "Hanzi Converter". Hc converts a GB file to a Big5 file, or a
        Big5 file to a GB file. Hc can be found at:
        ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/unix/convert/hc-30.tar.gz

Korean:         hmconv
        Hmconv is a Korean code conversion utility, especially for e-mail. It
        can convert between EUC-KR and ISO-2022-KR. Hmconv can be found at:
        ftp://ftp.kaist.ac.kr/pub/hangul/code/hmconv/

Multilingual:   lv
        Lv is a powerful multilingual file viewer. It can also be used as a
        |charset| converter. Supported |charset|: ISO-2022-CN, ISO-2022-JP,
        ISO-2022-KR, EUC-CN, EUC-JP, EUC-KR, EUC-TW, UTF-7, UTF-8, ISO-8859
        series, Shift_JIS, Big5 and HZ. Lv can be found at:
        http://www.ff.iij4u.or.jp/~nrt/lv/index.html
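
Encoding guessing can be approximated by trying candidate encodings in order and keeping the first one that decodes without error. This is similar in spirit to how Vim tries the entries of 'fileencodings'. The Python sketch below is much cruder than a dedicated guesser such as nkf and is not its algorithm. The helper name and the candidate order are assumptions.

```python
from typing import Iterable, Optional


def guess_encoding(data: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes data without an
    error, or None. A permissive encoding such as latin1 accepts any
    byte sequence, so it belongs last."""
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return encoding
    return None


ORDER = ("utf-8", "euc-jp", "latin-1")
print(guess_encoding("\u3042".encode("utf-8"), ORDER))   # utf-8
print(guess_encoding("\u3042".encode("euc-jp"), ORDER))  # euc-jp
print(guess_encoding(b"caf\xe9", ORDER))                 # latin-1
```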
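
                                                        *mbyte-conversion*
When reading and writing files in an encoding different from 'encoding',
conversion needs to be done. These conversions are supported:
- All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are
  handled internally.
- For MS-Windows, when 'encoding' is a Unicode encoding, conversion from and
  to any codepage should work.
- Conversion specified with 'charconvert'.
- Conversion with the iconv library, if it is available.
        Old versions of GNU iconv() may cause the conversion to fail (they
        request a very large buffer, more than Vim is willing to provide).
        Try getting another iconv() implementation.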
  374. - Conversion specified with 'charconvert'
  375. - Conversion with the iconv library, if it is available.
  376. Old versions of GNU iconv() may cause the conversion to fail (they
  377. request a very large buffer, more than Vim is willing to provide).
  378. Try getting another iconv() implementation.
  379. *iconv-dynamic*
  380. On MS-Windows Vim can be compiled with the |+iconv/dyn| feature. This means
  381. Vim will search for the "iconv.dll" and "libiconv.dll" libraries. When
  382. neither of them can be found Vim will still work but some conversions won't be
  383. possible.
  384. ==============================================================================
  385. 4. Using a terminal *mbyte-terminal*
  386. The GUI fully supports multibyte characters. It is also possible in a
  387. terminal, if the terminal supports the same encoding that Vim uses. Thus this
  388. is less flexible.
  389. For example, you can run Vim in a xterm with added multibyte support and/or
  390. |XIM|. Examples are kterm (Kanji term) and hanterm (for Korean), Eterm
  391. (Enlightened terminal) and rxvt.
  392. If your terminal does not support the right encoding, you can set the
  393. 'termencoding' option. Vim will then convert the typed characters from
  394. 'termencoding' to 'encoding'. And displayed text will be converted from
  395. 'encoding' to 'termencoding'. If the encoding supported by the terminal
  396. doesn't include all the characters that Vim uses, this leads to lost
  397. characters. This may mess up the display. If you use a terminal that
  398. supports Unicode, such as the xterm mentioned below, it should work just fine,
  399. since nearly every character set can be converted to Unicode without loss of
  400. information.
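
The loss described above can be seen by encoding text for a terminal whose encoding lacks some characters. In this Python sketch, the "?" replacement comes from Python's `errors="replace"` handling, and what Vim actually displays may differ. The helper name is hypothetical.

```python
def to_terminal(text: str, termencoding: str) -> bytes:
    """Encode text for a terminal. Characters that termencoding lacks
    are replaced by '?' (Python's choice here), so they are lost."""
    return text.encode(termencoding, errors="replace")


text = "Gr\u00fc\u00dfe \u65e5\u672c"       # German word plus two Kanji
lossy = to_terminal(text, "latin-1")        # b'Gr\xfc\xdfe ??'
lossless = to_terminal(text, "utf-8")
print(lossy.decode("latin-1") == text)      # False: the Kanji are gone
print(lossless.decode("utf-8") == text)     # True
```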


UTF-8 IN XFREE86 XTERM                                  *UTF8-xterm*

This is a short explanation of how to use UTF-8 character encoding in the
xterm that comes with XFree86 by Thomas Dickey (text by Markus Kuhn).

Get the latest xterm version, which now has UTF-8 support:

        http://invisible-island.net/xterm/xterm.html

Compile it with "./configure --enable-wide-chars ; make"

Also get the ISO 10646-1 versions of various fonts, which are available at

        http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

and install the fonts as described in the README file.

Now start xterm with >

        xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
or, for bigger characters: >
        xterm -u8 -fn -misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

and you will have a working UTF-8 terminal emulator. Try both >

        cat utf-8-demo.txt
        vim utf-8-demo.txt

with the demo text that comes with ucs-fonts.tar.gz in order to see
whether there are any problems with UTF-8 in your xterm.

For Vim you may need to set 'encoding' to "utf-8".

==============================================================================
5. Fonts on X11                                         *mbyte-fonts-X11*

Unfortunately, using fonts in X11 is complicated. The name of a single-byte
font is a long string. For multibyte fonts we need several of these...

Note: Most of this is no longer relevant for GTK+ 2. Selecting a font via
its XLFD is not supported; see 'guifont' for an example of how to
set the font. Do yourself a favor and ignore the |XLFD| and |xfontset|
sections below.

First of all, Vim only accepts fixed-width fonts for displaying text. You
cannot use proportionally spaced fonts. This excludes many of the available
(and nicer looking) fonts. However, for menus and tooltips any font can be
used.

Note that display and input are independent. It is possible to see your
language even though you have no input method for it.

You should get a default font for menus and tooltips that works, but it might
be ugly. Read the following to find out how to select a better font.


X LOGICAL FONT DESCRIPTION (XLFD)
                                                        *XLFD*
XLFD is the X font name and contains the information about the font size,
charset, etc. The name is in this format:

FOUNDRY-FAMILY-WEIGHT-SLANT-WIDTH-STYLE-PIXEL-POINT-X-Y-SPACE-AVE-CR-CE

Each field means:

- FOUNDRY:  FOUNDRY field. The company that created the font.
- FAMILY:   FAMILY_NAME field. Basic font family name. (helvetica, gothic,
            times, etc)
- WEIGHT:   WEIGHT_NAME field. How thick the letters are. (light, medium,
            bold, etc)
- SLANT:    SLANT field.
            r:  Roman (no slant)
            i:  Italic
            o:  Oblique
            ri: Reverse Italic
            ro: Reverse Oblique
            ot: Other
            number: Scaled font
- WIDTH:    SETWIDTH_NAME field. Width of characters. (normal, condensed,
            narrow, double wide)
- STYLE:    ADD_STYLE_NAME field. Extra info to describe font. (Serif, Sans
            Serif, Informal, Decorated, etc)
- PIXEL:    PIXEL_SIZE field. Height, in pixels, of characters.
- POINT:    POINT_SIZE field. Ten times the height of characters in points.
- X:        RESOLUTION_X field. X resolution (dots per inch).
- Y:        RESOLUTION_Y field. Y resolution (dots per inch).
- SPACE:    SPACING field.
            p:  Proportional
            m:  Monospaced
            c:  CharCell
- AVE:      AVERAGE_WIDTH field. Ten times the average width in pixels.
- CR:       CHARSET_REGISTRY field. The name of the charset group.
- CE:       CHARSET_ENCODING field. The rest of the charset name. For some
            charsets, such as JIS X 0208, if this field is 0, code points
            have the same value as GL; if it is 1, they have the same value
            as GR.

For example, a 16-dot font corresponding to JIS X 0208 is written as:

        -misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0
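
Because the fields are separated by "-", an XLFD name can be split into named fields. This is a minimal Python sketch with a hypothetical `parse_xlfd` helper. It assumes a name with exactly 14 fields and is applied to the JIS X 0208 example above.

```python
from typing import Dict

# The 14 XLFD fields in order, named as in the list above.
XLFD_FIELDS = (
    "FOUNDRY", "FAMILY_NAME", "WEIGHT_NAME", "SLANT", "SETWIDTH_NAME",
    "ADD_STYLE_NAME", "PIXEL_SIZE", "POINT_SIZE", "RESOLUTION_X",
    "RESOLUTION_Y", "SPACING", "AVERAGE_WIDTH", "CHARSET_REGISTRY",
    "CHARSET_ENCODING",
)


def parse_xlfd(name: str) -> Dict[str, str]:
    """Split an XLFD name into its 14 named fields."""
    if not name.startswith("-"):
        raise ValueError("an XLFD name starts with '-'")
    values = name[1:].split("-")  # field values cannot contain '-'
    if len(values) != len(XLFD_FIELDS):
        raise ValueError(f"expected 14 fields, got {len(values)}")
    return dict(zip(XLFD_FIELDS, values))


font = parse_xlfd(
    "-misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0")
print(font["PIXEL_SIZE"])                # 16
print(int(font["POINT_SIZE"]) / 10)      # 11.0 (points)
print(int(font["AVERAGE_WIDTH"]) / 10)   # 16.0 (pixels)
print(font["CHARSET_REGISTRY"], font["CHARSET_ENCODING"])  # jisx0208.1990 0
```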


X FONTSET
                                                *fontset* *xfontset*
A single-byte charset is typically associated with one font. For multibyte
charsets a combination of fonts is often used. This means that one group of
characters is taken from one font and another group from another font (which
might be double wide). This collection of fonts is called a fontset.

Which fonts are required in a fontset depends on the current locale.
X-Windows maintains a table of which groups of characters are required for a
locale. You have to specify all the fonts that a locale requires in the
'guifontset' option.

Setting the 'guifontset' option also means that all font names will be handled
as a fontset name, including the ones used for the "font" argument of the
|:highlight| command.

Note the difference between 'guifont' and 'guifontset': In 'guifont'
the comma-separated names are alternative names, one of which will be
used. In 'guifontset' the whole string is one fontset name,
including the commas. It is not possible to specify alternative
fontset names.
This example works on many X11 systems: >
        :set guifontset=-*-*-medium-r-normal--16-*-*-*-c-*-*-*
<
  491. including the commas. It is not possible to specify alternative
  492. fontset names.
  493. This example works on many X11 systems: >
  494. :set guifontset=-*-*-medium-r-normal--16-*-*-*-c-*-*-*
  495. <
  496. The fonts must match with the current locale. If fonts for the character sets
  497. that the current locale uses are not included, setting 'guifontset' will fail.
  498. NOTE: The fontset always uses the current locale, even though 'encoding' may
  499. be set to use a different charset. In that situation you might want to use
  500. 'guifont' and 'guifontwide' instead of 'guifontset'.
  501. Example:
  502. |charset| language "groups of characters" ~
  503. GB2312 Chinese (simplified) ISO-8859-1 and GB 2312
  504. Big5 Chinese (traditional) ISO-8859-1 and Big5
  505. CNS-11643 Chinese (traditional) ISO-8859-1, CNS 11643-1 and CNS 11643-2
  506. EUC-JP Japanese JIS X 0201 and JIS X 0208
  507. EUC-KR Korean ISO-8859-1 and KS C 5601 (KS X 1001)
You can search for fonts using the xlsfonts command. For example, when you're
searching for a font for KS C 5601: >
	xlsfonts | grep ksc5601

This is complicated and confusing. You might want to consult the X-Windows
documentation if there is something you don't understand.

*base_font_name_list*
When you have found the names of the fonts you want to use, you need to set
the 'guifontset' option. You specify the list by concatenating the font names
and putting a comma in between them.

For example, when you use the ja_JP.eucJP locale, this requires JIS X 0201
and JIS X 0208. You could supply a list of fonts that explicitly specifies
the charsets, like: >
	:set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0,
		\-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0

Alternatively, you can supply a base font name list that omits the charset
name, letting X-Windows select the font characters required for the locale.
For example: >
	:set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140,
		\-misc-fixed-medium-r-normal--14-130-75-75-c-70

You can also supply a single base font name that allows X-Windows to select
from all available fonts. For example: >
	:set guifontset=-misc-fixed-medium-r-normal--14-*

Finally, you can specify alias names. See the fonts.alias file in the
fonts directory (e.g., /usr/X11R6/lib/X11/fonts/). For example: >
	:set guifontset=k14,r14
<
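X matches a base font name containing wildcards against the installed fonts. The Python sketch below only imitates that matching step, assuming XListFonts()-style patterns in which "*" matches any run of characters (hyphens included), "?" matches a single character, and case is ignored. It uses fnmatch as a stand-in and a made-up font list; which of the matching fonts X finally puts into the fontset depends on the locale and is not modelled.

```python
# Sketch only: approximate X font name pattern matching with fnmatch.
# Assumptions: "*" matches any characters including "-", "?" matches one
# character, and matching ignores case. Unlike X, fnmatch also gives "[...]"
# a special meaning, so this is not an exact model.
import fnmatch
from typing import List


def matching_fonts(pattern: str, available: List[str]) -> List[str]:
    """Return the font names from `available` that match `pattern`."""
    pat = pattern.lower()
    return [name for name in available if fnmatch.fnmatchcase(name.lower(), pat)]


# A made-up font list in the style of xlsfonts output.
FONTS = [
    "-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0",
    "-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0",
    "-misc-fixed-medium-r-normal--14-130-75-75-c-70-iso8859-1",
    "-misc-fixed-medium-r-normal--16-150-75-75-c-160-jisx0208.1983-0",
]

if __name__ == "__main__":
    # The single base font name from the example matches all 14 pixel fonts.
    for name in matching_fonts("-misc-fixed-medium-r-normal--14-*", FONTS):
        print(name)
```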
*E253*
Note that in East Asian fonts, the standard character cell is square. When
mixing a Latin font and an East Asian font, the East Asian font width should
be twice the Latin font width.
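The factor of two matches the usual rule that "wide" characters occupy two cells. A rough Python sketch of that rule follows. It relies on the Unicode East Asian Width property rather than on the fonts, and it approximates, rather than reproduces, what Vim does.

```python
# Sketch only: estimate how many screen cells a string needs, using the
# Unicode East Asian Width property from Python's unicodedata module.
# Vim uses its own tables and the 'ambiwidth' option, so results can differ
# for some characters; composing characters are not handled here.
import unicodedata


def cell_width(ch: str) -> int:
    """2 cells for Wide ("W") and Fullwidth ("F") characters, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def text_width(text: str) -> int:
    return sum(cell_width(ch) for ch in text)


if __name__ == "__main__":
    print(text_width("abc"), text_width("日本語"))  # 3 6
```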
If 'guifontset' is not empty, the "font" argument of the |:highlight| command
is also interpreted as a fontset. For example, to set the font of a highlight
group: >
	:hi Comment font=english_font,your_font
If you use an invalid "font" argument you will get an error message.
Also make sure that you set 'guifontset' before setting fonts for highlight
groups.
USING RESOURCE FILES

Instead of specifying 'guifontset', you can set X11 resources and Vim will
pick them up. This is only for people who know how X resource files work.

For Motif insert these three lines in your $HOME/.Xdefaults file:

	Vim.font: |base_font_name_list|
	Vim*fontSet: |base_font_name_list|
	Vim*fontList: your_language_font

Note: Vim.font is for the text area.
      Vim*fontSet is for the menu.
      Vim*fontList is for the menu (for Motif GUI).

For example, when you are using Japanese and a 14 dots font: >

	Vim.font: -misc-fixed-medium-r-normal--14-*
	Vim*fontSet: -misc-fixed-medium-r-normal--14-*
	Vim*fontList: -misc-fixed-medium-r-normal--14-*
<
or: >

	Vim*font: k14,r14
	Vim*fontSet: k14,r14
	Vim*fontList: k14,r14
<
To have them take effect immediately you will have to do: >

	xrdb -merge ~/.Xdefaults

Otherwise you will have to stop and restart the X server before the changes
take effect.

The GTK+ version of GUI Vim does not use .Xdefaults; use ~/.gtkrc instead.
The default mostly works OK, but for the menus you might have to change
it. Example: >

	style "default"
	{
		fontset="-*-*-medium-r-normal--14-*-*-*-c-*-*-*"
	}
	widget_class "*" style "default"
==============================================================================
6. Fonts on MS-Windows *mbyte-fonts-MSwin*

The simplest way is to use the font dialog to select fonts and try them out.
You can find this at the "Edit/Select Font..." menu. Once you find a font
that works well you can use this command to see its name: >
	:set guifont

Then add a command to your |gvimrc| file to set 'guifont': >
	:set guifont=courier_new:h12
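A 'guifont' value like the one above packs the face name and its options into one string. The Python sketch below splits such a value. It assumes the MS-Windows conventions that an underscore stands for a space, ":hN" gives the height in points and ":wN" the width, and that other options such as ":b" (bold) and ":i" (italic) are simple flags. Check the 'guifont' help for the options your Vim version accepts. The function name is made up, and comma-separated alternative fonts are not handled.

```python
# Sketch only: split an MS-Windows 'guifont' value such as "courier_new:h12".
# Assumptions: "_" stands for a space in the face name, "hN"/"wN" give the
# height/width, and every other option (e.g. "b", "i") is kept verbatim.
from typing import Dict, List, Tuple


def parse_win_guifont(value: str) -> Tuple[str, Dict[str, str], List[str]]:
    name, *opts = value.split(":")
    face = name.replace("_", " ")
    size: Dict[str, str] = {}
    other: List[str] = []
    for opt in opts:
        if len(opt) > 1 and opt[0] in "hw":
            size[opt[0]] = opt[1:]   # "h12" -> {"h": "12"}
        elif opt:
            other.append(opt)        # "b", "i", ... left uninterpreted
    return face, size, other


if __name__ == "__main__":
    print(parse_win_guifont("courier_new:h12"))   # ('courier new', {'h': '12'}, [])
    print(parse_win_guifont("Consolas:h10:b:i"))  # ('Consolas', {'h': '10'}, ['b', 'i'])
```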
==============================================================================
7. Input on X11 *mbyte-XIM*

X INPUT METHOD (XIM) BACKGROUND *XIM* *xim* *x-input-method*

XIM is an international input module for X. There are two kinds of
structures: the Xlib unit type and the |IM-server| (Input-Method server)
type. The |IM-server| type is suitable for complex input, such as CJK.

- IM-server
*IM-server*
  In |IM-server| type input structures, the input event is handled in one of
  two ways: the FrontEnd system or the BackEnd system. In the FrontEnd
  system, input events are intercepted by the |IM-server| first, and then the
  |IM-server| gives the application the result of the input. In the BackEnd
  system it is the other way around: the application receives the input
  events first and passes them to the |IM-server|. MS-Windows adopts the
  BackEnd system. In X, most |IM-server|s adopt the FrontEnd system. The
  disadvantage of the BackEnd system is the large overhead in communication,
  but it provides safe synchronization with no restrictions on applications.

  For example, xwnmo and kinput2 are Japanese |IM-server|s, and both use the
  FrontEnd system. Xwnmo is distributed with Wnn (see below), kinput2 can be
  found at: ftp://ftp.sra.co.jp/pub/x11/kinput2/

  For Chinese, there's a great XIM server named "xcin", with which you can
  input both Traditional and Simplified Chinese characters. It can also
  accept other locales if you make a correct input table. Xcin can be found
  at: http://cle.linux.org.tw/xcin/
  Others are scim: http://scim.freedesktop.org/ and fcitx:
  http://www.fcitx.org/
- Conversion Server
*conversion-server*
  Some systems need an additional server: a conversion server. Most Japanese
  |IM-server|s need a Kana-Kanji conversion server. For Chinese input it
  depends on the input method: some methods need a PinYin or ZhuYin to HanZi
  conversion server. For Korean input, a Hangul-Hanja conversion server is
  needed if you want to input Hanja.

  For example, Japanese input is divided into two steps. Because there are so
  many Kanji characters (6349 Kanji characters are defined in JIS X 0208),
  while there are only 76 Hiragana characters, the text is first typed as it
  is pronounced, in Hiragana, and then the Hiragana is converted to Kanji or
  Katakana where needed. Kana-Kanji conversion servers include jserver
  (distributed with Wnn, see below) and canna. Canna can be found at:
  http://canna.sourceforge.jp/

  There is a good input system: Wnn 4.2. Wnn 4.2 contains:
	xwnmo (|IM-server|)
	jserver (Japanese Kana-Kanji conversion server)
	cserver (Chinese PinYin or ZhuYin to simplified HanZi conversion server)
	tserver (Chinese PinYin or ZhuYin to traditional HanZi conversion server)
	kserver (Hangul-Hanja conversion server)
  Wnn 4.2 for several systems can be found at various places on the internet.
  Use the RPM or port for your system.
- Input Style
*xim-input-style*
  When inputting CJK, there are four areas:
      1. The area to display the input while it is being composed.
      2. The area to display the currently active input mode.
      3. The area to display the next candidate for the selection.
      4. The area to display other tools.

  The third area is needed when converting. For example, in Japanese input,
  multiple Kanji characters can have the same pronunciation, so one sequence
  of Hiragana characters can map to several different sequences of Kanji
  characters, and the user has to pick one of them.
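The two-step conversion and the role of the candidate area can be illustrated with a toy Python model. The dictionary is a tiny made-up stand-in for what a conversion server such as jserver or canna provides; real servers also split the input into phrases and rank candidates, which this sketch does not attempt.

```python
# Toy sketch: the reading is typed in Hiragana (the Preedit Area), then looked
# up in a conversion dictionary that may return several candidates (the
# candidate area). CONVERSION_DICT is hypothetical, not a real server's data.
from typing import Dict, List

CONVERSION_DICT: Dict[str, List[str]] = {
    "かんじ": ["漢字", "感じ", "幹事"],
    "にほん": ["日本", "二本"],
}


def candidates(reading: str) -> List[str]:
    """Return conversion candidates; the reading itself is always offered last."""
    return CONVERSION_DICT.get(reading, []) + [reading]


def commit(reading: str, choice: int) -> str:
    """Simulate the user picking candidate number `choice` (0-based)."""
    return candidates(reading)[choice]


if __name__ == "__main__":
    print(candidates("かんじ"))  # several Kanji spellings share one reading
    print(commit("かんじ", 1))   # 感じ
```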
  The first and second areas are defined in international input of X with the
  names "Preedit Area" and "Status Area" respectively. The third and fourth
  areas are not defined and are left to be managed by the |IM-server|. In
  international input, four input styles have been defined using combinations
  of Preedit Area and Status Area: |OnTheSpot|, |OffTheSpot|, |OverTheSpot|
  and |Root|.

  Currently, GUI Vim supports three styles: |OverTheSpot|, |OffTheSpot| and
  |Root|.
  When compiled with the |+GUI_GTK| feature, GUI Vim supports two styles:
  |OnTheSpot| and |OverTheSpot|. You can select the style with the 'imstyle'
  option.

*. on-the-spot *OnTheSpot*
    The Preedit Area and Status Area are handled by the client application,
    inside the application's own area. The client application is directed by
    the |IM-server| to display all pre-edit data at the location of text
    insertion. The client registers callbacks invoked by the input method
    during pre-editing.
*. over-the-spot *OverTheSpot*
    The Status Area is created in a fixed position within the area of the
    application; in the case of Vim, the position is the additional status
    line. The Preedit Area is placed at the current input position of the
    application. The input method displays pre-edit data in a window which
    it brings up directly over the text insertion position.
*. off-the-spot *OffTheSpot*
    The Preedit Area and Status Area are placed in the area of the
    application; in the case of Vim, the area is the additional status line.
    The client application provides display windows for the pre-edit data to
    the input method, which displays into them directly.
*. root-window *Root*
    The Preedit Area and Status Area are outside of the application. The
    input method displays all pre-edit data in a separate area of the screen
    in a window specific to the input method.
USING XIM *multibyte-input* *E284* *E285* *E286* *E287*
*E288* *E289*

Note that Display and Input are independent. It is possible to see your
language even though you have no input method for it. But when your Display
method doesn't match your Input method, the text will be displayed wrong.

	Note: You can not use IM unless you specify 'guifontset'.
	      Therefore users of Latin-script languages also have to set
	      'guifontset' if they use IM.

To input your language you should run the |IM-server| which supports your
language and |conversion-server| if needed.

The next 3 lines should be put in your ~/.Xdefaults file. They are common to
all X applications which use |XIM|. If you already use |XIM|, you can skip
this. >

	*international: True
	*.inputMethod: your_input_server_name
	*.preeditType: your_input_style
<
your_input_server_name is your |IM-server| name (check your |IM-server|
manual).
your_input_style is one of |OverTheSpot|, |OffTheSpot|, |Root|. See also
|xim-input-style|.

*international may not be necessary if you use X11R6.
*.inputMethod and *.preeditType are optional if you use X11R6.

For example, when you are using kinput2 as |IM-server|: >

	*international: True
	*.inputMethod: kinput2
	*.preeditType: OverTheSpot
<
When using |OverTheSpot|, GUI Vim always connects to the IM Server even in
Normal mode, so you can input your language with commands like "f" and "r".
But when using one of the other two methods, GUI Vim connects to the IM Server
only if it is not in Normal mode.

If your IM Server does not support |OverTheSpot|, and if you want to use your
language with some Normal mode command like "f" or "r", then you should use a
localized xterm or an xterm which supports |XIM|.

If needed, you can set the XMODIFIERS environment variable:

	sh:  export XMODIFIERS="@im=input_server_name"
	csh: setenv XMODIFIERS "@im=input_server_name"

For example, when you are using kinput2 as |IM-server| and sh: >

	export XMODIFIERS="@im=kinput2"
<
FULLY CONTROLLED XIM

You can fully control XIM, like with IME of MS-Windows (see |multibyte-ime|).
This is currently only available for the GTK GUI.

Before using fully controlled XIM, one setting is required. Set the
'imactivatekey' option to the key that is used for the activation of the input
method. For example, when you are using kinput2 + canna as IM Server, the
activation key is probably Shift+Space: >

	:set imactivatekey=S-space

See 'imactivatekey' for the format.
==============================================================================
8. Input on MS-Windows *mbyte-IME*

(Windows IME support) *multibyte-ime* *IME*

{only works in the Windows GUI when compiled with the |+multi_byte_ime| feature}

To input multibyte characters on Windows, you can use an Input Method Editor
(IME). While editing text you have to switch the IME on and off many times:
while the IME is on it intercepts all your key presses, so you cannot type
'j', 'k', or almost any other key to Vim directly.

The |+multi_byte_ime| feature helps with this. It reduces the number of times
the IME status has to be switched manually. In Normal mode, there is almost
no need to use the IME, even when editing multibyte text. So when exiting
Insert mode, Vim memorizes the last status of the IME and turns it off. When
re-entering Insert mode, Vim automatically sets the IME status back to that
memorized status.

This works not only when switching between Insert and Normal mode, but also
for search-command input and Replace mode.
The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
the different input methods or disable them temporarily.

On Windows 9x and Windows NT 4.0 there was *global-ime* , but this is no
longer supported. You can still find documentation for Active Input Method
Manager (Global IME) here:
http://msdn.microsoft.com/en-us/library/aa741221(v=VS.85).aspx

NOTE: For IME to work you must make sure the input locales of your language
are added to your system. The exact location of this depends on the version
of Windows you use. For example, on my Windows 2000 box:
1. Control Panel
2. Regional Options
3. Input Locales Tab
4. Add Installed input locales -> Chinese(PRC)
The default is still English (United States).
Cursor color when IME or XIM is on *CursorIM*

There is a cute little feature for IME: the cursor can indicate the IME
status by changing its color. Usually the IME status is indicated by a small
icon in a corner of the desktop (or the taskbar), which makes it hard to
check. This feature makes the status easy to see.
This works in the same way when using XIM.

You can select the cursor color for when the IME is on with the highlight
group CursorIM. For example, add these lines to your |gvimrc|: >

	if has('multi_byte_ime')
	  highlight Cursor guifg=NONE guibg=Green
	  highlight CursorIM guifg=NONE guibg=Purple
	endif
<
With these settings the cursor is green when the IME is off and purple when
it is on.
==============================================================================
9. Input with a keymap *mbyte-keymap*

When the keyboard doesn't produce the characters you want to enter in your
text, you can use the 'keymap' option. This will translate one or more
(English) characters to another (non-English) character. This only happens
when typing text, not when typing Vim commands. This avoids having to switch
between two keyboard settings.

{only available when compiled with the |+keymap| feature}

The value of the 'keymap' option specifies a keymap file to use. The name of
this file is one of these two:

	keymap/{keymap}_{encoding}.vim
	keymap/{keymap}.vim

Here {keymap} is the value of the 'keymap' option and {encoding} of the
'encoding' option. The file name with the {encoding} included is tried first.

'runtimepath' is used to find these files. To see an overview of all
available keymap files, use this: >
	:echo globpath(&rtp, "keymap/*.vim")
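The lookup order can be written down as a short Python sketch. It assumes, based on the text above, that the name with the encoding is searched in every 'runtimepath' directory before the plain name is tried at all. The `exists` callback is injected so the logic can be tried without real files; the function name is made up.

```python
# Sketch of the keymap file lookup: "keymap/{keymap}_{encoding}.vim" is
# searched in all 'runtimepath' directories first, and only if it is found
# nowhere is "keymap/{keymap}.vim" searched.
import os
from typing import Callable, Iterable, Optional


def find_keymap(keymap: str, encoding: str, runtimepath: Iterable[str],
                exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    dirs = list(runtimepath)
    for rel in (f"keymap/{keymap}_{encoding}.vim", f"keymap/{keymap}.vim"):
        for d in dirs:
            path = os.path.join(d, rel)
            if exists(path):
                return path       # the first match wins
    return None                   # Vim reports an error in this case
```

With this order, an encoding-specific file in a later 'runtimepath' directory is preferred over a generic file in an earlier one.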
In Insert and Command-line mode you can use CTRL-^ to toggle between using the
keyboard map or not. |i_CTRL-^| |c_CTRL-^|
This flag is remembered for Insert mode with the 'iminsert' option. When
leaving and entering Insert mode the previous value is used. The same value
is also used for commands that take a single character argument, like |f| and
|r|.
For Command-line mode the flag is NOT remembered. You are expected to type an
Ex command first, which is ASCII.
For typing search patterns the 'imsearch' option is used. It can be set to
use the same value as for 'iminsert'.

*lCursor*
It is possible to give the GUI cursor another color when the language mappings
are being used. This is disabled by default, to avoid the cursor becoming
invisible when you use a non-standard background color. Here is an example
that uses a brightly colored cursor: >
	:highlight Cursor guifg=NONE guibg=Green
	:highlight lCursor guifg=NONE guibg=Cyan
<
*keymap-file-format* *:loadk* *:loadkeymap* *E105* *E791*
The keymap file looks something like this: >

	" Maintainer: name <email@address>
	" Last Changed: 2001 Jan 1
	let b:keymap_name = "short"
	loadkeymap
	a	A
	b	B	comment

The lines starting with a " are comments and will be ignored. Blank lines are
also ignored. The lines with the mappings may have a comment after the useful
text.

The "b:keymap_name" can be set to a short name, which will be shown in the
status line. The idea is that this takes less room than the value of
'keymap', which might be long to distinguish between different languages,
keyboards and encodings.

The actual mappings are in the lines below "loadkeymap". In the example "a"
is mapped to "A" and "b" to "B". Thus the first item is mapped to the second
item. This is done for each line, until the end of the file.
These items are exactly the same as what can be used in a |:lnoremap| command,
using "<buffer>" to make the mappings local to the buffer.
You can check the result with this command: >
	:lmap
The two items must be separated by white space. You cannot include white
space inside an item; use the special names "<Tab>" and "<Space>" instead.
The length of the two items together must not exceed 200 bytes.
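The rules above can be summarized in a minimal Python reader for the mapping part of a keymap file. It is a sketch, not Vim's parser. Everything before "loadkeymap" is simply skipped, blank lines and lines starting with a double quote are ignored, the first two white-space separated items form a mapping, and anything after them is treated as a comment. The 200-byte check assumes that 'encoding' is utf-8. Backslash escapes and <char-...> forms are deliberately not handled.

```python
# Sketch only: read the "loadkeymap" section of a keymap file into a dict.
from typing import Dict


def parse_keymap(text: str) -> Dict[str, str]:
    mappings: Dict[str, str] = {}
    in_maps = False
    for line in text.splitlines():
        if not in_maps:
            in_maps = line.strip() == "loadkeymap"
            continue
        if not line.strip() or line.lstrip().startswith('"'):
            continue                       # blank line or comment line
        items = line.split()
        if len(items) < 2:
            raise ValueError(f"missing second item: {line!r}")
        lhs, rhs = items[0], items[1]      # items[2:] is a trailing comment
        if len((lhs + rhs).encode("utf-8")) > 200:
            raise ValueError("the two items exceed 200 bytes")
        mappings[lhs] = rhs
    return mappings


if __name__ == "__main__":
    sample = ('" Maintainer: name <email@address>\n'
              'let b:keymap_name = "short"\n'
              'loadkeymap\n'
              'a A\n'
              'b B comment\n')
    print(parse_keymap(sample))  # {'a': 'A', 'b': 'B'}
```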
It's possible to have more than one character in the first column. This works
like a dead key. Example: >
	'a	á
Since Vim doesn't know if the next character after a quote is really an "a",
it will wait for the next character. To be able to insert a single quote,
also add this line: >
	''	'
Since the mapping is defined with |:lnoremap| the resulting quote will not be
used for the start of another character.
The "accents" keymap uses this. *keymap-accents*
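The effect of such multi-character entries can be approximated in Python once the whole typed text is known. At each position the longest matching first-column item wins, and the replacement is not scanned again, which is what the |:lnoremap| remark above means. This simplified sketch does not model Vim waiting for the next key. The two-entry dictionary is an example, not the real "accents" keymap.

```python
# Sketch only: apply keymap entries to already typed text, longest match first.
# Output is never re-scanned, mirroring :lnoremap.
from typing import Dict


def apply_keymap(typed: str, mappings: Dict[str, str]) -> str:
    longest = max(map(len, mappings), default=0)
    out = []
    i = 0
    while i < len(typed):
        for n in range(min(longest, len(typed) - i), 0, -1):
            lhs = typed[i:i + n]
            if lhs in mappings:
                out.append(mappings[lhs])
                i += n
                break
        else:
            out.append(typed[i])   # no entry starts here: keep the character
            i += 1
    return "".join(out)


if __name__ == "__main__":
    accents = {"'a": "á", "''": "'"}
    print(apply_keymap("'a", accents))   # á
    print(apply_keymap("''a", accents))  # 'a  (the produced quote is not reused)
```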
The first column can also be in |<>| form:
	<C-c>	Ctrl-C
	<A-c>	Alt-c
	<A-C>	Alt-C
Note that the Alt mappings may not work, depending on your keyboard and
terminal.

Although it's possible to have more than one character in the second column,
this is unusual. But you can use various ways to specify the character: >
	A  a           literal character
	A  <char-97>   decimal value
	A  <char-0x61> hexadecimal value
	A  <char-0141> octal value
	x  <Space>     special key name

The characters are assumed to be encoded for the current value of 'encoding'.
It's possible to use ":scriptencoding" when all characters are given
literally. That doesn't work when using the <char-> construct, because the
conversion is done on the keymap file, not on the resulting character.
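The second-column forms listed above can be decoded with a short Python sketch. It assumes that only <char-N>, <Space> and <Tab> need handling. N is read as hexadecimal with a "0x" prefix, as octal with a leading "0", and as decimal otherwise. The function name is made up for this example.

```python
# Sketch only: turn a keymap second-column item into the character it denotes.
import re

_CHAR_RE = re.compile(r"<char-(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)>")
_SPECIAL = {"<Space>": " ", "<Tab>": "\t"}


def decode_item(item: str) -> str:
    if item in _SPECIAL:
        return _SPECIAL[item]
    m = _CHAR_RE.fullmatch(item)
    if m is None:
        return item                      # a literal character
    digits = m.group(1)
    if digits[:2].lower() == "0x":
        return chr(int(digits, 16))      # hexadecimal: <char-0x61>
    if digits.startswith("0"):
        return chr(int(digits, 8))       # octal: <char-0141>
    return chr(int(digits, 10))          # decimal: <char-97>


if __name__ == "__main__":
    print([decode_item(s) for s in ("a", "<char-97>", "<char-0x61>", "<char-0141>")])
    # ['a', 'a', 'a', 'a']
```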
The lines after "loadkeymap" are interpreted with 'cpoptions' set to "C".
This means that continuation lines are not used and a backslash has a special
meaning in the mappings. Examples: >

	" a comment line
	\"  x  maps " to x
	\\  y  maps \ to y

If you write a keymap file that will be useful for others, consider submitting
it to the Vim maintainer for inclusion in the distribution:
<maintainer@vim.org>
HEBREW KEYMAP *keymap-hebrew*

The table below shows what characters are available in the UTF-8 and CP1255
encodings, and which keys produce them with the "hebrew" and "hebrewp"
keymaps:
glyph encoding        keymap ~
Char  UTF-8   cp1255  hebrew  hebrewp  name ~
א     0x5d0   0xe0    t       a        'alef
ב     0x5d1   0xe1    c       b        bet
ג     0x5d2   0xe2    d       g        gimel
ד     0x5d3   0xe3    s       d        dalet
ה     0x5d4   0xe4    v       h        he
ו     0x5d5   0xe5    u       v        vav
ז     0x5d6   0xe6    z       z        zayin
ח     0x5d7   0xe7    j       j        het
ט     0x5d8   0xe8    y       T        tet
י     0x5d9   0xe9    h       y        yod
ך     0x5da   0xea    l       K        kaf sofit
כ     0x5db   0xeb    f       k        kaf
ל     0x5dc   0xec    k       l        lamed
ם     0x5dd   0xed    o       M        mem sofit
מ     0x5de   0xee    n       m        mem
ן     0x5df   0xef    i       N        nun sofit
נ     0x5e0   0xf0    b       n        nun
ס     0x5e1   0xf1    x       s        samech
ע     0x5e2   0xf2    g       u        `ayin
ף     0x5e3   0xf3    ;       P        pe sofit
פ     0x5e4   0xf4    p       p        pe
ץ     0x5e5   0xf5    .       X        tsadi sofit
צ     0x5e6   0xf6    m       x        tsadi
ק     0x5e7   0xf7    e       q        qof
ר     0x5e8   0xf8    r       r        resh
ש     0x5e9   0xf9    a       w        shin
ת     0x5ea   0xfa    ,       t        tav

Vowel marks and special punctuation:
הְ     0x5b0   0xc0    A:      A:       sheva
הֱ     0x5b1   0xc1    HE      HE       hataf segol
הֲ     0x5b2   0xc2    HA      HA       hataf patah
הֳ     0x5b3   0xc3    HO      HO       hataf qamats
הִ     0x5b4   0xc4    I       I        hiriq
הֵ     0x5b5   0xc5    AY      AY       tsere
הֶ     0x5b6   0xc6    E       E        segol
הַ     0x5b7   0xc7    AA      AA       patah
הָ     0x5b8   0xc8    AO      AO       qamats
הֹ     0x5b9   0xc9    O       O        holam
הֻ     0x5bb   0xcb    U       U        qubuts
כּ     0x5bc   0xcc    D       D        dagesh
הֽ     0x5bd   0xcd    ]T      ]T       meteg
ה־    0x5be   0xce    ]Q      ]Q       maqaf
בֿ     0x5bf   0xcf    ]R      ]R       rafe
ב׀    0x5c0   0xd0    ]p      ]p       paseq
שׁ     0x5c1   0xd1    SR      SR       shin-dot
שׂ     0x5c2   0xd2    SL      SL       sin-dot
׃     0x5c3   0xd3    ]P      ]P       sof-pasuq
װ     0x5f0   0xd4    VV      VV       double-vav
ױ     0x5f1   0xd5    VY      VY       vav-yod
ײ     0x5f2   0xd6    YY      YY       yod-yod
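The encoding columns can be cross-checked with Python's codecs. Note that the "UTF-8" column lists Unicode code points; the actual UTF-8 encoding of U+05D0 is the two bytes D7 90. The sketch below checks a few rows copied from the table above, plus the fact that the 27 letters from 'alef to tav are contiguous in both encodings.

```python
# Sketch only: cross-check some rows of the table with Python's codecs.
ROWS = [
    # (code point, cp1255 byte, name), copied from the table above
    (0x5D0, 0xE0, "'alef"),
    (0x5EA, 0xFA, "tav"),
    (0x5B0, 0xC0, "sheva"),
]


def describe(cp: int) -> str:
    """Show the cp1255 byte and the UTF-8 bytes of one code point."""
    ch = chr(cp)
    return f"U+{cp:04X} cp1255 {ch.encode('cp1255').hex()} utf-8 {ch.encode('utf-8').hex(' ')}"


if __name__ == "__main__":
    for cp, byte, name in ROWS:
        assert chr(cp).encode("cp1255") == bytes([byte]), name
        print(name, describe(cp))
    # 'alef (U+05D0) to tav (U+05EA) map to 0xE0..0xFA in cp1255.
    assert all(chr(cp).encode("cp1255")[0] == 0xE0 + cp - 0x5D0
               for cp in range(0x5D0, 0x5EB))
```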
The following are only available in UTF-8:

Cantillation marks:
glyph ~
Char  UTF-8   hebrew  name ~
ב֑     0x591   C:      etnahta
ב֒     0x592   Cs      segol
ב֓     0x593   CS      shalshelet
ב֔     0x594   Cz      zaqef qatan
ב֕     0x595   CZ      zaqef gadol
ב֖     0x596   Ct      tipeha
ב֗     0x597   Cr      revia
ב֘     0x598   Cq      zarqa
ב֙     0x599   Cp      pashta
ב֚     0x59a   C!      yetiv
ב֛     0x59b   Cv      tevir
ב֜     0x59c   Cg      geresh
ב֝     0x59d   C*      geresh qadim
ב֞     0x59e   CG      gershayim
ב֟     0x59f   CP      qarnei-parah
ב֪     0x5aa   Cy      yerach-ben-yomo
ב֫     0x5ab   Co      ole
ב֬     0x5ac   Ci      iluy
ב֭     0x5ad   Cd      dehi
ב֮     0x5ae   Cn      zinor
ב֯     0x5af   CC      masora circle
Combining forms:
ﬠ     0xfb20  X`      Alternative `ayin
ﬡ     0xfb21  X'      Alternative 'alef
ﬢ     0xfb22  X-d     Alternative dalet
ﬣ     0xfb23  X-h     Alternative he
ﬤ     0xfb24  X-k     Alternative kaf
ﬥ     0xfb25  X-l     Alternative lamed
ﬦ     0xfb26  X-m     Alternative mem-sofit
ﬧ     0xfb27  X-r     Alternative resh
ﬨ     0xfb28  X-t     Alternative tav
﬩     0xfb29  X-+     Alternative plus
שׁ     0xfb2a  XW      shin+shin-dot
שׂ     0xfb2b  Xw      shin+sin-dot
שּׁ     0xfb2c  X..W    shin+shin-dot+dagesh
שּׂ     0xfb2d  X..w    shin+sin-dot+dagesh
אַ     0xfb2e  XA      alef+patah
אָ     0xfb2f  XO      alef+qamats
אּ     0xfb30  XI      alef+hiriq (mapiq)
בּ     0xfb31  X.b     bet+dagesh
גּ     0xfb32  X.g     gimel+dagesh
דּ     0xfb33  X.d     dalet+dagesh
הּ     0xfb34  X.h     he+dagesh
וּ     0xfb35  Xu      vav+dagesh
זּ     0xfb36  X.z     zayin+dagesh
טּ     0xfb38  X.T     tet+dagesh
יּ     0xfb39  X.y     yud+dagesh
ךּ     0xfb3a  X.K     kaf sofit+dagesh
כּ     0xfb3b  X.k     kaf+dagesh
לּ     0xfb3c  X.l     lamed+dagesh
מּ     0xfb3e  X.m     mem+dagesh
נּ     0xfb40  X.n     nun+dagesh
סּ     0xfb41  X.s     samech+dagesh
ףּ     0xfb43  X.P     pe sofit+dagesh
פּ     0xfb44  X.p     pe+dagesh
צּ     0xfb46  X.x     tsadi+dagesh
קּ     0xfb47  X.q     qof+dagesh
רּ     0xfb48  X.r     resh+dagesh
שּ     0xfb49  X.w     shin+dagesh
תּ     0xfb4a  X.t     tav+dagesh
וֹ     0xfb4b  Xo      vav+holam
בֿ     0xfb4c  XRb     bet+rafe
כֿ     0xfb4d  XRk     kaf+rafe
פֿ     0xfb4e  XRp     pe+rafe
ﭏ     0xfb4f  Xal     alef-lamed
==============================================================================
10. Input with imactivatefunc() *mbyte-func*

Vim has the 'imactivatefunc' and 'imstatusfunc' options. These are useful to
activate/deactivate the input method from Vim in any way, also with an external
command. For example, fcitx provides the fcitx-remote command: >

	set iminsert=2
	set imsearch=2
	set imcmdline

	set imactivatefunc=ImActivate
	function! ImActivate(active)
	  " fcitx-remote -o activates the input method, -c deactivates it
	  if a:active
	    call system('fcitx-remote -o')
	  else
	    call system('fcitx-remote -c')
	  endif
	endfunction

	set imstatusfunc=ImStatus
	function! ImStatus()
	  " fcitx-remote prints 2 when the input method is active
	  return system('fcitx-remote')[0] is# '2'
	endfunction

Using this script, you can activate/deactivate XIM via Vim even when it is not
compiled with |+xim|.
==============================================================================
11. Using UTF-8 *mbyte-utf8* *UTF-8* *utf-8* *utf8*
*Unicode* *unicode*
The Unicode character set was designed to include all characters from other
character sets. Therefore it is possible to write text in any language using
Unicode (with a few rarely used languages excluded). And it's mostly possible
to mix these languages in one file, which is impossible with other encodings.

Unicode can be encoded in several ways. The most popular one is UTF-8, which
uses one or more bytes for each character and is backwards compatible with
ASCII. On MS-Windows UTF-16 is also used (previously UCS-2), which uses
16-bit words. Vim can support all of these encodings, but always uses UTF-8
internally.

Vim has comprehensive UTF-8 support. It works well in:
- xterm with UTF-8 support enabled
- Motif and GTK GUI
- MS-Windows GUI
- several other platforms

Double-width characters are supported. This works best with 'guifontwide' or
'guifontset'. When using only 'guifont' the wide characters are drawn in the
normal width, followed by a space to fill the gap. Note that the 'guifontset'
option is no longer relevant in the GTK+ 2 GUI.
*bom-bytes*
When reading a file a BOM (Byte Order Mark) can be used to recognize the
Unicode encoding:
	EF BB BF     UTF-8
	FE FF        UTF-16 big endian
	FF FE        UTF-16 little endian
	00 00 FE FF  UTF-32 big endian
	FF FE 00 00  UTF-32 little endian

UTF-8 is the recommended encoding. Note that it's difficult to tell UTF-16
and UTF-32 apart (for example, the UTF-16 little endian BOM FF FE is also the
start of the UTF-32 little endian BOM). UTF-16 is often used on MS-Windows;
UTF-32 is not widespread as a file format.
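BOM detection from the table above fits in a few lines of Python. This is a sketch, not Vim's code. The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 ones, because FF FE 00 00 also begins with the UTF-16 little endian mark FF FE.

```python
# Sketch only: recognize the Unicode encoding from a byte order mark.
from typing import Optional, Tuple

BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32be"),
    (b"\xff\xfe\x00\x00", "utf-32le"),   # must come before FF FE
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]


def detect_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding name, BOM length), or None if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


if __name__ == "__main__":
    print(detect_bom(b"\xef\xbb\xbfhello"))  # ('utf-8', 3)
```

A UTF-16 little endian file whose first character is U+0000 would be reported as UTF-32 little endian here, which illustrates the ambiguity mentioned above.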
*mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the
character before it. The combining characters are drawn on top of the
preceding character.
Up to two combining characters can be used by default. This can be changed
with the 'maxcombine' option.
When editing text a composing character is mostly considered part of the
preceding character. For example "x" will delete a character and its
following composing characters by default.
If the 'delcombine' option is on, then pressing "x" will delete the combining
characters, one at a time, then the base character. When inserting, however,
you type the first character and the following composing characters
separately, after which they will be joined. The "r" command will not allow
you to type a combining character, because it doesn't know one is coming. Use
"R" instead.
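The difference between the default "x" and "x" with 'delcombine' can be modelled in Python by grouping a string into base characters plus their composing characters. This is an approximation: `unicodedata.combining()` serves as the test for a composing character, while Vim uses its own tables. The 'maxcombine' limit is not modelled, and the function names are made up.

```python
# Sketch only: group text into base + composing characters and mimic "x".
import unicodedata
from typing import List


def clusters(text: str) -> List[str]:
    out: List[str] = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch          # attach to the preceding character
        else:
            out.append(ch)
    return out


def delete_char(text: str, index: int, delcombine: bool = False) -> str:
    """Mimic "x" on cluster `index`: drop the whole cluster, or only its
    last composing character when delcombine is set."""
    parts = clusters(text)
    cluster = parts[index]
    if delcombine and len(cluster) > 1:
        parts[index] = cluster[:-1]
    else:
        del parts[index]
    return "".join(parts)


if __name__ == "__main__":
    s = "e\u0301\u0323x"           # e + acute + dot below, then x
    print(len(clusters(s)))         # 2
```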
Bytes which are not part of a valid UTF-8 byte sequence are handled like a
single character and displayed as <xx>, where "xx" is the hex value of the
byte.
Overlong sequences are not handled specially and are displayed like a valid
character. However, search patterns may not match on an overlong sequence.
(An overlong sequence uses more bytes than required for the character.) An
exception is NUL (zero), which is displayed as "<00>".
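The <xx> display of invalid bytes can be imitated in Python with a custom codec error handler. This is a sketch with a made-up handler name ("hexbyte"). Unlike Vim, Python's UTF-8 decoder also rejects overlong sequences, so in this sketch they show up as <xx> bytes too.

```python
# Sketch only: decode UTF-8 and show each invalid byte as <xx>.
import codecs


def _hex_bytes(err: UnicodeDecodeError):
    # Replace only the first offending byte; decoding resumes right after it.
    bad = err.object[err.start]
    return f"<{bad:02x}>", err.start + 1


codecs.register_error("hexbyte", _hex_bytes)


def show_utf8(data: bytes) -> str:
    return data.decode("utf-8", errors="hexbyte")


if __name__ == "__main__":
    print(show_utf8(b"a\xffb"))      # a<ff>b
    print(show_utf8(b"\xc0\x80"))    # <c0><80>  (overlong NUL, rejected by Python)
```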
In the file and buffer the full range of Unicode characters can be used (31
bits). However, displaying only works for the characters present in the
selected font.

Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under
  the cursor. If there are composing characters these are shown too. (If the
  message is truncated, use ":messages").
- "g8" shows the bytes used in a UTF-8 character, also the composing
  characters, as hex numbers.
- ":set encoding=utf-8 fileencodings=" forces using UTF-8 for all files. The
  default is to use the current locale for 'encoding' and set 'fileencodings'
  to automatically detect the encoding of a file.
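The values that "ga" and "g8" report can be reproduced in Python for a character and its composing characters. The message format below is made up for illustration and does not copy Vim's exact output.

```python
# Sketch only: values similar to what "ga" and "g8" report.
from typing import List


def ga_info(cluster: str) -> List[str]:
    """Decimal, hex and octal value of each code point (base + composing)."""
    return [f"{ord(c)}, Hex {ord(c):04x}, Oct {ord(c):o}" for c in cluster]


def g8_info(cluster: str) -> str:
    """UTF-8 bytes in hex; composing characters are separated by " + "."""
    return " + ".join(c.encode("utf-8").hex(" ") for c in cluster)


if __name__ == "__main__":
    print(ga_info("\u00e9"))      # ['233, Hex 00e9, Oct 351']
    print(g8_info("e\u0301"))     # 65 + cc 81
```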
STARTING VIM

If your current locale is in a UTF-8 encoding, Vim will automatically start
in UTF-8 mode.

If you are using another locale: >

	set encoding=utf-8

You might also want to select the font used for the menus. Unfortunately this
doesn't always work. See the system specific remarks below, and 'langmenu'.

USING UTF-8 IN X-Windows *utf-8-in-xwindows*

Note: This section does not apply to the GTK+ 2 GUI.

You need to specify a font to be used. For double-wide characters another
font is required, which is exactly twice as wide. There are three ways to do
this:

1. Set 'guifont' and let Vim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'
3. Set 'guifontset'

See the documentation for each option for details. Example: >

	:set guifont=-misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

You might also want to set the font used for the menus. This only works for
Motif. Use the ":hi Menu font={fontname}" command for this. |:highlight|


TYPING UTF-8						*utf-8-typing*

If you are using X-Windows, you should find an input method that supports
UTF-8.

If your system does not provide support for typing UTF-8, you can use the
'keymap' feature. This allows writing a keymap file, which defines a UTF-8
character as a sequence of ASCII characters. See |mbyte-keymap|.

Another method is to set the current locale to the language you want to use
and for which you have an XIM available. Then set 'termencoding' to the
encoding of that locale and Vim will convert the typed characters to
'encoding' for you.

If everything else fails, you can type any character as four hex digits: >

	CTRL-V u 1234

"1234" is interpreted as a hex number. You must type four hex digits;
prepend zeros if necessary.
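
The following Python sketch shows how the four typed hex digits map to a
character. ctrl_v_u() is a hypothetical helper. It assumes exactly four hex
digits, as this section requires, and models only the conversion from the
digits to a character, not Vim's input handling.

```python
_HEX = set("0123456789abcdefABCDEF")


def ctrl_v_u(digits: str) -> str:
    """Return the character that CTRL-V u {digits} would insert, assuming
    exactly four hex digits."""
    if len(digits) != 4 or not set(digits) <= _HEX:
        raise ValueError("type exactly four hex digits, e.g. 00e9")
    return chr(int(digits, 16))


ch = ctrl_v_u("1234")
print(hex(ord(ch)), ord(ch))      # 0x1234 4660
print(ctrl_v_u("00e9").encode())  # b'\xc3\xa9'
```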

COMMAND ARGUMENTS					*utf-8-char-arg*

Commands like |f|, |F|, |t| and |r| take an argument of one character. For
UTF-8 this argument may include one or two composing characters. These need
to be produced together with the base character; Vim doesn't wait for the next
character to be typed to find out if it is a composing character or not.
Using 'keymap' or |:lmap| is a nice way to type these characters.

The commands that search for a character in a line handle composing characters
as follows. When searching for a character without a composing character,
this will find matches in the text with or without composing characters. When
searching for a character with a composing character, this will only find
matches with that composing character. It was implemented this way because
not everybody is able to type a composing character.
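
The matching rule for character searches can be sketched in Python. This is
not Vim's implementation. The sketch makes two assumptions. First, Unicode
mark categories (Mn, Mc, Me) stand in for Vim's own table of composing
characters. Second, a target with composing characters must match them
exactly. split_cells() and find_char() are hypothetical names.

```python
import unicodedata


def is_composing(ch: str) -> bool:
    # Assumption: treat Unicode marks (categories Mn, Mc, Me) as composing
    # characters; Vim uses its own table, which may differ in detail.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def split_cells(text: str):
    """Group text into (base, composing) pairs, one per screen character."""
    cells = []
    for ch in text:
        if cells and is_composing(ch):
            base, comp = cells[-1]
            cells[-1] = (base, comp + ch)
        else:
            cells.append((ch, ""))
    return cells


def find_char(line: str, target: str) -> int:
    """Index (in cells) of the first match for target, or -1.

    A target without composing characters matches its base character with or
    without composing characters; a target with composing characters only
    matches the same base with exactly those composing characters."""
    want_base, want_comp = split_cells(target)[0]
    for i, (base, comp) in enumerate(split_cells(line)):
        if base == want_base and (not want_comp or comp == want_comp):
            return i
    return -1


print(find_char("cafe\u0301 cafe", "e"))        # 3
print(find_char("cafe cafe\u0301", "e\u0301"))  # 8
```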

==============================================================================
12. Overview of options					*mbyte-options*

These options are relevant for editing multibyte files. Check the help in
options.txt for detailed information.

'encoding'	Encoding used for the keyboard and display. It is also the
		default encoding for files.
'fileencoding'	Encoding of a file. When it's different from 'encoding',
		conversion is done when reading or writing the file.
'fileencodings'	List of possible encodings of a file. When opening a file
		these will be tried and the first one that doesn't cause an
		error is used for 'fileencoding'.
'charconvert'	Expression used to convert files from one encoding to another.
'formatoptions' The 'm' flag can be included to have formatting break a line
		at a multibyte character of 256 or higher. This is useful for
		languages where a sequence of characters can be broken
		anywhere.
'guifontset'	The list of font names used for a multibyte encoding. When
		this option is not empty, it replaces 'guifont'.
'keymap'	Specify the name of a keyboard mapping.
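
The following Python sketch shows how the 'fileencodings' list is tried in
order, with the first encoding that decodes without an error chosen as
'fileencoding'. It is a rough model, not Vim's code. pick_fileencoding() is a
hypothetical name, and the list uses Python codec names. Vim-specific values
such as "ucs-bom" and "default" are not Python codecs.

```python
from typing import Tuple


def pick_fileencoding(data: bytes, fileencodings: str) -> Tuple[str, str]:
    """Try each encoding in a comma-separated list and return the first one
    that decodes data without an error, plus the decoded text."""
    for enc in fileencodings.split(","):
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding doesn't fit; try the next one
    raise ValueError("no encoding in the list decodes the data")


print(pick_fileencoding(b"caf\xc3\xa9", "utf-8,latin-1"))  # ('utf-8', 'café')
print(pick_fileencoding(b"caf\xe9", "utf-8,latin-1"))      # ('latin-1', 'café')
```

Latin-1 accepts any byte, so it never fails. In this sketch, any entries
listed after it would never be tried.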

==============================================================================

Contributions specifically for the multibyte features by:
	Chi-Deok Hwang <hwang@mizi.co.kr>
	SungHyun Nam <goweol@gmail.com>
	K.Nagano <nagano@atese.advantest.co.jp>
	Taro Muraoka <koron@tka.att.ne.jp>
	Yasuhiro Matsumoto <mattn@mail.goo.ne.jp>

 vim:tw=78:ts=8:noet:ft=help:norl: