Install
    How to install HTML Purifier

HTML Purifier is designed to run out of the box, so actually using the
library is extremely easy.  (Although... if you were looking for a
step-by-step installation GUI, you've downloaded the wrong software!)

While the impatient can get going immediately with some of the sample
code at the bottom of this document, it's well worth reading the whole
thing, since most of the other documentation assumes that you are
familiar with these contents.

---------------------------------------------------------------------------
1.  Compatibility

HTML Purifier is compatible with PHP 5 and PHP 7, and is actively tested
from PHP 5.3 and up. It has no core dependencies on other libraries.

These optional extensions can enhance the capabilities of HTML Purifier:

    * iconv  : Converts text to and from non-UTF-8 encodings
    * bcmath : Used for unit conversion and imagecrash protection
    * tidy   : Used for pretty-printing HTML

These optional libraries can enhance the capabilities of HTML Purifier:

    * CSSTidy : Clean CSS stylesheets using %Core.ExtractStyleBlocks
        Note: You should use the modernized fork of CSSTidy available
        at https://github.com/Cerdic/CSSTidy
    * Net_IDNA2 (PEAR) : IRI support using %Core.EnableIDNA
        Note: This is not necessary for PHP 5.3 or later

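If you are not sure which of the optional extensions your PHP build has,
a quick check along these lines may help. This is only a sketch: the
function name htmlpurifier_optional_extensions() is made up for this
example and is not part of HTML Purifier; extension_loaded() is a core
PHP function.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): reports which of the
// optional extensions listed above are loaded in this PHP build.
function htmlpurifier_optional_extensions()
{
    $report = array();
    foreach (array('iconv', 'bcmath', 'tidy') as $ext) {
        // extension_loaded() returns true if the named extension is available
        $report[$ext] = extension_loaded($ext);
    }
    return $report;
}

// Print one line per extension
foreach (htmlpurifier_optional_extensions() as $ext => $loaded) {
    echo str_pad($ext, 6), ' : ', ($loaded ? 'available' : 'missing'), "\n";
}
```

A missing extension is not an error; it only means the corresponding
optional feature will be unavailable.
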
---------------------------------------------------------------------------
2.  Reconnaissance

A big plus of HTML Purifier is its inerrant support of standards, so
your web pages should be standards-compliant.  (They should also use
semantic markup, but that's another issue altogether, one HTML Purifier
cannot fix without reading your mind.)

HTML Purifier can process these doctypes:

    * XHTML 1.0 Transitional (default)
    * XHTML 1.0 Strict
    * HTML 4.01 Transitional
    * HTML 4.01 Strict
    * XHTML 1.1

...and these character encodings:

    * UTF-8 (default)
    * Any encoding iconv supports (with crippled internationalization support)

These defaults reflect what my choices would be if I were authoring an
HTML document; however, what you choose depends on the nature of your
codebase.  If you don't know what doctype you are using, you can determine
the doctype from this identifier at the top of your source code:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

...and the character encoding from this code:

    <meta http-equiv="Content-type" content="text/html;charset=ENCODING">

If the character encoding declaration is missing, STOP NOW, and
read 'docs/enduser-utf8.html' (web accessible at
http://htmlpurifier.org/docs/enduser-utf8.html).  In fact, even if it is
present, read that document anyway, as many websites specify their
document's character encoding incorrectly.

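If you have many templates to inspect, the meta tag lookup described
above can be roughed out in code. The following is a sketch under loud
assumptions: declared_charset() is a name invented for this example, and
the regular expression is a heuristic, not a real HTML parser. It only
tells you what a page claims; as noted above, the claim may be wrong.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): returns the charset
// declared in a <meta> tag, upper-cased, or null if none is found.
function declared_charset($html)
{
    // Looks for charset=VALUE inside a <meta ...> tag, quoted or unquoted
    $pattern = '/<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)/i';
    if (preg_match($pattern, $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

$page = '<meta http-equiv="Content-type" content="text/html;charset=ISO-8859-1">';
$charset = declared_charset($page);
echo $charset === null ? "no declaration found\n" : "declared: $charset\n";
```
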
---------------------------------------------------------------------------
3.  Including the library

The procedure is quite simple:

    require_once '/path/to/library/HTMLPurifier.auto.php';

This will set up an autoloader, so the library's files are only included
when you use them.

Only the contents in the library/ folder are necessary, so you can remove
everything else when using HTML Purifier in a production environment.

If you installed HTML Purifier via PEAR, all you need to do is:

    require_once 'HTMLPurifier.auto.php';

Please note that the usual PEAR practice of including just the classes you
want will not work with HTML Purifier's autoloading scheme.

Advanced users, read on; other users can skip to section 4.

Autoload compatibility
----------------------

    HTML Purifier attempts to be as smart as possible when registering an
    autoloader, but there are some cases where you will need to change
    your own code to accommodate HTML Purifier. These are those cases:

    AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
        spl_autoload_register() has the curious behavior of disabling
        the existing __autoload() handler. Users need to explicitly
        spl_autoload_register('__autoload'). Because we use SPL when it
        is available, __autoload() will ALWAYS be disabled. If __autoload()
        is declared before HTML Purifier is loaded, this is not a problem:
        HTML Purifier will register the function for you. But if it is
        declared afterwards, it will mysteriously not work. This
        snippet of code (after your autoloader is defined) will fix it:

            spl_autoload_register('__autoload');

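If you control your own autoloader, one way to sidestep the ordering
problem entirely is to give it an ordinary name and register it through
SPL yourself, since SPL keeps a stack of autoloaders instead of a single
handler. The sketch below assumes a hypothetical my_app_autoload()
function and a lib/ directory using underscore-to-slash class naming;
neither comes from HTML Purifier. (It also avoids __autoload(), which
was deprecated in PHP 7.2.)

```php
<?php
// Hypothetical application autoloader; the lib/ layout is an assumption
// made for this example only.
function my_app_autoload($class)
{
    $file = __DIR__ . '/lib/' . str_replace('_', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
}

// Registering through SPL appends to the autoload stack, so this handler
// coexists with any others regardless of the order they were registered in.
spl_autoload_register('my_app_autoload');

// List everything currently on the stack
var_dump(spl_autoload_functions());
```
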
For better performance
----------------------

    Opcode caches, which greatly speed up PHP initialization for scripts
    with large amounts of code (HTML Purifier included), don't like
    autoloaders. We offer an include file that includes all of HTML Purifier's
    files in one go in an opcode cache friendly manner:

        // If /path/to/library isn't already in your include path, uncomment
        // the below line:
        // require '/path/to/library/HTMLPurifier.path.php';

        require 'HTMLPurifier.includes.php';

    Optional components still need to be included; you'll know if you try to
    use a feature and you get a "class doesn't exist" error! The autoloader
    can be used in conjunction with this approach to catch classes that are
    missing. Simply add this afterwards:

        require 'HTMLPurifier.autoload.php';

Standalone version
------------------

    HTML Purifier has a standalone distribution; you can also generate
    a standalone file from the full version by running the script
    maintenance/generate-standalone.php. The standalone version has the
    benefit of having most of its code in one file, so parsing is much
    faster and the library is easier to manage.

    If HTMLPurifier.standalone.php exists in the library directory, you
    can use it like this:

        require '/path/to/HTMLPurifier.standalone.php';

    This is equivalent to including HTMLPurifier.includes.php, except that
    the contents of standalone/ will be added to your path. To override this
    behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
    be found (usually, this will be one directory up, the "true" library
    directory in full distributions). Don't forget to set your path too!

    The autoloader can be added to the end to ensure the classes are
    loaded when necessary; otherwise you can manually include them.
    To use the autoloader, use this:

        require 'HTMLPurifier.autoload.php';

For advanced users
------------------

    HTMLPurifier.auto.php performs a number of operations that can be done
    individually. These are:

        HTMLPurifier.path.php
            Puts /path/to/library in the include path. For high performance,
            this should be done in php.ini.

        HTMLPurifier.autoload.php
            Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).

    You can do these operations by yourself, if you like.

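As a rough illustration of doing the first of these operations by hand,
the sketch below prepends a directory to the include path at runtime.
The helper name prepend_include_path() is invented for this example and
'/path/to/library' is the same placeholder used above; setting the path
in php.ini remains the faster option.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): prepends $dir to the
// include path unless it is already listed, and returns the new path.
function prepend_include_path($dir)
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        array_unshift($paths, $dir);
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }
    return get_include_path();
}

// '/path/to/library' is a placeholder; substitute your real library/ path
echo prepend_include_path('/path/to/library'), "\n";

// With the path in place, the second operation from the list above is:
// require 'HTMLPurifier.autoload.php';
```
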
---------------------------------------------------------------------------
4.  Configuration

HTML Purifier is designed to run out-of-the-box, but occasionally HTML
Purifier needs to be told what to do.  If you answer no to any of these
questions, read on; otherwise, you can skip to the next section (or, if you're
into configuring things just for the heck of it, skip to 4.3).

    * Am I using UTF-8?
    * Am I using XHTML 1.0 Transitional?

If you answered no to any of these questions, instantiate a configuration
object and read on:

    $config = HTMLPurifier_Config::createDefault();


4.1. Setting a different character encoding

You really shouldn't use any other encoding except UTF-8, especially if you
plan to support multilingual websites (read section 2 for more details).
However, switching to UTF-8 is not always immediately feasible, so we can
adapt.

HTML Purifier uses iconv to support other character encodings, so any
encoding that iconv supports <http://www.gnu.org/software/libiconv/> can
be selected with this code:

    $config->set('Core.Encoding', /* put your encoding here */);

An example usage for Latin-1 websites (a common legacy encoding for English
websites):

    $config->set('Core.Encoding', 'ISO-8859-1');

Note that HTML Purifier's support for non-Unicode encodings is crippled by the
fact that any character not supported by that encoding will be silently
dropped, EVEN if it is ampersand escaped.  If you want to work around
this, you are welcome to read docs/enduser-utf8.html for a fix,
but please be cognizant of the issues the "solution" creates (for this
reason, I do not include the solution in this document).


4.2. Setting a different doctype

For those of you using HTML 4.01 Transitional, you can disable
XHTML output like this:

    $config->set('HTML.Doctype', 'HTML 4.01 Transitional');

The full list of supported doctypes is:

    * HTML 4.01 Strict
    * HTML 4.01 Transitional
    * XHTML 1.0 Strict
    * XHTML 1.0 Transitional
    * XHTML 1.1


4.3. Other settings

There are more configuration directives which can be read about
here: <http://htmlpurifier.org/live/configdoc/plain.html>.  They're a bit
boring, but they can help out those of you who like to exert maximum control
over your code.  Some of the more interesting ones are configurable at the
demo <http://htmlpurifier.org/demo.php> and are well worth looking into
for your own system.

For example, you can fine tune allowed elements and attributes, convert
relative URLs to absolute ones, and even autoparagraph input text! These
are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
%AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
translates to:

    $config->set('Namespace.Directive', $value);

E.g.

    $config->set('HTML.Allowed', 'p,b,a[href],i');
    $config->set('URI.Base', 'http://www.example.com');
    $config->set('URI.MakeAbsolute', true);
    $config->set('AutoFormat.AutoParagraph', true);

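If you keep your settings in one place, the same calls can be driven
from an array. The sketch below is only a convenience wrapper:
apply_directives() is a name invented for this example, and it assumes
nothing about the config object beyond the set($key, $value) method
shown above. The class_exists() guard just lets the snippet run even
when HTML Purifier has not been included yet.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): applies a map of
// 'Namespace.Directive' => value pairs by calling $config->set() on each.
function apply_directives($config, array $directives)
{
    foreach ($directives as $key => $value) {
        $config->set($key, $value);
    }
    return $config;
}

// The same four directives as in the example above
$directives = array(
    'HTML.Allowed'             => 'p,b,a[href],i',
    'URI.Base'                 => 'http://www.example.com',
    'URI.MakeAbsolute'         => true,
    'AutoFormat.AutoParagraph' => true,
);

// Only touch the real config class if the library has been loaded
if (class_exists('HTMLPurifier_Config')) {
    $config = apply_directives(HTMLPurifier_Config::createDefault(), $directives);
}
```
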
---------------------------------------------------------------------------
5.  Caching

HTML Purifier generates some cache files (generally one or two) to speed up
its execution. For maximum performance, make sure that
library/HTMLPurifier/DefinitionCache/Serializer is writable by the webserver.

If you are in the library/ folder of HTML Purifier, you can set the
appropriate permissions using:

    chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer

If the above command doesn't work, you may need to assign write permissions
to group:

    chmod -R 0775 HTMLPurifier/DefinitionCache/Serializer

You can also chmod files via your FTP client; this option
is usually accessible by right clicking the corresponding directory and
then selecting "chmod" or "file permissions".

Starting with 2.0.1, HTML Purifier will generate friendly error messages
that will tell you exactly what you have to chmod the directory to. If in
doubt, follow its advice.

If you are unable or unwilling to give write permissions to the cache
directory, you can either disable the cache (and suffer a performance
hit):

    $config->set('Core.DefinitionCache', null);

Or move the cache directory somewhere else (no trailing slash):

    $config->set('Cache.SerializerPath', '/home/user/absolute/path');

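The choice between these two options can also be made at runtime. The
sketch below is one possible approach, not a recommendation:
pick_cache_path() is a name invented for this example, and falling back
to the system temporary directory is an assumption that may not suit
shared hosts. The two set() calls are the ones shown above.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): returns the first
// writable directory among $preferred and the system temp directory,
// without a trailing slash, or null if neither is usable.
function pick_cache_path($preferred)
{
    foreach (array($preferred, sys_get_temp_dir()) as $dir) {
        $dir = rtrim($dir, '/\\');   // the directive wants no trailing slash
        if ($dir !== '' && is_dir($dir) && is_writable($dir)) {
            return $dir;
        }
    }
    return null;
}

// '/home/user/absolute/path' is the placeholder path used above
$path = pick_cache_path('/home/user/absolute/path');

// Only touch the real config class if the library has been loaded
if (class_exists('HTMLPurifier_Config')) {
    $config = HTMLPurifier_Config::createDefault();
    if ($path !== null) {
        $config->set('Cache.SerializerPath', $path);
    } else {
        $config->set('Core.DefinitionCache', null);   // disable the cache
    }
}
```
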
---------------------------------------------------------------------------
6.  Using the code

The interface is mind-numbingly simple:

    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($dirty_html);

That's it!  For more examples, check out docs/examples/ (they aren't very
different though).  Also, docs/enduser-slow.html gives advice on what to
do if HTML Purifier is slowing down your application.

---------------------------------------------------------------------------
7.  Quick install

First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
writable by the webserver (see Section 5: Caching above for details).
If your website is in UTF-8 and XHTML Transitional, use this code:

<?php
    require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

    $config = HTMLPurifier_Config::createDefault();
    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($dirty_html);
?>

If your website is in a different encoding or doctype, use this code:

<?php
    require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core.Encoding', 'ISO-8859-1'); // replace with your encoding
    $config->set('HTML.Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
    $purifier = new HTMLPurifier($config);

    $clean_html = $purifier->purify($dirty_html);
?>

    vim: et sw=4 sts=4