cms/drupal/profiles/drustack/libraries/htmlpurifier/INSTALL
changeset 541 e756a8c72c3d
540:07239de796bb 541:e756a8c72c3d
       
     1 
       
     2 Install
       
     3     How to install HTML Purifier
       
     4 
       
     5 HTML Purifier is designed to run out of the box, so actually using the
       
     6 library is extremely easy.  (Although... if you were looking for a
       
     7 step-by-step installation GUI, you've downloaded the wrong software!)
       
     8 
       
     9 While the impatient can get going immediately with some of the sample
       
    10 code at the bottom of this document, it's well worth reading this entire
       
    11 document--most of the other documentation assumes that you are familiar
       
    12 with these contents.
       
    13 
       
    14 
       
    15 ---------------------------------------------------------------------------
       
    16 1.  Compatibility
       
    17 
       
    18 HTML Purifier is PHP 5 only, and is actively tested from PHP 5.0.5 and
       
    19 up. It has no core dependencies on other libraries. PHP
       
    20 4 support was deprecated on December 31, 2007 with HTML Purifier 3.0.0.
       
    21 HTML Purifier is not compatible with zend.ze1_compatibility_mode.
       
    22 
       
    23 These optional extensions can enhance the capabilities of HTML Purifier:
       
    24 
       
    25     * iconv  : Converts text to and from non-UTF-8 encodings
       
    26     * bcmath : Used for unit conversion and imagecrash protection
       
    27     * tidy   : Used for pretty-printing HTML
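
To see which of these optional extensions a given server has, something like
the following can be run before installing. This is a minimal sketch in plain
PHP; purifier_environment_report() is a hypothetical helper name and is not
part of HTML Purifier.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): reports the PHP version
// and which of the optional extensions listed above are loaded.
function purifier_environment_report()
{
    $report = array('php_version' => PHP_VERSION);
    foreach (array('iconv', 'bcmath', 'tidy') as $ext) {
        // extension_loaded() returns true when the extension is available.
        $report[$ext] = extension_loaded($ext);
    }
    return $report;
}

foreach (purifier_environment_report() as $name => $value) {
    // Booleans are printed as yes/no; the version string is printed as-is.
    echo $name, ': ', is_bool($value) ? ($value ? 'yes' : 'no') : $value, "\n";
}
```

A missing extension is not an error here, since all three are optional.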
       
    28 
       
    29 These optional libraries can enhance the capabilities of HTML Purifier:
       
    30 
       
    31     * CSSTidy : Clean CSS stylesheets using %Core.ExtractStyleBlocks
       
    32     * Net_IDNA2 (PEAR) : IRI support using %Core.EnableIDNA
       
    33 
       
    34 ---------------------------------------------------------------------------
       
    35 2.  Reconnaissance
       
    36 
       
    37 A big plus of HTML Purifier is its inerrant support of standards, so
       
    38 your web-pages should be standards-compliant.  (They should also use
       
    39 semantic markup, but that's another issue altogether, one HTML Purifier
       
    40 cannot fix without reading your mind.)
       
    41 
       
    42 HTML Purifier can process these doctypes:
       
    43 
       
    44 * XHTML 1.0 Transitional (default)
       
    45 * XHTML 1.0 Strict
       
    46 * HTML 4.01 Transitional
       
    47 * HTML 4.01 Strict
       
    48 * XHTML 1.1
       
    49 
       
    50 ...and these character encodings:
       
    51 
       
    52 * UTF-8 (default)
       
    53 * Any encoding iconv supports (with crippled internationalization support)
       
    54 
       
    55 These defaults reflect what my choices would be if I were authoring an
       
    56 HTML document; however, what you choose depends on the nature of your
       
    57 codebase.  If you don't know what doctype you are using, you can determine
       
    58 the doctype from this identifier at the top of your source code:
       
    59 
       
    60     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
       
    61         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
       
    62 
       
    63 ...and the character encoding from this code:
       
    64 
       
    65     <meta http-equiv="Content-type" content="text/html;charset=ENCODING">
       
    66 
       
    67 If the character encoding declaration is missing, STOP NOW, and
       
    68 read 'docs/enduser-utf8.html' (web accessible at
       
    69 http://htmlpurifier.org/docs/enduser-utf8.html).  In fact, even if it is
       
    70 present, read this document anyway, as many websites specify their
       
    71 document's character encoding incorrectly.
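
If you have many templates to check, the declared encoding can be pulled out
with a regular expression. The sketch below only handles the meta http-equiv
form shown above; declared_charset() is a hypothetical helper, and it says
nothing about whether the declaration is actually correct.

```php
<?php
// Hypothetical helper: returns the charset named in a
// <meta http-equiv="Content-type" ...> declaration, or null if none is found.
function declared_charset($html)
{
    $pattern = '/<meta[^>]+content\s*=\s*["\'][^"\']*charset\s*=\s*([A-Za-z0-9._:-]+)/i';
    if (preg_match($pattern, $html, $m)) {
        // Normalize to upper case so "utf-8" and "UTF-8" compare equal.
        return strtoupper($m[1]);
    }
    return null;
}

$page = '<meta http-equiv="Content-type" content="text/html;charset=iso-8859-1">';
var_dump(declared_charset($page));            // "ISO-8859-1"
var_dump(declared_charset('<p>no meta</p>')); // NULL
```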
       
    72 
       
    73 
       
    74 ---------------------------------------------------------------------------
       
    75 3.  Including the library
       
    76 
       
    77 The procedure is quite simple:
       
    78 
       
    79     require_once '/path/to/library/HTMLPurifier.auto.php';
       
    80 
       
    81 This will set up an autoloader, so the library's files are only included
       
    82 when you use them.
       
    83 
       
    84 Only the contents in the library/ folder are necessary, so you can remove
       
    85 everything else when using HTML Purifier in a production environment.
       
    86 
       
    87 If you installed HTML Purifier via PEAR, all you need to do is:
       
    88 
       
    89     require_once 'HTMLPurifier.auto.php';
       
    90 
       
    91 Please note that the usual PEAR practice of including just the classes you
       
    92 want will not work with HTML Purifier's autoloading scheme.
       
    93 
       
    94 Advanced users, read on; other users can skip to section 4.
       
    95 
       
    96 Autoload compatibility
       
    97 ----------------------
       
    98 
       
    99     HTML Purifier attempts to be as smart as possible when registering an
       
   100     autoloader, but there are some cases where you will need to change
       
   101     your own code to accommodate HTML Purifier. These are those cases:
       
   102 
       
   103     PHP VERSION IS LESS THAN 5.1.2, AND YOU'VE DEFINED __autoload
       
   104         Because spl_autoload_register() doesn't exist in early versions
       
   105         of PHP 5, HTML Purifier has no way of adding itself to the autoload
       
   106         stack. Modify your __autoload function to test
       
   107         HTMLPurifier_Bootstrap::autoload($class) first.
       
   108 
       
   109         For example, suppose your autoload function looks like this:
       
   110 
       
   111             function __autoload($class) {
       
   112                 require str_replace('_', '/', $class) . '.php';
       
   113                 return true;
       
   114             }
       
   115 
       
   116         A modified version with HTML Purifier would look like this:
       
   117 
       
   118             function __autoload($class) {
       
   119                 if (HTMLPurifier_Bootstrap::autoload($class)) return true;
       
   120                 require str_replace('_', '/', $class) . '.php';
       
   121                 return true;
       
   122             }
       
   123 
       
   124         Note that there *is* some custom behavior in our autoloader; the
       
   125         original autoloader in our example would work 99% of the time,
       
   126         but would fail when including language files.
       
   127 
       
   128     AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
       
   129         spl_autoload_register() has the curious behavior of disabling
       
   130         the existing __autoload() handler. Users need to explicitly
       
   131         spl_autoload_register('__autoload'). Because we use SPL when it
       
   132         is available, __autoload() will ALWAYS be disabled. If __autoload()
       
   133         is declared before HTML Purifier is loaded, this is not a problem:
       
   134         HTML Purifier will register the function for you. But if it is
       
   135         declared afterwards, it will mysteriously not work. This
       
   136         snippet of code (after your autoloader is defined) will fix it:
       
   137 
       
   138             spl_autoload_register('__autoload');
       
   139 
       
   140     Users should also be on guard if they use a version of PHP previous
       
   141     to 5.1.2 without an autoloader--HTML Purifier will define __autoload()
       
   142     for you, which can collide with an autoloader that was added by *you*
       
   143     later.
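
On PHP versions that have spl_autoload_register(), one way to sidestep the
cases above is to register your own loader on the SPL stack instead of
defining __autoload(). The following is a sketch under that assumption:
my_class_to_path() and my_autoload() are hypothetical names, the path mapping
mirrors the example __autoload() above, and stream_resolve_include_path()
requires PHP 5.3.2 or later.

```php
<?php
// Hypothetical path mapping, mirroring the example __autoload() above:
// Foo_Bar_Baz becomes Foo/Bar/Baz.php.
function my_class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

// Hypothetical loader: includes the mapped file only if it can be found on
// the include path, so other loaders on the stack still get their turn.
function my_autoload($class)
{
    $resolved = stream_resolve_include_path(my_class_to_path($class));
    if ($resolved === false) {
        return false;
    }
    require $resolved;
    return true;
}

// Adds my_autoload to the SPL autoload stack instead of defining __autoload().
spl_autoload_register('my_autoload');
```

Because the loader returns false rather than failing when a file is missing,
a class it cannot find is simply passed on to the next registered loader.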
       
   144 
       
   145 
       
   146 For better performance
       
   147 ----------------------
       
   148 
       
   149     Opcode caches, which greatly speed up PHP initialization for scripts
       
   150     with large amounts of code (HTML Purifier included), don't like
       
   151     autoloaders. We offer an include file that includes all of HTML Purifier's
       
   152     files in one go in an opcode cache friendly manner:
       
   153 
       
   154         // If /path/to/library isn't already in your include path, uncomment
       
   155         // the below line:
       
   156         // require '/path/to/library/HTMLPurifier.path.php';
       
   157 
       
   158         require 'HTMLPurifier.includes.php';
       
   159 
       
   160     Optional components still need to be included--you'll know if you try to
       
   161     use a feature and you get a "class doesn't exist" error! The autoloader
       
   162     can be used in conjunction with this approach to catch classes that are
       
   163     missing. Simply add this afterwards:
       
   164 
       
   165         require 'HTMLPurifier.autoload.php';
       
   166 
       
   167 Standalone version
       
   168 ------------------
       
   169 
       
   170     HTML Purifier has a standalone distribution; you can also generate
       
   171     a standalone file from the full version by running the script
       
   172     maintenance/generate-standalone.php . The standalone version has the
       
   173     benefit of having most of its code in one file, so parsing is much
       
   174     faster and the library is easier to manage.
       
   175 
       
   176     If HTMLPurifier.standalone.php exists in the library directory, you
       
   177     can use it like this:
       
   178 
       
   179         require '/path/to/HTMLPurifier.standalone.php';
       
   180 
       
   181     This is equivalent to including HTMLPurifier.includes.php, except that
       
   182     the contents of standalone/ will be added to your path. To override this
       
   183     behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
       
   184     be found (usually, this will be one directory up, the "true" library
       
   185     directory in full distributions). Don't forget to set your path too!
       
   186 
       
   187     The autoloader can be added to the end to ensure the classes are
       
   188     loaded when necessary; otherwise you can manually include them.
       
   189     To use the autoloader, use this:
       
   190 
       
   191         require 'HTMLPurifier.autoload.php';
       
   192 
       
   193 For advanced users
       
   194 ------------------
       
   195 
       
   196     HTMLPurifier.auto.php performs a number of operations that can be done
       
   197     individually. These are:
       
   198 
       
   199         HTMLPurifier.path.php
       
   200             Puts /path/to/library in the include path. For high performance,
       
   201             this should be done in php.ini.
       
   202 
       
   203         HTMLPurifier.autoload.php
       
   204             Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).
       
   205 
       
   206     You can do these operations by yourself--in fact, you must modify your own
       
   207     autoload handler if you are using a version of PHP earlier than PHP 5.1.2
       
   208     (See "Autoload compatibility" above).
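
The include-path step can also be done by hand in plain PHP. This is only a
sketch of the idea, not the contents of HTMLPurifier.path.php;
prepend_include_path() is a hypothetical helper and /path/to/library is the
same placeholder used elsewhere in this document.

```php
<?php
// Hypothetical helper: puts $dir at the front of the include path unless it
// is already listed, and returns the resulting include path.
function prepend_include_path($dir)
{
    $parts = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $parts, true)) {
        array_unshift($parts, $dir);
        set_include_path(implode(PATH_SEPARATOR, $parts));
    }
    return get_include_path();
}

// '/path/to/library' is the placeholder path used in this document.
echo prepend_include_path('/path/to/library'), "\n";
```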
       
   209 
       
   210 
       
   211 ---------------------------------------------------------------------------
       
   212 4. Configuration
       
   213 
       
   214 HTML Purifier is designed to run out-of-the-box, but occasionally HTML
       
   215 Purifier needs to be told what to do.  If you answer no to any of these
       
   216 questions, read on; otherwise, you can skip to the next section (or, if you're
       
   217 into configuring things just for the heck of it, skip to 4.3).
       
   218 
       
   219 * Am I using UTF-8?
       
   220 * Am I using XHTML 1.0 Transitional?
       
   221 
       
   222 If you answered no to any of these questions, instantiate a configuration
       
   223 object and read on:
       
   224 
       
   225     $config = HTMLPurifier_Config::createDefault();
       
   226 
       
   227 
       
   228 4.1. Setting a different character encoding
       
   229 
       
   230 You really shouldn't use any other encoding except UTF-8, especially if you
       
   231 plan to support multilingual websites (read section 2 for more details).
       
   232 However, switching to UTF-8 is not always immediately feasible, so we can
       
   233 adapt.
       
   234 
       
   235 HTML Purifier uses iconv to support other character encodings; as such,
       
   236 any encoding that iconv supports <http://www.gnu.org/software/libiconv/>
       
   237 is also supported by HTML Purifier. Set it with this code:
       
   238 
       
   239     $config->set('Core.Encoding', /* put your encoding here */);
       
   240 
       
   241 An example usage for Latin-1 websites (the most common encoding for English
       
   242 websites):
       
   243 
       
   244     $config->set('Core.Encoding', 'ISO-8859-1');
       
   245 
       
   246 Note that HTML Purifier's support for non-Unicode encodings is crippled by the
       
   247 fact that any character not supported by that encoding will be silently
       
   248 dropped, EVEN if it is ampersand escaped.  If you want to work around
       
   249 this, you are welcome to read docs/enduser-utf8.html for a fix,
       
   250 but please be cognizant of the issues the "solution" creates (for this
       
   251 reason, I do not include the solution in this document).
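
To get a feel for how much of your content would be affected, you can test
UTF-8 input for characters the target encoding cannot hold. The sketch below
does this for ISO-8859-1, which covers U+0000 through U+00FF;
has_chars_outside_latin1() is a hypothetical helper and does not use HTML
Purifier or iconv.

```php
<?php
// Hypothetical helper: true if the UTF-8 string $text contains at least one
// character outside U+0000..U+00FF, i.e. one that ISO-8859-1 cannot represent.
// Input that is not valid UTF-8 makes preg_match() fail, which yields false.
function has_chars_outside_latin1($text)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[^\x{00}-\x{FF}]/u', $text) === 1;
}

var_dump(has_chars_outside_latin1("caf\xC3\xA9"));    // bool(false): e-acute is U+00E9
var_dump(has_chars_outside_latin1("5 \xE2\x82\xAC")); // bool(true): euro sign is U+20AC
```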
       
   252 
       
   253 
       
   254 4.2. Setting a different doctype
       
   255 
       
   256 For those of you using HTML 4.01 Transitional, you can disable
       
   257 XHTML output like this:
       
   258 
       
   259     $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
       
   260 
       
   261 Other supported doctypes include:
       
   262 
       
   263     * HTML 4.01 Strict
       
   264     * HTML 4.01 Transitional
       
   265     * XHTML 1.0 Strict
       
   266     * XHTML 1.0 Transitional
       
   267     * XHTML 1.1
       
   268 
       
   269 
       
   270 4.3. Other settings
       
   271 
       
   272 There are more configuration directives which can be read about
       
   273 here: <http://htmlpurifier.org/live/configdoc/plain.html>. They're a bit boring,
       
   274 but they can help out for those of you who like to exert maximum control over
       
   275 your code.  Some of the more interesting ones are configurable at the
       
   276 demo <http://htmlpurifier.org/demo.php> and are well worth looking into
       
   277 for your own system.
       
   278 
       
   279 For example, you can fine-tune allowed elements and attributes, convert
       
   280 relative URLs to absolute ones, and even autoparagraph input text! These
       
   281 are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
       
   282 %AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
       
   283 translates to:
       
   284 
       
   285     $config->set('Namespace.Directive', $value);
       
   286 
       
   287 E.g.
       
   288 
       
   289     $config->set('HTML.Allowed', 'p,b,a[href],i');
       
   290     $config->set('URI.Base', 'http://www.example.com');
       
   291     $config->set('URI.MakeAbsolute', true);
       
   292     $config->set('AutoFormat.AutoParagraph', true);
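
If the allowed elements live in an array elsewhere in your application, the
%HTML.Allowed string can be assembled from it. This is a sketch only:
build_allowed() is a hypothetical helper, and joining several attributes with
"|" is an assumption taken from the configuration documentation rather than
from this file.

```php
<?php
// Hypothetical helper: turns array('a' => array('href'), 'p' => array())
// into the comma-separated "element[attr]" form used in the example above.
function build_allowed(array $elements)
{
    $parts = array();
    foreach ($elements as $element => $attributes) {
        // Elements with no attributes are listed by name alone.
        $parts[] = $attributes
            ? $element . '[' . implode('|', $attributes) . ']'
            : $element;
    }
    return implode(',', $parts);
}

echo build_allowed(array(
    'p' => array(), 'b' => array(), 'a' => array('href'), 'i' => array(),
)), "\n"; // prints p,b,a[href],i
```

The result could then be passed as the value for 'HTML.Allowed'.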
       
   293 
       
   294 
       
   295 ---------------------------------------------------------------------------
       
   296 5. Caching
       
   297 
       
   298 HTML Purifier generates some cache files (generally one or two) to speed up
       
   299 its execution. For maximum performance, make sure that
       
   300 library/HTMLPurifier/DefinitionCache/Serializer is writable by the webserver.
       
   301 
       
   302 If you are in the library/ folder of HTML Purifier, you can set the
       
   303 appropriate permissions using:
       
   304 
       
   305     chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer
       
   306 
       
   307 If the above command doesn't work, you may need to assign write permissions
       
   308 to all. This may be necessary if your webserver runs as nobody, but is
       
   309 not recommended since it means any other user can write files in the
       
   310 directory. Use:
       
   311 
       
   312     chmod -R 0777 HTMLPurifier/DefinitionCache/Serializer
       
   313 
       
   314 You can also chmod files via your FTP client; this option
       
   315 is usually accessible by right clicking the corresponding directory and
       
   316 then selecting "chmod" or "file permissions".
       
   317 
       
   318 Starting with 2.0.1, HTML Purifier will generate friendly error messages
       
   319 that will tell you exactly what you have to chmod the directory to; if in doubt,
       
   320 follow its advice.
       
   321 
       
   322 If you are unable or unwilling to give write permissions to the cache
       
   323 directory, you can either disable the cache (and suffer a performance
       
   324 hit):
       
   325 
       
   326     $config->set('Core.DefinitionCache', null);
       
   327 
       
   328 Or move the cache directory somewhere else (no trailing slash):
       
   329 
       
   330     $config->set('Cache.SerializerPath', '/home/user/absolute/path');
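
Choosing between these options can be automated by checking which directory
the web server can actually write to. The following is a minimal sketch;
first_writable_dir() is a hypothetical helper, the first candidate is a
placeholder path, and falling back to the system temporary directory is an
assumption about what is acceptable on your host.

```php
<?php
// Hypothetical helper: returns the first directory in $candidates that exists
// and is writable by the current process, or null if none qualifies.
function first_writable_dir(array $candidates)
{
    foreach ($candidates as $dir) {
        if (is_dir($dir) && is_writable($dir)) {
            // Strip any trailing slash, as the setting above expects none.
            return rtrim($dir, '/');
        }
    }
    return null;
}

$path = first_writable_dir(array(
    '/path/to/library/HTMLPurifier/DefinitionCache/Serializer', // placeholder
    sys_get_temp_dir(),
));
// With a $config object from section 4, $path could then be passed as the
// value for 'Cache.SerializerPath'; null means no candidate was writable.
var_dump($path);
```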
       
   331 
       
   332 
       
   333 ---------------------------------------------------------------------------
       
   334 6.   Using the code
       
   335 
       
   336 The interface is mind-numbingly simple:
       
   337 
       
   338     $purifier = new HTMLPurifier($config);
       
   339     $clean_html = $purifier->purify( $dirty_html );
       
   340 
       
   341 That's it!  For more examples, check out docs/examples/ (they aren't very
       
   342 different though).  Also, docs/enduser-slow.html gives advice on what to
       
   343 do if HTML Purifier is slowing down your application.
       
   344 
       
   345 
       
   346 ---------------------------------------------------------------------------
       
   347 7.   Quick install
       
   348 
       
   349 First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
       
   350 writable by the webserver (see Section 5: Caching above for details).
       
   351 If your website is in UTF-8 and XHTML 1.0 Transitional, use this code:
       
   352 
       
   353 <?php
       
   354     require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
       
   355 
       
   356     $config = HTMLPurifier_Config::createDefault();
       
   357     $purifier = new HTMLPurifier($config);
       
   358     $clean_html = $purifier->purify($dirty_html);
       
   359 ?>
       
   360 
       
   361 If your website is in a different encoding or doctype, use this code:
       
   362 
       
   363 <?php
       
   364     require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
       
   365 
       
   366     $config = HTMLPurifier_Config::createDefault();
       
   367     $config->set('Core.Encoding', 'ISO-8859-1'); // replace with your encoding
       
   368     $config->set('HTML.Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
       
   369     $purifier = new HTMLPurifier($config);
       
   370 
       
   371     $clean_html = $purifier->purify($dirty_html);
       
   372 ?>
       
   373 
       
   374     vim: et sw=4 sts=4