Brita component with HTML Purifier

By Debugged Interactive Designs (debuggeddesigns)
Brita is a CakePHP component that wraps HTML Purifier so its functionality is available to your controllers and views. HTML Purifier is a standards-compliant HTML filter library written in PHP. It not only removes all malicious code (the kind used in cross-site scripting, or XSS, attacks) with a thoroughly audited, secure yet permissive whitelist, but also makes sure your documents are standards compliant, something only achievable with a comprehensive knowledge of the W3C specifications.

Step 1: Download and unzip archive

Download an HTML Purifier archive (either .zip or .tar.gz) from http://htmlpurifier.org/download.html and unzip it into the directory /app/vendors/htmlpurifier/

Note: Only the contents of the /app/vendors/htmlpurifier/library/ folder are necessary, so you can remove everything else when using HTML Purifier in a production environment.

Note 2: The folder /app/vendors/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer must be writable by the web server.
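If you are unsure whether Note 2 is satisfied, PHP can tell you. The snippet below is a minimal sketch, assuming it runs from the CakePHP root with the layout from Step 1; serializerDirIsWritable() is a hypothetical helper, not part of HTML Purifier.

```php
<?php
// Pre-flight check for the DefinitionCache/Serializer folder (Note 2).
// Assumption: this script runs from the CakePHP root; adjust the path otherwise.
// serializerDirIsWritable() is a hypothetical helper, not an HTML Purifier API.
function serializerDirIsWritable($dir)
{
    // is_writable() checks permissions for the user running PHP, so run this
    // as the web server user to get an answer that matters.
    return is_dir($dir) && is_writable($dir);
}

$serializerDir = 'app/vendors/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer';
if (!serializerDirIsWritable($serializerDir)) {
    echo "Warning: $serializerDir is missing or not writable\n";
}
```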


Step 2: Create brita component

Filename: /app/controllers/components/brita.php

Component Class:

<?php
//cake's version of a require_once() call
vendor('htmlpurifier'.DS.'library'.DS.'HTMLPurifier.auto'); //use this with the 1.1 core
//App::import('Vendor', 'HTMLPurifier', array('file' => 'htmlpurifier'.DS.'library'.DS.'HTMLPurifier.auto.php')); //use this with the 1.2 core

class BritaComponent extends Object {

    var $controller;

    function startup(&$controller) {

        //the next few lines allow the config settings to be cached
        $config = HTMLPurifier_Config::createDefault();
        $config->set('HTML', 'DefinitionID', 'made by debugged interactive designs');
        $config->set('HTML', 'DefinitionRev', 1);

        //levels describe how aggressive the Tidy module should be when cleaning up html
        //four levels: none, light, medium, heavy
        $config->set('HTML', 'TidyLevel', 'heavy');

        //check the top of your html file for the next two
        $config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
        $config->set('Core', 'Encoding', 'ISO-8859-1');

        //newer HTML Purifier releases expect the dotted form instead,
        //e.g. $config->set('HTML.Doctype', 'XHTML 1.0 Transitional');

        //expose an HTMLPurifier instance to the controller as $controller->brita
        //(Cake itself attaches this component as $controller->Brita)
        //HTML Purifier 3+ needs PHP 5, where objects are passed by handle,
        //so "=& new" is unnecessary (and is a parse error in PHP 7)
        $controller->brita = new HTMLPurifier($config);

        //make the same purifier available to the views as $brita
        $controller->set('brita', $controller->brita);
    }

}
?>
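The comment "check the top of your html file for the next two" refers to the doctype and charset declared in the pages your layout renders. As a rough sketch, both values can be read from a rendered page. readPageSettings() is a hypothetical helper, and the doctype it returns is taken straight from the DTD public identifier, so it may still need mapping to a name HTML Purifier accepts (HTML 4.01 Strict, for instance, yields just "HTML 4.01").

```php
<?php
// Sketch: read the doctype and charset from the top of a rendered page so the
// HTML Doctype and Core Encoding settings in the component can match them.
// readPageSettings() is hypothetical; it is not part of CakePHP or HTML Purifier.
function readPageSettings($html)
{
    $settings = array('Doctype' => null, 'Encoding' => null);

    // e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ...>
    if (preg_match('~<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD\s+([^"/]+)//EN"~i', $html, $m)) {
        $settings['Doctype'] = $m[1];
    }
    // e.g. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    if (preg_match('~charset=["\']?([A-Za-z0-9_-]+)~i', $html, $m)) {
        $settings['Encoding'] = strtoupper($m[1]);
    }
    return $settings;
}

$page = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
      . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
      . '<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />';
print_r(readPageSettings($page));
```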

Step 3: Use the brita component inside a controller

Filename: /app/controllers/tests_controller.php

Controller Class:

<?php
class TestsController extends AppController {
    var $name = 'Tests';
    var $components = array('Brita'); //import the Brita Component

    function brita() {
        //fake user input that we will purify (for testing)
        $dirty_html = '<br><br><center><font size="2">testing</font></center>';

        //this one line of code does all the purifying
        $clean_html = $this->brita->purify($dirty_html);

        //set the before and after html for the test view
        $this->set('clean_html', $clean_html);
        $this->set('dirty_html', $dirty_html);
    }
}
?>

Step 4: Create a test view

Filename: /app/views/tests/brita.thtml (name it brita.ctp with the 1.2 core)
<div>DIRTY HTML = <?php echo htmlentities($dirty_html);?></div>
<div style="border:1px solid black;"><?php echo $dirty_html;?></div>

<div>CLEAN HTML = <?php echo htmlentities($clean_html);?></div>
<div style="border:1px solid black;"><?php echo $clean_html;?></div> 
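The view echoes htmlentities() output so you can see the raw markup. Because the component assumes ISO-8859-1 pages, it is safer to pass that charset explicitly: current PHP versions default to UTF-8 (the default_charset setting) and can return an empty string, or replacement characters from PHP 8.1 on, for Latin-1 input containing accented characters. showSource() is a hypothetical wrapper used only for illustration.

```php
<?php
// Escape markup for display using the same charset as Core Encoding in the
// component. showSource() is a hypothetical name, not a CakePHP helper.
function showSource($html, $charset = 'ISO-8859-1')
{
    // ENT_QUOTES also escapes single and double quotes in attributes.
    return htmlentities($html, ENT_QUOTES, $charset);
}

echo showSource('<br><center><font size="2">testing</font></center>'), "\n";
// &lt;br&gt;&lt;center&gt;&lt;font size=&quot;2&quot;&gt;testing&lt;/font&gt;&lt;/center&gt;
```

Inside the view itself, the same idea is simply `htmlentities($dirty_html, ENT_QUOTES, 'ISO-8859-1')`.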

 

Comments
 

Comment

1 Why not use as helper

I think encapsulating htmlpurifier in an AppHelper would be more suitable.
One could use different setup depending on cake theme or response type (xml,rss,pdf,html,...).
Posted Nov 14, 2008 by Pawel Gasiorowski
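Pawel's idea could start from something as simple as a per-response-type settings table. This is a hypothetical sketch: purifierSettingsFor(), the type names and the overrides are assumptions for illustration; only the directive names and doctype values mirror the ones used in the component.

```php
<?php
// Hypothetical: pick HTML Purifier settings by response type.
// The overrides below are illustrative choices, not recommendations.
function purifierSettingsFor($responseType)
{
    $base = array(
        'HTML.Doctype'   => 'XHTML 1.0 Transitional',
        'HTML.TidyLevel' => 'heavy',
        'Core.Encoding'  => 'ISO-8859-1',
    );
    $overrides = array(
        'rss'  => array('HTML.Doctype' => 'XHTML 1.0 Strict'),
        'xml'  => array('HTML.Doctype' => 'XHTML 1.0 Strict'),
        'html' => array(),
    );
    $type = strtolower($responseType);
    $extra = isset($overrides[$type]) ? $overrides[$type] : array();
    // Later keys win, so per-type overrides replace the base values.
    return array_merge($base, $extra);
}

print_r(purifierSettingsFor('rss'));
```

With an HTML Purifier release that accepts dotted directive names, a helper could loop over this array and call `$config->set($key, $value)` for each pair.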
 

Comment

2 A better fit in the View layer

Since this relates to data output, it might be a better fit as a Helper.

Then its usage would look like this:

{{{
DIRTY HTML =

CLEAN HTML = purify($dirty_html));?>
purify($dirty_html);?>
}}}
Posted Nov 17, 2008 by Tom OReilly
 

Comment

3 Why I didn't use it as a helper...

Thanks for the comments, guys. In this article I wanted to show how HTML Purifier can clean user input (it's very strict with XSS) before it is saved to a database. Its other use, making documents standards compliant, could definitely be useful in a helper.
Posted Nov 21, 2008 by Debugged Interactive Designs
 

Comment

4 I have one just like this.

I set something up that looks almost exactly like this before this was ever published.

I set it up as a component also. We were building a content management system for our company use. The reason we set it up as a component rather than a helper was because we didn't want invalid/XSS stuff being submitted to the database to begin with. We wanted valid, clean markup saved in the database, rather than just cleaning it when we display it.

I've really grown to love the HTML Purifier. It's pretty customizable too.
Posted Feb 6, 2009 by Justin Thomas
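Justin's approach, cleaning markup once before it reaches the database, might look like this in outline. purifyFields() is a hypothetical helper that walks Cake's `$this->data[Model][field]` structure; strip_tags() stands in for the purifier only so the sketch runs without HTML Purifier installed, and is not a substitute for it.

```php
<?php
// Sketch of inbound filtering: clean selected fields once, before saving,
// instead of on every page view. purifyFields() is hypothetical.
function purifyFields(array $data, $model, array $fields, $filter)
{
    foreach ($fields as $field) {
        if (isset($data[$model][$field]) && is_string($data[$model][$field])) {
            // $filter is any callable taking and returning a string.
            $data[$model][$field] = call_user_func($filter, $data[$model][$field]);
        }
    }
    return $data;
}

$submitted = array('Post' => array('body' => '<p onclick="evil()">Hi</p>'));
// strip_tags is a stand-in so this runs without HTML Purifier.
$clean = purifyFields($submitted, 'Post', array('body'), 'strip_tags');
echo $clean['Post']['body'], "\n"; // Hi
```

In a controller using Brita, the filter would be `array($this->brita, 'purify')`, called just before `$this->Post->save($this->data)` (Post and body being placeholder names).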
 

Comment

5 Why not use as helper

I think encapsulating htmlpurifier in an AppHelper would be more suitable.
One could use different setup depending on cake theme or response type (xml,rss,pdf,html,...).

As the author of HTML Purifier himself says in "Speeding up HTML Purifier" (http://htmlpurifier.org/docs/enduser-slow.html): "HTML Purifier is a very powerful library. But with power comes great responsibility, in the form of longer execution times. Remember, this library isn't lightly grazing over submitted HTML: it's deconstructing the whole thing, rigorously checking the parts, and then putting it back together."
He suggests "inbound filtering" or "caching the filtered output", and so would I.
Especially if you can't cache your output because it changes frequently or always, you should consider using HTML Purifier as a component rather than as a helper.
Posted Mar 14, 2009 by Lasse Fister
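The "caching the filtered output" idea can be sketched as a small wrapper that purifies each distinct input only once. CachedFilter is hypothetical, its cache lasts only for one request, and strip_tags() again stands in for HTMLPurifier::purify() so the example runs on its own.

```php
<?php
// Sketch of output caching: run the (expensive) filter once per distinct input.
// CachedFilter is a hypothetical class; the in-memory array is per request only.
class CachedFilter
{
    private $filter;
    private $cache = array();
    public $misses = 0;

    public function __construct($filter)
    {
        $this->filter = $filter;
    }

    public function purify($html)
    {
        // Key on a hash of the input so identical markup hits the cache.
        $key = sha1($html);
        if (!array_key_exists($key, $this->cache)) {
            $this->misses++;
            $this->cache[$key] = call_user_func($this->filter, $html);
        }
        return $this->cache[$key];
    }
}

$filter = new CachedFilter('strip_tags'); // stand-in for array($purifier, 'purify')
echo $filter->purify('<b>hello</b>'), "\n"; // hello
echo $filter->purify('<b>hello</b>'), "\n"; // hello (served from the cache)
```

For a cache that survives between requests you would store results somewhere longer lived, such as CakePHP's Cache class or a column beside the raw HTML, and might include the config revision in the key so that a settings change invalidates old entries.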