Tidy Output Filtering

By Matthew Harris (kuja)
If you'd like to filter all output from your application (layouts and views) through Tidy for cleaner markup, then this article is for you!
I recently wanted to output clean HTML/XHTML on all my pages without having to worry much about the markup I write by hand. Filtering everything through Tidy gives me clean markup in all cases (at the cost of performance, of course).

Now, before starting, please note that you will need the php-tidy extension. You can read more at http://php.net/tidy.

Now, onto the good stuff. The solution I came to (thanks to AD7six) was creating a helper. I called this the TidyFilterHelper (I thought that TidyHelper may have been a bit misleading in this case).

Simply create the tidy_filter.php helper file and drop it in APP/views/helpers/.

The code is as follows:

Helper Class:

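<?php
class TidyFilterHelper extends AppHelper {
    // Start buffering output as soon as the helper is created.
    function __construct()
    {
        ob_start();
    }

    // When the helper is destroyed, run everything buffered so far through Tidy.
    function __destruct()
    {
        $output = ob_get_clean();
        $config = array('indent' => true, 'output-xhtml' => true);
        $output = tidy_repair_string($output, $config, 'utf8');

        // Open a fresh buffer and write the cleaned markup into it.
        ob_start();
        echo $output;
    }
}
?>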

Now that is pretty cool. It would've been cooler if Helper::afterLayout() were a working callback (it's defined in Helper and it's documented, just not implemented as a callback yet, but it will be soon!).
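To see the buffering trick on its own, here is a minimal sketch outside CakePHP. The class name BufferedTidyFilter and its finish() method are made up for illustration and are not part of CakePHP or Tidy; only ob_start(), ob_get_clean() and tidy_repair_string() are real PHP functions. The sketch assumes you may be running it without php-tidy, so it falls back to the raw markup in that case.

```php
<?php
// Hypothetical stand-alone class, not part of CakePHP.
class BufferedTidyFilter
{
    private $config = array('indent' => true, 'output-xhtml' => true);

    // Start capturing everything that gets echoed from now on.
    public function __construct()
    {
        ob_start();
    }

    // Stop capturing and return the markup, repaired when Tidy is available.
    public function finish()
    {
        $output = ob_get_clean();
        if (!function_exists('tidy_repair_string')) {
            return $output; // php-tidy missing: hand back the raw markup
        }
        $repaired = tidy_repair_string($output, $this->config, 'utf8');
        return $repaired === false ? $output : $repaired;
    }
}

$filter = new BufferedTidyFilter();
echo '<p>Hello <b>world';   // sloppy markup, captured by the buffer
$clean = $filter->finish(); // repaired markup (or the raw string without Tidy)
echo $clean;
```

The helper above does the same thing, except that the "finish" step lives in __destruct() so it runs without anyone having to call it.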

Now that you've got the helper in place, all you need to do is add the appropriate entry into your AppController::$helpers array, if you want to apply it to absolutely all output.

If you still don't understand, try this:

Controller Class:

<?php
class AppController extends Controller {
    var $helpers = array('TidyFilter', '...some of your other helpers...');
}
?>

Everything will now work automatically.

Happy baking!

 

Comments
 

Comment

1 Nice

Thanks for writing this article. I also wanted to suggest that if someone were adding this functionality to an application that needs to be more portable, it would be a good idea to add a check along the lines of if(class_exists('Tidy')), to avoid errors if Tidy isn't installed/enabled in a target environment.
Posted May 21, 2008 by Tim MacAleese aka Sliv
 

Comment

2 A Modified Approach

Here's a modified approach to your helper that I'm trying out:

http://bin.cakephp.org/view/744338418
Posted May 21, 2008 by Tim MacAleese aka Sliv
 

Comment

3 CakePHP limitation

"@kuja: nice idea! Instead of doing the work in the __destruct you may wish to do it on the afterRender() method of the helper."
That would be the ideal location to stick the code, but unfortunately due to Cake's output buffering, this was the only way to do it!
Posted May 23, 2008 by Matthew Harris
 

Comment

4 Can we Cache this

"Here's a modified approach to your helper that I'm trying out: http://bin.cakephp.org/view/744338418"
Very nice Sliv, thanks for exposing the additional config that Tidy offers (I had not heard of this extension before).

Would there be any way of caching this process? I can imagine it's pretty CPU-intensive.
Posted May 27, 2008 by Jonny Reeves
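
One possible way to approach the caching question, as a rough sketch only: key a file cache on a hash of the raw output, so Tidy runs once per distinct page body. The function cached_filter(), the file naming and the stand-in filter below are all assumptions made for illustration; nothing here comes from CakePHP or from the helper in the article. In real use the callable would wrap tidy_repair_string().

```php
<?php
// Hypothetical helper: $filter is any callable that cleans markup.
function cached_filter($raw, $filter, $cacheDir)
{
    // Identical raw markup always maps to the same cache file.
    $file = rtrim($cacheDir, '/') . '/tidy_' . md5($raw) . '.html';
    if (is_file($file)) {
        return file_get_contents($file);
    }
    $clean = call_user_func($filter, $raw);
    file_put_contents($file, $clean, LOCK_EX);
    return $clean;
}

// Stand-in filter so the sketch runs without php-tidy; it counts its calls.
$calls = 0;
$filter = function ($html) use (&$calls) {
    $calls++;
    return trim($html);
};

// Fresh, unique cache directory for the demonstration.
$dir = sys_get_temp_dir() . '/tidy_cache_' . uniqid();
mkdir($dir);

$first  = cached_filter("  <p>Hi</p>  ", $filter, $dir); // runs the filter
$second = cached_filter("  <p>Hi</p>  ", $filter, $dir); // served from the cache
```

Note that this only saves the Tidy pass: the page is still rendered on every request, and the sketch never expires or cleans up cache files.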
 

Comment

5 Very nice

This is def very cool and simple! Thanks!

In my current project I don't see much of a performance decrease at all.
Posted Jun 3, 2008 by Chey
 

Comment

6 Problem with non html content

I tested Sliv's modified helper. It worked fine until I output e.g. RSS.
Is there any way to disable tidy filtering on non-html output? My first guess was to utilize the Request-Handler but it's only available at Controller Level :(
Posted Aug 10, 2008 by David Persson
 

Comment

7 Tidy Filter and RSS nonHTML content

"I tested Sliv's modified helper. It worked fine until I output e.g. RSS.
Is there any way to disable tidy filtering on non-html output? My first guess was to utilize the Request-Handler but it's only available at Controller Level :("

I had the same problem and here is the solution to getting TidyFilter working with RSS feeds.

In the beginning of the __destruct() function in helpers/tidy_filter.php, use $_SERVER['REQUEST_URI'] to see if the requested URL is an RSS feed. I used strpos() to find '.rss' in the URL, but you can also use ereg() or a similar function. Then you can use an if statement to proceed or not. See code below.

I only wrote the code to deal with .rss extensions, but you can extend the code to make it deal with more.

Helper Class:

<?php
class TidyFilterHelper extends AppHelper {
    var $is_rss;

    function __construct() {
        // strpos() returns a position or false, so compare strictly to get a boolean.
        $this->is_rss = strpos($_SERVER['REQUEST_URI'], '.rss') !== false;

        if (class_exists('Tidy') && !$this->is_rss) {
            ob_start();
        }
    }

    function __destruct() {
        if (class_exists('Tidy') && !$this->is_rss) {
            $output = ob_get_clean();
            $Tidy = new Tidy();
            $tidyConfig = array(
                'doctype' => 'strict',
                'drop-empty-paras' => true,
                'drop-font-tags' => true,
                'drop-proprietary-attributes' => true,
                'enclose-block-text' => true,
                'enclose-text' => true,
                'hide-comments' => false,
                'hide-endtags' => true,
                'indent' => true,
                'indent-spaces' => 4,
                'logical-emphasis' => true,
                'lower-literals' => true,
                'markup' => true,
                'output-xhtml' => true,
                'quote-ampersand' => true,
                'quote-marks' => true,
                'quote-nbsp' => true,
                'show-warnings' => false,
                'wrap' => 250
            );
            $Tidy->parseString($output, $tidyConfig, 'utf8');
            $Tidy->diagnose();
            $Tidy->cleanRepair();
            $output = tidy_get_output($Tidy);

            // Set to true to append Tidy's error report to the page in debug mode.
            $error_reporting = false;
            // The debug threshold (0) is assumed; the number is missing from the listing as posted.
            if ($Tidy->errorBuffer && Configure::read('debug') > 0 && $error_reporting) {
                $output = str_replace('</body>', '<pre id="tidy">' . htmlspecialchars($Tidy->errorBuffer) . "</pre>\n</body>", $output);
            }

            ob_start();
            echo $output;
        }
    }
}
?>
Posted Oct 19, 2008 by Rahil Sondhi
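
The extension check described in this comment can be made a little stricter, and extended to more feed types, along these lines. This is only a sketch: is_feed_request() and its default extension list are hypothetical names chosen for illustration, not something from the thread or from CakePHP. It looks at the extension of the URL path instead of searching the whole URL, so a '.rss' elsewhere in the address does not count. (On current PHP, preg_match() is the function to reach for if you want a pattern instead, since ereg() was removed in PHP 7.)

```php
<?php
// Hypothetical helper: true when the request path ends in a feed extension.
function is_feed_request($uri, $extensions = array('rss', 'atom', 'xml'))
{
    // Keep only the path, dropping the query string and fragment.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $extensions, true);
}

var_dump(is_feed_request('/posts/index.rss?page=2'));   // bool(true)
var_dump(is_feed_request('/posts/index'));              // bool(false)
var_dump(is_feed_request('/tools/my.rss.reader/view')); // bool(false)
```

In the helper, the result of a call such as is_feed_request($_SERVER['REQUEST_URI']) could take the place of the strpos() check.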
 

Question

8 Problem with some exotic characters

I tried both the original method and Rahil's, but I am having problems with many punctuation marks and special characters (euro, curly quotes, em dash, etc.). It does not matter whether I type the escape code (numeric or named entity) or the special character directly in the source.
For example:
€ shows as €
— shows as —
‘ shows as ‘
and so on.

Did I miss/misconfigure something?

gm
Posted Apr 6, 2009 by gattu marrudu