Tidy Output Filtering

by kuja
If you'd like to filter all output from your application (layouts and views) through Tidy for cleaner markup, then this article is for you!
I recently wanted clean HTML/XHTML on all my pages without having to worry much about hand-formatting the markup myself. Filtering the output through Tidy gives me consistently clean markup (at the cost of some performance, of course).

Now, before starting, please note that you will need the php-tidy extension. You can read more at http://php.net/tidy.

Now, onto the good stuff. The solution I came to (thanks to AD7six) was creating a helper. I called this the TidyFilterHelper (I thought that TidyHelper may have been a bit misleading in this case).

Simply create the helper file tidy_filter.php and drop it in APP/views/helpers/.

The code is as follows:

Helper Class:

<?php
class TidyFilterHelper extends AppHelper {
    function __construct()
    {
        ob_start();
    }

    function __destruct()
    {
        $output = ob_get_clean();
        $config = array('indent' => true, 'output-xhtml' => true);
        $output = tidy_repair_string($output, $config, 'utf8');

        ob_start();
        echo $output;
    }
}
?>
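
The buffering trick the helper relies on can also be tried outside of Cake. The following is only a minimal sketch: tidyFilterCapture() and renderSloppyPage() are hypothetical names made up for this illustration, and the callback merely stands in for Cake rendering a layout and view. If php-tidy is not loaded, the markup is passed through unchanged.

```php
<?php
// Hypothetical function name, used only for this illustration.
function tidyFilterCapture($render)
{
    ob_start();                 // start capturing everything echoed by $render
    call_user_func($render);    // stands in for Cake rendering the layout and view
    $output = ob_get_clean();   // fetch the captured markup and stop buffering

    // Without php-tidy, pass the markup through untouched.
    if (!function_exists('tidy_repair_string')) {
        return $output;
    }

    // Same configuration and encoding as the helper above.
    $config = array('indent' => true, 'output-xhtml' => true);
    return tidy_repair_string($output, $config, 'utf8');
}

// Hypothetical "view" that echoes deliberately sloppy markup.
function renderSloppyPage()
{
    echo '<html><head><title>Demo</title></head>'
       . '<body><p>Unclosed paragraph<div>Block</div></body></html>';
}

echo tidyFilterCapture('renderSloppyPage');
```

The helper does the same thing, except that the capture starts in __construct() and the repair happens in __destruct(), so everything Cake echoes in between ends up in the buffer.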

Now that is pretty cool. It would have been cooler if Helper::afterLayout() were a working callback: it's defined in Helper and it's documented, just not implemented as a callback yet (but it will be soon!).

Now that you've got the helper in place, all you need to do is add the appropriate entry into your AppController::$helpers array, if you want to apply it to absolutely all output.

For example:

Controller Class:

<?php
class AppController extends Controller {
    var $helpers = array('TidyFilter', '...some of your other helpers...');
}
?>

Everything will now work automatically.

Happy baking!

Comments

  • SMarek posted on 08/02/10 04:47:34 PM
    $helpers in app/app_controller.php

    var $helpers = array("Tidy","Next Helpers");
    app/views/helpers/tidy.php

    Helper Class:

    <?php
    class TidyHelper extends AppHelper {

        function __construct() {
            if (function_exists("tidy_repair_string")) {
                ob_start();
            }
        }

        function __destruct() {
            if (function_exists("tidy_repair_string")) {
                $output = ob_get_clean();
                $config = array('indent' => true, 'output-xhtml' => true);
                // Tidy's encoding name is "utf8" (no hyphen).
                $output = tidy_repair_string($output, $config, "utf8");

                ob_start();
                echo $output;
            }
        }

    }
    ?>
  • stkmtd posted on 10/23/09 08:09:46 PM
    I thought this would solve my messy-code woes, but as wonderful as HTML tidy is, it leaves much to be desired. Case-in-point: I have some empty divs used for styling purposes, HTML Tidy decides to strip them out with no option available to prevent it.

    I think I might just end up writing my own routine for clean up (it will assume valid xhtml, and just indent basically). Thanks for the article though, this could come in handy at a time when HTML Tidy isn't such a nuisance, and it gave insight into the workings of helper functions.
  • gattumarrudu posted on 04/06/09 12:32:29 PM
    I tried both the original method and Rahil's, but I am having problems with many punctuation and special characters (euro, curly quotes, em dash, etc.). It does not matter whether I type the escape code (ASCII number or alphabetic) or the special character directly in the source.
    e.g.:
    € shows as €
    — shows as —
    ‘ shows as ‘
    and so on.

    Did I miss/misconfigure something?

    gm
  • davidpersson_ posted on 08/10/08 06:57:34 AM
    I tested Sliv's modified helper. It worked fine until I output e.g. RSS.
    Is there any way to disable tidy filtering on non-html output? My first guess was to utilize the Request-Handler but it's only available at Controller Level :(
    • rahil posted on 10/19/08 04:44:29 PM
      I had the same problem and here is the solution to getting TidyFilter working with RSS feeds.

      At the beginning of the __destruct() function in helpers/tidy_filter.php, use $_SERVER['REQUEST_URI'] to see if the requested URL is an RSS feed. I used strpos() to find '.rss' in the URL, but you can also use preg_match() or a similar function (ereg() is deprecated and was removed in PHP 7). Then you can use an if statement to proceed or not. See code below.

      I only wrote the code to deal with .rss extensions, but you can extend the code to make it deal with more.

      Helper Class:

      <?php
      class TidyFilterHelper extends AppHelper {
          var $is_rss;

          function __construct() {
              // strpos() returns false when '.rss' is absent, so compare strictly.
              $this->is_rss = strpos($_SERVER['REQUEST_URI'], '.rss') !== false;

              if (class_exists('Tidy') && !$this->is_rss) {
                  ob_start();
              }
          }

          function __destruct() {
              if (class_exists('Tidy') && !$this->is_rss) {
                  $output = ob_get_clean();
                  $Tidy = new Tidy();
                  $tidyConfig = array(
                      'doctype' => 'strict',
                      'drop-empty-paras' => true,
                      'drop-font-tags' => true,
                      'drop-proprietary-attributes' => true,
                      'enclose-block-text' => true,
                      'enclose-text' => true,
                      'hide-comments' => false,
                      'hide-endtags' => true,
                      'indent' => true,
                      'indent-spaces' => 4,
                      'logical-emphasis' => true,
                      'lower-literals' => true,
                      'markup' => true,
                      'output-xhtml' => true,
                      'quote-ampersand' => true,
                      'quote-marks' => true,
                      'quote-nbsp' => true,
                      'show-warnings' => false,
                      'wrap' => 250
                  );
                  $Tidy->parseString($output, $tidyConfig, 'utf8');
                  $Tidy->diagnose();
                  $Tidy->cleanRepair();
                  $output = tidy_get_output($Tidy);

                  // Set to true to append Tidy's error buffer to the page in debug mode.
                  $error_reporting = false;
                  if ($Tidy->errorBuffer && Configure::read('debug') > 0 && $error_reporting) {
                      $output = str_replace('</body>', '<pre id="tidy">' . htmlspecialchars($Tidy->errorBuffer) . "</pre>\n</body>", $output);
                  }
                  ob_start();
                  echo $output;
              }
          }
      }
      ?>
  • chey posted on 06/03/08 11:19:54 AM
    This is def very cool and simple! Thanks!

    In my current project I don't see much of a performance decrease at all.
  • jreeves posted on 05/27/08 04:26:28 AM
    Here's a modified approach to your helper that I'm trying out: http://bin.cakephp.org/view/744338418
    Very nice Sliv, thanks for exposing the additional config that Tidy offers (I had not heard of this extension before).

    Would there be any way of caching this process? I can imagine it's pretty CPU-intensive.
  • Sliv posted on 05/21/08 10:55:42 AM
    Here's a modified approach to your helper that I'm trying out:

    http://bin.cakephp.org/view/744338418
  • Sliv posted on 05/21/08 10:30:37 AM
    Thanks for writing this article. I wanted to suggest also that if someone was adding this functionality to an application that they wanted to be more portable, perhaps it would be a good idea to add something along the lines of if(class_exists('Tidy')), to avoid notices if Tidy wasn't installed/enabled in a target environment.
  • lecterror posted on 05/19/08 08:18:30 AM
    "kuja" means "bitch" in Croatian language (and several other).

    Please don't take this as an offense, just thought you might want to know that ;-)
  • mariano posted on 05/13/08 09:43:29 AM
    @kuja: nice idea! Instead of doing the work in the __destruct you may wish to do it on the afterRender() method of the helper.
    • kuja posted on 05/23/08 02:00:26 AM
      That would be the ideal location to stick the code, but unfortunately due to Cake's output buffering, this was the only way to do it!
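
Several comments above ask how to skip Tidy for non-HTML output such as RSS. One possible approach, sketched below under assumptions, is to decide by the file extension in the request path rather than searching the whole URL. shouldTidyRequest() is a hypothetical function name and the default extension list is only an example; neither comes from CakePHP or from the comments.

```php
<?php
// Hypothetical helper; the default extension list is an assumption for illustration.
function shouldTidyRequest($requestUri, $skipExtensions = array('rss', 'xml', 'json'))
{
    // Look only at the path, so a ".rss" inside the query string cannot match.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return true; // unparsable URI or no path: treat it as a normal HTML request
    }

    // pathinfo() returns an empty string when the path has no extension.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return !in_array($extension, $skipExtensions, true);
}
```

In the helper, the result of shouldTidyRequest($_SERVER['REQUEST_URI']) could be stored in a property in __construct() and checked again in __destruct(), in the same way rahil's version uses $is_rss.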