Tidy Output Filtering
If you'd like to filter all output from your application (layouts and views) through Tidy for cleaner markup, then this article is for you!
I recently wanted to output clean HTML/XHTML on all my pages without having to worry much about hand-writing tidy markup. Filtering the output through Tidy lets me generate clean markup in all cases (at the cost of some performance, of course).
Before starting, please note that you will need the php-tidy extension. You can read more at http://php.net/tidy.
Now, onto the good stuff. The solution I arrived at (thanks to AD7six) was to create a helper. I called it TidyFilterHelper (I thought that TidyHelper might be a bit misleading in this case).
Simply create the tidy_filter.php helper file and drop it in APP/views/helpers/.
The code is as follows:
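Helper Class:
<?php
class TidyFilterHelper extends AppHelper {
    // Start buffering as soon as the helper is instantiated.
    function __construct()
    {
        ob_start();
    }

    // Runs when the helper object is destroyed.
    function __destruct()
    {
        // Grab everything buffered so far and run it through Tidy.
        $output = ob_get_clean();
        $config = array('indent' => true, 'output-xhtml' => true);
        $output = tidy_repair_string($output, $config, 'utf8');
        // Open a fresh buffer and write the cleaned markup into it.
        ob_start();
        echo $output;
    }
}
?>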
Now that is pretty cool. It would have been cooler if Helper::afterLayout() were a working callback (it's defined in Helper and it's documented, just not implemented as a callback yet, but it will be soon!).
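If the buffering trick is hard to picture, here is a minimal stand-alone sketch of the same capture-then-filter pattern outside CakePHP. The BufferedFilter class and the tidyOrPassThrough() function are hypothetical names made up for this illustration, and the filter is called explicitly through finish() rather than from a destructor so the flow is easier to follow.

```php
<?php
// Hypothetical illustration only; not part of CakePHP or the helper above.
class BufferedFilter {
    private $filter;

    public function __construct($filter) {
        $this->filter = $filter;
        ob_start(); // capture everything echoed from here on
    }

    // The helper does this work in __destruct(); here it is an explicit call.
    public function finish() {
        $raw = ob_get_clean(); // fetch the captured output and close the buffer
        return call_user_func($this->filter, $raw);
    }
}

// Assumed fallback behaviour: return the markup untouched if Tidy is missing.
function tidyOrPassThrough($html) {
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }
    $config = array('indent' => true, 'output-xhtml' => true);
    return tidy_repair_string($html, $config, 'utf8');
}

// Use a trivial filter so the effect is visible without the Tidy extension.
$capture = new BufferedFilter('strtoupper');
echo '<p>hello</p>';
$result = $capture->finish();
```

Passing 'tidyOrPassThrough' instead of 'strtoupper' would give roughly what the helper does, assuming the extension is installed.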
Now that you've got the helper in place, all you need to do is add the appropriate entry to your AppController::$helpers array if you want to apply it to absolutely all output.
If that's still unclear, here is an example:
Controller Class:
<?php
class AppController extends Controller {
    var $helpers = array('TidyFilter', '...some of your other helpers...');
}
?>
Everything will now work automatically.
Happy baking!

var $helpers = array("Tidy","Next Helpers");
Helper Class:
<?php
class TidyHelper extends AppHelper {
    function __construct() {
        // Only buffer when the Tidy extension is actually available.
        if (function_exists("tidy_repair_string")) {
            ob_start();
        }
    }

    function __destruct() {
        if (function_exists("tidy_repair_string")) {
            $output = ob_get_clean();
            $config = array('indent' => true, 'output-xhtml' => true);
            // The PHP manual lists the encoding name as "utf8", not "utf-8".
            $output = tidy_repair_string($output, $config, "utf8");
            ob_start();
            echo $output;
        }
    }
}
?>
I think I might just end up writing my own clean-up routine (it will assume valid XHTML and basically just indent). Thanks for the article though; this could come in handy at a time when HTML Tidy isn't such a nuisance, and it gave me insight into the workings of helpers.
i.e.:
€ shows as â‚¬
— shows as â€”
‘ shows as â€˜
and so on.
Did I miss/misconfigure something?
gm
Is there any way to disable Tidy filtering on non-HTML output? My first guess was to use the RequestHandler, but it's only available at the controller level :(
I had the same problem, and here is a solution for getting TidyFilter to work with RSS feeds.
At the beginning of the __destruct() function in helpers/tidy_filter.php, use $_SERVER['REQUEST_URI'] to see whether the requested URL is an RSS feed. I used strpos() to find '.rss' in the URL, but you can also use preg_match() or a similar function. Then you can use an if statement to decide whether to proceed. See the code below.
I only wrote the code to deal with .rss extensions, but you can extend it to handle more.
Helper Class:
<?php
class TidyFilterHelper extends AppHelper {
    var $is_rss;

    function __construct() {
        // strpos() returns false when '.rss' is absent, so compare strictly.
        $this->is_rss = strpos($_SERVER['REQUEST_URI'], '.rss') !== false;
        if (class_exists('Tidy') && !$this->is_rss) {
            ob_start();
        }
    }

    function __destruct() {
        if (class_exists('Tidy') && !$this->is_rss) {
            $output = ob_get_clean();
            $Tidy = new Tidy();
            $tidyConfig = array(
                'doctype' => 'strict',
                'drop-empty-paras' => true,
                'drop-font-tags' => true,
                'drop-proprietary-attributes' => true,
                'enclose-block-text' => true,
                'enclose-text' => true,
                'hide-comments' => false,
                'hide-endtags' => true,
                'indent' => true,
                'indent-spaces' => 4,
                'logical-emphasis' => true,
                'lower-literals' => true,
                'markup' => true,
                'output-xhtml' => true,
                'quote-ampersand' => true,
                'quote-marks' => true,
                'quote-nbsp' => true,
                'show-warnings' => false,
                'wrap' => 250
            );
            $Tidy->parseString($output, $tidyConfig, 'utf8');
            $Tidy->diagnose();
            $Tidy->cleanRepair();
            $output = tidy_get_output($Tidy);
            // Set to true to append Tidy's error buffer to the page in debug mode.
            $error_reporting = false;
            if ($Tidy->errorBuffer && Configure::read('debug') > 0 && $error_reporting) {
                $output = str_replace('</body>', '<pre id="tidy">' . htmlspecialchars($Tidy->errorBuffer) . "</pre>\n</body>", $output);
            }
            ob_start();
            echo $output;
        }
    }
}
?>
In my current project I don't see much of a performance decrease at all.
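For anyone who wants to skip more than just .rss, here is a rough sketch of one way the check could be generalized. The function name isNonHtmlRequest() and the default extension list are my own assumptions, not part of the helper above. Unlike the strpos() check, it looks only at the extension at the end of the URL path, so a '.rss' in the middle of a URL or in the query string would not match.

```php
<?php
// Hypothetical helper function; the name and the extension list are illustrative.
function isNonHtmlRequest($uri, $extensions = array('rss', 'xml', 'json', 'js', 'css')) {
    // Drop the query string and keep only the path part of the URI.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed or empty URI: treat it as a normal HTML request
    }
    // Compare the trailing extension case-insensitively against the list.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $extensions, true);
}
```

In the helper, the result of isNonHtmlRequest($_SERVER['REQUEST_URI']) could then stand in for $this->is_rss.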
Very nice Sliv, thanks for exposing the additional config that Tidy offers (I had not heard of this extension before).
Would there be any way of caching this process? I can imagine it's pretty CPU-intensive.
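One possible approach, sketched under my own assumptions rather than taken from the article: key a small file cache on a hash of the raw output and only run the cleaner when that exact output has not been seen before. The function name cachedClean(), the file naming scheme, and the cache directory are all hypothetical, and the cleaner is passed in as a callback so the sketch does not depend on the Tidy extension.

```php
<?php
// Hypothetical sketch; names, file layout, and cache location are assumptions.
function cachedClean($html, $cleaner, $cacheDir) {
    // Identical raw output always maps to the same cache file.
    $file = rtrim($cacheDir, '/\\') . DIRECTORY_SEPARATOR . 'tidy_' . md5($html) . '.html';
    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached; // cache hit: skip the expensive clean-up
        }
    }
    // Cache miss: run the cleaner and store the result for next time.
    $clean = call_user_func($cleaner, $html);
    if (is_string($clean)) {
        file_put_contents($file, $clean, LOCK_EX);
    }
    return $clean;
}
```

This only pays off when the same raw output recurs; pages that change on every request would produce a new hash each time, and old cache files would need to be cleared out by some other means.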
http://bin.cakephp.org/view/744338418
Please don't take this as an offense, just thought you might want to know that ;-)
That would be the ideal location to stick the code, but unfortunately due to Cake's output buffering, this was the only way to do it!