Boost performance by removing .htaccess PLUS multi-site with VirtualDocumentRoot

By Ryan Pendergast (rynop)
This tutorial is intended for developers who:
1) Are creating a cake app that needs to scale
2) Use apache and have access to modify their apache config

Using Apache .htaccess files is a huge performance hit and should be avoided whenever you control the server config. The tutorial below will show you how to get the "pretty URL" features of Cake without using .htaccess to do so. The first half of the article explains how to get rid of .htaccess, while the second half ties it into a more complex (but real world) example of how you can leverage this while using one Apache config to serve multiple subdomains.

Background

"In general, you should never use .htaccess files unless you don't have access to the main server configuration file" (apache.org)
First, a little background. .htaccess files are resource hogs. When they are enabled, every request hits your disk multiple times as Apache looks for these files in every directory along the path to where the requested content actually lives. In addition, the directives in any .htaccess files found during this search have to be merged with the overlapping directives from the in-memory Apache config. Apache has some nice examples and further explanation here: http://httpd.apache.org/docs/2.2/howto/htaccess.html#when
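To make the lookup cost concrete, here is a minimal Python sketch (an illustration only, not Apache's actual code) that lists the .htaccess files Apache would have to check for one requested file when overrides are allowed everywhere. The function name and the sample path are assumptions made up for this example.

```python
from pathlib import PurePosixPath


def htaccess_candidates(requested_file):
    """List the .htaccess paths checked for one request, top directory first.

    Hypothetical helper: assumes AllowOverride is enabled from / downwards.
    """
    directory = PurePosixPath(requested_file).parent
    # .parents lists the nearest ancestor first, so reverse it for a top-down walk.
    chain = list(reversed(directory.parents)) + [directory]
    return [str(d / ".htaccess") for d in chain]


if __name__ == "__main__":
    for path in htaccess_candidates("/var/www/leaguelogix/app/webroot/css/site.css"):
        print(path)
```

Every entry in that list is a filesystem lookup on every request, even when no .htaccess file exists in that directory.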

Part 1: say bye to .htaccess

OK, so now that you understand the problem, let's solve it. Most things that can be done in an .htaccess file can be done in a Directory container. Let's get right to the solution, then I will explain. Your Apache config file:
<VirtualHost *:80>
    ServerName www.leaguelogix.com
    ServerAlias leaguelogix.com
    DocumentRoot /var/www/leaguelogix/app/webroot

    # the "+" keeps this line valid on Apache 2.4, which rejects mixing signed and unsigned options
    Options -Indexes +FollowSymLinks

    # disable .htaccess starting at /
    <Directory />
        AllowOverride None
    </Directory>

    # rewrite rules for the Cake webroot
    <Directory /var/www/leaguelogix/app/webroot/>
        RewriteEngine On
        RewriteBase /
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]

        <Files sitemap.xml>
            RewriteEngine Off
        </Files>
    </Directory>
</VirtualHost>

First, disable searching for and parsing of .htaccess files:
<Directory />
    AllowOverride None
</Directory>
This prevents Apache from looking for .htaccess files anywhere during the request. You can delete the .htaccess files from your Cake project at this point, but you don't have to.

Set up your DocumentRoot to point to your Cake app's webroot dir.

Then set up the rewrite rules:
                RewriteEngine On
                RewriteBase /
                RewriteCond %{REQUEST_FILENAME} !-d
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
I don't want to get into the power and complexity of mod_rewrite in this article, so I'll briefly summarize. These rules take requests that are not for existing files or directories and internally redirect them to the index.php file in the Cake webroot directory. This facilitates the "magic" of mapping pretty URLs (named after your controller, for example) into the Cake framework.
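The decision those five lines make can be sketched in Python. This is a simplified model, not mod_rewrite itself: the function name is made up, and the sets of existing files and directories stand in for the real filesystem checks done by `!-f` and `!-d`.

```python
def cake_rewrite(request_path, query_string, existing_files, existing_dirs):
    """Return the path served under the rules above (simplified model)."""
    if request_path in existing_dirs or request_path in existing_files:
        # A RewriteCond fails, so the rule is skipped and the request is served as-is.
        return request_path + ("?" + query_string if query_string else "")
    # In a Directory container the leading slash is stripped before ^(.*)$ is matched.
    captured = request_path.lstrip("/")
    target = "/index.php?url=" + captured
    # QSA appends the original query string instead of discarding it.
    if query_string:
        target += "&" + query_string
    return target


if __name__ == "__main__":
    files, dirs = {"/css/site.css"}, {"/css"}
    print(cake_rewrite("/teams/view/5", "page=2", files, dirs))
    print(cake_rewrite("/css/site.css", "", files, dirs))
```

Static assets that exist on disk bypass Cake entirely, while everything else reaches index.php with the original path in the url parameter.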

Now restart apache (sudo /etc/init.d/apache2 restart) and you should be off and running.

Part 2: A real world example

So that was easy right? Now for a real life example where this gets more complex (but very helpful).

If you were curious and went to leaguelogix.com, you'll notice it's not a Cake app at all. It's just a WordPress blog that I set up in about 30 minutes. My Cake app (and my real product) is a web app platform that drives many sites. My startup provides tooling that allows one person to easily manage and run many sports leagues, and each customer gets their own website (and domain).

I wanted to have a way where potential customers could quickly create a site for free, to test out my product. So instead of having a website, they could just get a mysite.leaguelogix.com subdomain. I wanted to keep avoiding .htaccess, as well as not having to make a separate virtual host for each subdomain. The cakephp platform (my app) is common among all sites - there is just a bit of config and look and feel that is unique to a site.

Here is where Apache's VirtualDocumentRoot and the topic discussed in Part 1 pay off. Again, right to the code, then an explanation.
<VirtualHost *:80>
    # this handles sitename.leaguelogix.com
    ServerName leaguelogix.com
    ServerAlias *.leaguelogix.com

    # the "+" keeps this line valid on Apache 2.4, which rejects mixing signed and unsigned options
    Options -Indexes +FollowSymLinks

    UseCanonicalName Off
    VirtualDocumentRoot /opt/leagues/sites/%1/app/webroot

    <Directory />
        AllowOverride None
    </Directory>

    <Directory "/opt/leagues/sites/*/app/webroot/">
        RewriteEngine On
        RewriteBase /
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteRule ^/opt/leagues/sites/(.*)/app/webroot/(.*)$ index.php?url=$2 [QSA,L]

        <Files sitemap.xml>
            RewriteEngine Off
        </Files>
    </Directory>
</VirtualHost>

So I'll discuss the deltas from Part 1.
UseCanonicalName Off
VirtualDocumentRoot /opt/leagues/sites/%1/app/webroot
This allows you to use one Apache config to serve many document roots based on the domain that is entered. For example, a request for http://mysite.leaguelogix.com will set the document root to /opt/leagues/sites/mysite/app/webroot. Cool, eh? So there is no need to make a new Apache vhost and reload the config every time I get a new "trial" customer.
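The %1 substitution can be sketched in a few lines of Python. This is only a model of the mapping for the template used here, not mod_vhost_alias itself; the function name is hypothetical and the other interpolation specifiers are not handled.

```python
TEMPLATE = "/opt/leagues/sites/%1/app/webroot"


def virtual_document_root(host_header, template=TEMPLATE):
    """Map a Host header to a document root by filling in %1 (simplified model)."""
    # Drop an optional :port suffix, then take the first dot-separated part of the name.
    hostname = host_header.split(":", 1)[0]
    first_part = hostname.split(".")[0]
    return template.replace("%1", first_part)


if __name__ == "__main__":
    print(virtual_document_root("mysite.leaguelogix.com"))
    print(virtual_document_root("demo.leaguelogix.com:80"))
```

One consequence worth keeping in mind: under this template a request for the bare leaguelogix.com would resolve to /opt/leagues/sites/leaguelogix/app/webroot, since "leaguelogix" is then the first part of the name.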

Now the tricky part: loading mod_vhost_alias and using VirtualDocumentRoot throws a wrench into our simple mod_rewrite directives from Part 1. It now sends the entire fully qualified path to RewriteRule, and Cake's index.php?url does not work with this. The following line takes care of that:
RewriteRule ^/opt/leagues/sites/(.*)/app/webroot/(.*)$ index.php?url=$2 [QSA,L]
This will "strip off" the webroot and pass the URL that Cake needs ($2).
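To see what the two capture groups pick up, here is a small Python sketch that applies the same regular expression to a fully qualified path. It assumes, as described above, that the rule receives the full filesystem path; the function name and sample paths are made up for illustration.

```python
import re

# Same pattern as the RewriteRule above.
PATTERN = re.compile(r"^/opt/leagues/sites/(.*)/app/webroot/(.*)$")


def url_for_cake(full_path):
    """Return the substitution built from $2, or None when the pattern does not match."""
    match = PATTERN.match(full_path)
    if match is None:
        return None
    # group(1) is the site name, group(2) is the part Cake needs.
    return "index.php?url=" + match.group(2)


if __name__ == "__main__":
    print(url_for_cake("/opt/leagues/sites/mysite/app/webroot/teams/view/5"))
```

The site name captured in $1 is discarded on purpose: VirtualDocumentRoot has already selected the right app directory, so Cake only needs the remainder of the path.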

That's it. Good luck, and I hope this helped.

NOTE: While I have done some testing on this - I should note that I have not put this into production yet. Please feel free to post comments on potential pitfalls this approach might have. I will try to respond to comments/questions the best I can...

Interested in squeezing more performance out of Cake? Check out a nice article from pseudocoder here: http://www.pseudocoder.com/archives/2009/03/17/8-ways-to-speed-up-cakephp-apps/

 

Comments
 

Comment

1 hi Ryan

Excellent work, I'll try it right away.
In future it would be great to see a benchmark with and without .htaccess.
Posted Nov 5, 2009 by Gediminas Morkevicius
 

Comment

2 Untitled 1

@Ryan
I like this. Definitely something I will probably use in the future.

@Gediminas
There's probably a decent benchmark on the web somewhere. I think the general argument of writing rules into the vhost file is that you get a performance boost b/c the rules are loaded into memory when Apache starts up. .htaccess files are read and parsed each time the site/directory is accessed. I'm not an Apache guru, so I could be wrong.
Posted Nov 5, 2009 by Cameron Perry
 

Comment

3 benchmark

This is not a direct answer to your question, but hang with me....

It's been my experience that disk and memory have always been my bottlenecks when scaling. It's easy to fix the memory bottlenecks by adding more memory or making code improvements that consume less memory per request. It's not so easy to cut down on disk I/O, and furthermore, with today's shift towards virtual machines, disk I/O is constrained MUCH more than ever before, even if you don't realize it. Throw a disk-backed DB in there and you get in trouble fast.

So my point is, even if the boost is minute in benchmarks that exist out there (or benchmarks you do on your own box), when you go VM or scale, it shows up quickly. Anything that you can do to cut down I/O is huge (IMO of course).

If you have control over your Apache config, using .htaccess is silly. Not using it will cut down on disk I/O and, as killa' Cameron states, it's all in memory.

Which makes me think - I'm gonna write another article about using memcached with sessions... I'll write that up quick tonight and submit it...

Again my disclaimer: I have not yet put this technique into my production site - I'm still a bit worried that there is some strange combo out there that I'm missing - mod_rewrite is so complex. If anyone can shoot holes in this config please do... (you'll be helping me out too)
Posted Nov 5, 2009 by Ryan Pendergast
 

Comment

4 Environment vars remarks

Great article.

While my setup isn't quite the same as yours (I put my rewrite rules outside the Directory directive in an include), I've run into a few problems:
Cake's idea of webroot gets messed up. Please check the $_SERVER variables in PHP, I've had a few problems with those.

Do you use per-site databases? How do you handle directory and site dir creation?
Posted Nov 6, 2009 by Roy van der Veen
 

Comment

5 thank you

Awesome !

I'm going to try that :)
Can't wait for your next article about memcache !
Posted Nov 8, 2009 by Olivier
 

Comment

6 Re: env var remarks

Without seeing your Apache config, I can't give you an exact reason why things aren't working for you, but I can tell you that putting the rewrite rules outside of the container will not work.

Those rewrite rules are specific to the Cake app/webroot dir. I suspect that's why you are seeing strange behavior.

A good exercise would be to inspect the 3 .htaccess files that ship with the Cake distribution. There is one in the root, one in app and one in app/webroot. As you'll notice, they are different. My example assumes you point your Apache DocumentRoot to the app/webroot dir. My example/architecture will give you the best performance possible if you are going to use rewrite...

I use one DB. If I ever get to a point where I really need my DB to scale, I'm gonna use Amazon RDS (http://aws.amazon.com/rds/). Check it out, it's sweet.

As for my site creation, it's pretty complex and I could write an entire article about it. I'll give you a hint though: I utilize the fact that you can have 1 code base with many apps pointing to it and/or overriding the behavior. Check out the docs on config/bootstrap.php. VERY powerful stuff.
Posted Nov 10, 2009 by Ryan Pendergast
 

Comment

7 When not using the /app/webroot layout...

My site lives in a folder e.g. var/www/website/ that acts as the document root.

In this instance, you need to use two Directory directives (directly copied from the Cake .htaccess files contained therein).

For the short of patience, here is a truncated version of my file:

<VirtualHost *:80>
    DocumentRoot "/var/www/app_name/doc_root_dir/"
    ServerName app_name

   Options -Indexes +FollowSymLinks

   #disable htaccess starting at /
   <Directory />
      AllowOverride none
   </Directory>
   <Directory /var/www/app_name/doc_root_dir/webroot/>
      RewriteEngine On
      RewriteBase /
      RewriteCond %{REQUEST_FILENAME} !-d
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
   </Directory>
   <Directory /var/www/app_name/doc_root_dir/>
       RewriteEngine on
       RewriteBase /
       RewriteCond %{REQUEST_FILENAME} !-f
       RewriteCond %{REQUEST_FILENAME} !-d
       RewriteRule    ^$    webroot/
       RewriteCond %{REQUEST_FILENAME} !-f
       RewriteCond %{REQUEST_FILENAME} !-d
       RewriteRule    (.*) webroot/$1
   </Directory>
</VirtualHost>
Posted Nov 18, 2009 by Dave Jones
 

Comment

8 Something else to consider

If you've ever thought about using another server as a reverse proxy in front of Apache, you can achieve a similar benefit. I may write a full article about this, but the basic idea is nginx serving static files and everything else hitting Apache.

So on a page request where you've got a page with 2 CSS files, 3-5 JS files and maybe 10 images, only the 1 initial page request actually makes it through to Apache. That simply means the .htaccess file will only be hit once for that page, instead of about 20 times for the initial request and all subsequent requests.

Suddenly, .htaccess isn't NEARLY as painful. I specifically like this approach if you can do it, because it's generally easier to set up code that automatically updates an .htaccess file than an Apache configuration file.

Using nginx as a reverse proxy has its own challenges as well though, and there are a lot of little nuances that you'll learn to adjust to over time. Once you know them, it's fairly minor though.
Posted Dec 12, 2009 by Barry
 

Comment

9 very useful

just for intranet application or dedicated servers where we have a write access to the httpd.conf
Posted Dec 13, 2009 by Hussein Harake
 

Comment

10 Domains

I'm a bit illiterate when it comes to DNS and domain pointing. This is great for subdomains, but how would I get a domain name to point to the path of one of the client sites?

Thanks for the article!
Posted Jan 6, 2010 by Steve Oliveira
 

Comment

11 single site

Hi Steve. Not sure I follow your question 100%, but maybe this will help. Here is the first thing I do (from an apache perspective) when I begin to develop all my single domain cake apps. This example is based on Ubuntu.

1. Make a vhost for my app:
<VirtualHost *:80>
        ServerName myapp.localhost.com
        ServerAlias www.myapp.localhost.com myapp.localhost.com
        DocumentRoot /var/www/myapp

        RewriteEngine On

        RewriteCond %{HTTP_HOST}   !^www\.myapp\.localhost\.com [NC]
        RewriteCond %{HTTP_HOST}   !^$
        RewriteRule ^/(.*)         http://www.myapp.localhost.com/$1 [L,R=permanent]

        <Directory /var/www/myapp>
                Options FollowSymLinks
                AllowOverride None
                RewriteEngine on
                RewriteCond %{REQUEST_FILENAME} !-d
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
        </Directory>
</VirtualHost>

2. Make symlink from the webroot in my workspace dir to the distro specific default webroot
ln -s /home/ryan/myapp/app/webroot /var/www/myapp

3. Make an entry in /etc/hosts that points 127.0.0.1 to www.myapp.localhost.com
4. /etc/init.d/apache2 reload

So now you should be golden (just set up permissions correctly so the webserver user can read your files AND write to the app/tmp dir).

Modifying this for a real domain is simple: just change the Apache directives to point to your domain, and of course there is no need for the hosts entry. Hope this answers your question.
Posted Jan 18, 2010 by Ryan Pendergast
 

Question

13 Database access and Permissions

Very good article Ryan, and thanks for sharing.
I have a question about database access from different domains/subdomains. How can domain X do CRUD actions on its own data, when logged on to the admin interface, without affecting domain Y's data?
I am thinking you are using domains as the IDs for your where clause, but I am still fuzzy about it in the Cake context.
Are you using something like the permissionable plugin, or using some restrictions on find and save when doing your database calls?
Are you using raw SQL, or the model functions?
Posted Jul 26, 2010 by Aziz B