URL Rewriting
Posted: 2009-04-19 00:00:00
Link: http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html
I was thinking about making a second version of this site, one mainly for search engines (well, before I realized that I should just focus on getting the Flash one done), and thought it would be a good opportunity to try out URL rewriting.
I knew that my hosting provider uses Apache, as most do, and that I could use the .htaccess file to do the work for me. Apache has a module called mod_rewrite that is pretty extensive; I haven't really gotten into half of what it can do. However, here's what I found to be useful.
First, what is URL rewriting? Put simply, it's the ability to turn a URL such as http://mysite.com/test.php?var1=1&var2=2 into something like http://mysite.com/test/1/2, http://mysite.com/test-1-2, or http://mysite.com/test.htm. Whatever you want the URL for your page to be, the rewriting engine can map it to the real script.
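To make the idea concrete, here is a small Python sketch of what a rewriting engine does. It matches the incoming path against a list of patterns and hands the request to the real script. The route table and the rewrite() helper are hypothetical illustrations, and Python's re module stands in for Apache's regex engine. This is not how mod_rewrite is implemented internally.

```python
import re

# Hypothetical routes: each pattern is one "pretty" URL shape from the text,
# written without the leading slash (as an .htaccess rule sees the path),
# and each target is the script URL the server actually runs.
ROUTES = [
    (r"^test/([^/]+)/([^/]+)$", "test.php?var1={0}&var2={1}"),   # /test/1/2
    (r"^test-([^-/]+)-([^-/]+)$", "test.php?var1={0}&var2={1}"),  # /test-1-2
    (r"^test\.htm$", "test.php?var1=1&var2=2"),                   # /test.htm, fixed values
]


def rewrite(path):
    """Return the internal target for a pretty path, or the path unchanged."""
    for pattern, target in ROUTES:
        match = re.match(pattern, path)
        if match:
            return target.format(*match.groups())
    return path


for pretty in ["test/1/2", "test-1-2", "test.htm", "about.htm"]:
    print(pretty, "->", rewrite(pretty))
```

All three pretty shapes end up at the same test.php call. A path that matches no route, like about.htm, passes through untouched.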
To get started, put these two lines at the top of your .htaccess file:
Options +FollowSymlinks
RewriteEngine on
The first line, "Options +FollowSymlinks", is needed for any of the rewrites to work. As a security requirement, Apache refuses to apply mod_rewrite rules from an .htaccess file unless FollowSymLinks is enabled. The second line, "RewriteEngine on", tells Apache that we are going to be doing some rewriting, so it should turn that functionality on.
From here, we are free to do whatever we want (mwhahaha). mod_rewrite uses regular expressions, so if you are a bit rusty, you might want to brush up before trying to decipher what is going on below.
Let's say that I have a file in my root directory called "blog.php", and that PHP file takes one argument, "blogid". My URL to that file would look something like http://site/blog.php?blogid=10. Some search engines are getting better at handling these dynamic URLs, but it would still be better to point a user (or engine) at something like http://site/blogs/10. So let's do that. The line you would add to your .htaccess file is:
RewriteRule ^blogs/([^/]*) blog.php?blogid=$1 [NC,L]
OK, so what the heck is all of that? First, RewriteRule says that this line is a rule. Second, ^blogs/([^/]*) is a regular expression that matches any URL path starting with "blogs/" followed by any number of characters other than '/' (so 'blogs/10' matches). In an .htaccess file the leading slash is stripped before matching, which is why the pattern starts with "blogs/" rather than "/blogs/". Finally, blog.php?blogid=$1 tells Apache what to rewrite the URL to. The $1 is the first captured group, which is the ([^/]*) part. If you want multiple arguments, for example if your PHP file looked like "blog.php?type=personal&blogid=10", then you would make your rule look like:
RewriteRule ^blogs/([^/]*)/([^/]*) blog.php?type=$1&blogid=$2 [NC,L]
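One way to check what these two rules capture is to replay them in Python. The apply_rule() helper below is a hypothetical stand-in for a single RewriteRule. It searches the path with the same regular expression and expands $1 to $9 from the capture groups. Python's re module is not Apache's regex engine, but it behaves the same for patterns this simple.

```python
import re


def apply_rule(path, pattern, substitution, nocase=False):
    """Roughly mimic one RewriteRule on a per-directory path.

    Returns the substituted URL, or None when the pattern does not match.
    Like Apache, the pattern is searched (not implicitly anchored), so only
    ^ and $ pin it to the start or end of the path.
    """
    match = re.search(pattern, path, re.IGNORECASE if nocase else 0)
    if match is None:
        return None
    # $N expands to capture group N; a group that did not take part becomes "".
    return re.sub(r"\$([1-9])",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  substitution)


ONE_ARG = (r"^blogs/([^/]*)", "blog.php?blogid=$1")
TWO_ARGS = (r"^blogs/([^/]*)/([^/]*)", "blog.php?type=$1&blogid=$2")

print(apply_rule("blogs/10", *ONE_ARG, nocase=True))
print(apply_rule("BLOGS/10", *ONE_ARG, nocase=True))
print(apply_rule("blogs/personal/10", *TWO_ARGS, nocase=True))
print(apply_rule("blogs/personal/10", *ONE_ARG, nocase=True))
```

The last line shows a trap: the one-argument rule also matches blogs/personal/10 and captures "personal", because neither pattern ends in $. If you keep both rules in one file, list the two-argument rule first or anchor the patterns with $.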
The other important part of a rule is the flags section: the [NC,L] at the end of each rule above. The most important flags (at least to me) are NC, R, and L, but there are others. NC means "no case", so the match is case-insensitive. R forces an external redirect. Instead of quietly serving the new URL, Apache tells the browser to go there, so the address bar changes. That is great when pages have moved, for example when you are switching .php files to a .htm path and want people and engines to update their bookmarks. By default, the R flag sends a 302 (Moved Temporarily), which is great for testing. However, once you are ready to deploy the site live and have checked the links thoroughly, you should probably switch to [R=301], 301 being Moved Permanently. Finally, and most importantly, there is L. L means "last": when a rule carrying it matches, mod_rewrite stops processing the remaining rules in the set. L is not what keeps the address bar clean. A plain internal rewrite from "http://site/blogs/10" to "http://site/blog.php?blogid=10" never shows "/blog.php?blogid=10" to the user unless you also add R. Still, put [L] on rules like these so that later rules don't rewrite the URL a second time. The order of the flags inside the brackets does not matter.
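The flag behavior is easier to see in a toy model. The Python sketch below runs one pass over a list of rules. NC switches on case-insensitive matching. R or R=code turns the result into an external redirect, which is the only case where the browser's address bar changes. L stops at the first matching rule. The Rule class, the process() function, and the rule set itself (including the "oldblogs" path and the catch-all rule) are hypothetical. Real mod_rewrite has many more flags, and in .htaccess files it re-runs the rules on the rewritten URL.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    pattern: str
    substitution: str
    flags: set = field(default_factory=set)  # e.g. {"NC", "L"} or {"R=301", "L"}


def expand(match, substitution):
    # $N expands to capture group N; a group that did not take part becomes "".
    return re.sub(r"\$([1-9])",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  substitution)


def process(path, rules):
    """Toy model of one pass over a rule set (not Apache's actual engine).

    Returns (status, url). Status 200 means an internal rewrite: the server
    runs `url`, but the browser keeps showing the URL it asked for. 301 or
    302 means an external redirect: the browser is sent to `url`, so its
    address bar changes.
    """
    status = 200
    for rule in rules:
        re_flags = re.IGNORECASE if "NC" in rule.flags else 0
        match = re.search(rule.pattern, path, re_flags)
        if match is None:
            continue
        path = expand(match, rule.substitution)
        if "R" in rule.flags:
            status = 302                      # bare R: temporary redirect
        for flag in rule.flags:
            if flag.startswith("R="):
                status = int(flag[2:])        # e.g. R=301: permanent redirect
        if "L" in rule.flags:
            break                             # L: skip the remaining rules
    return status, path


# Hypothetical rule set: a moved section is redirected, pretty URLs are
# rewritten internally, and a catch-all at the end shows what L protects against.
RULES = [
    Rule(r"^oldblogs/([^/]*)", "/blogs/$1", {"R=301", "L"}),
    Rule(r"^blogs/([^/]*)", "blog.php?blogid=$1", {"NC", "L"}),
    Rule(r"^(.*)$", "index.php", set()),
]

print(process("blogs/10", RULES))      # (200, 'blog.php?blogid=10')
print(process("oldblogs/10", RULES))   # (301, '/blogs/10')

# Same blog rule without L: the catch-all runs next and clobbers the result.
NO_L = [Rule(r"^blogs/([^/]*)", "blog.php?blogid=$1", {"NC"}), RULES[2]]
print(process("blogs/10", NO_L))       # (200, 'index.php')
```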
Another thing that can be important: if a user just goes to "http://site/blogs/", you might want to show the first 25 results or so. Our current rule allows this. The blogid field simply arrives empty in PHP, and the PHP can check for that just fine. What is a bit of a pain, though, is a user going to "http://site/blogs". That doesn't match our current rule, which requires "blogs/" (note the trailing slash). A way around this is to change the rule to something like:
RewriteRule ^blogs(/*)([^/]*) blog.php?blogid=$2 [NC,L]
Notice that the parameter is now "$2", because the first capture group is now the (/*) expression. This rule says: match a URL that starts with "blogs", optionally followed by forward slashes, then optionally followed by any number of characters that aren't a forward slash. You have to be a little careful here, though. Suppose your pretty URL were "/blog/10" (with a matching ^blog(/*)([^/]*) rule) and your PHP file is "blog.php". The pattern then also matches the filename itself, because in an .htaccess file the rewritten URL is run through the rules again, and ".php" ends up being your parameter.
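Here is a quick Python check of both points, the optional slash and the blog.php pitfall. The blog_id() helper is hypothetical. It returns capture group 2 of the rule's pattern, which is what $2 would expand to, and uses re.IGNORECASE to mirror the [NC] flag.

```python
import re


def blog_id(path, prefix="blogs"):
    """Return capture group 2 of ^<prefix>(/*)([^/]*), or None if no match."""
    pattern = r"^" + re.escape(prefix) + r"(/*)([^/]*)"
    match = re.search(pattern, path, re.IGNORECASE)
    return None if match is None else match.group(2)


for path in ["blogs", "blogs/", "blogs/10"]:
    print(repr(path), "-> blog.php?blogid=" + blog_id(path))

# The pitfall: with a ^blog(/*)([^/]*) rule (pretty URL /blog/10), the script
# name blog.php matches too when the rules run again after the rewrite.
print(blog_id("blog.php", prefix="blog"))
```

The "blogs" version is safe here because "blog.php" does not start with "blogs". One side effect worth knowing: (/*) may match zero slashes, so the same rule also accepts "blogsabc" and passes "abc" as the id.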
There is another way around the issue above: a rewrite condition (RewriteCond). To use one, you would put something like this directly above the rule:
RewriteCond %{REQUEST_URI} !blog\.php
This says to skip the rule when the requested URL contains blog.php. Note that a RewriteCond only applies to the single RewriteRule that immediately follows it, so repeat it above each rule that needs it. Conditions are also useful for making sure that certain subfolders or files, such as CSS or JavaScript files, aren't rewritten. To exclude an entire file type, you can do:
RewriteCond %{REQUEST_URI} !\.css$
The dollar sign makes sure that ".css" is at the end of the filename.
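As a sketch of how negated conditions gate a rule, the Python below models the two conditions above. All conditions must pass (Apache ANDs them by default), and a leading "!" means the pattern must not match. The rule_applies() helper and its (pattern, negate) tuples are a hypothetical representation, not an Apache API.

```python
import re


def rule_applies(request_uri, conditions):
    """Return True when every RewriteCond-style check passes (they are ANDed).

    Each condition is (pattern, negate); negate=True plays the role of "!".
    """
    for pattern, negate in conditions:
        matched = re.search(pattern, request_uri) is not None
        if matched == negate:
            return False
    return True


CONDITIONS = [
    (r"blog\.php", True),   # RewriteCond %{REQUEST_URI} !blog\.php
    (r"\.css$", True),      # RewriteCond %{REQUEST_URI} !\.css$
]

for uri in ["/blogs/10", "/blog.php", "/styles/site.css", "/site.css.bak"]:
    print(uri, rule_applies(uri, CONDITIONS))
```

The last URI shows what the dollar sign buys you. "/site.css.bak" contains ".css" but does not end with it, so the rule still applies.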
The other way to use rewrite conditions is to capture parts of the URL. Instead of referring to $2 from the rule (blog.php?blogid=$2), we can capture the part of the URL we want in the condition and use that in our query instead:
RewriteCond %{REQUEST_URI} ^/blogs/*([^/]*) [NC]
RewriteRule ^blogs(/*)([^/]*) blog.php?blogid=%1 [NC,L]
The %1 refers to the first capture group of the last matched RewriteCond, just as $1 refers to the first capture group of the RewriteRule pattern.
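To see where %1 comes from, here is a Python sketch of this rule with a condition on %{REQUEST_URI}, the request path. %{QUERY_STRING} only holds what follows the "?", so it is empty for /blogs/10. It also follows mod_rewrite's order: the rule's own pattern is tested first, and only then its conditions. The function name and the assumption that the .htaccess file sits in the document root are hypothetical. Python's re module stands in for Apache's regex engine.

```python
import re


def rewrite_with_cond(request_uri):
    """Sketch of a RewriteRule whose substitution uses %1 from its RewriteCond.

    Assumes the .htaccess file sits in the document root, so the rule sees the
    path with just the leading "/" removed.
    """
    path = request_uri[1:] if request_uri.startswith("/") else request_uri
    # RewriteRule ^blogs(/*)([^/]*) blog.php?blogid=%1 [NC,L]
    # The rule's own pattern is checked first...
    if re.search(r"^blogs(/*)([^/]*)", path, re.IGNORECASE) is None:
        return None
    # ...then its condition: RewriteCond %{REQUEST_URI} ^/blogs/*([^/]*) [NC]
    cond = re.search(r"^/blogs/*([^/]*)", request_uri, re.IGNORECASE)
    if cond is None:
        return None
    # %1 is the condition's first capture group; $1/$2 would be the rule's.
    return "blog.php?blogid=" + cond.group(1)


for uri in ["/blogs/10", "/blogs", "/about"]:
    print(uri, "->", rewrite_with_cond(uri))
```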
There's a ton more, such as blocking IP addresses with the %{REMOTE_ADDR} variable, and blocking certain referring sites or preventing people from hotlinking to your content with %{HTTP_REFERER}.