URL Rewriter - Database Driven (C#)

What's URL Rewriting?
About 2 years ago, a customer asked me about implementing URL Rewriting into their website. I had never heard of such a thing, so I told them I’d look into it. In case you’ve never heard of it, here’s the basic explanation:

Webpages that display dynamic content typically receive querystring values that tell them what to display. The website you’re on now, brianpautsch.com, is a good example.

For example, http://www.brianpautsch.com/BlogEntries.aspx?Y=2006&M=03 would tell the webpage to display all items from March, 2006. Try it out.

Now, go to http://www.brianpautsch.com/Blog/2006/3/ - Same results, huh?

Here’s what happened. Behind the scenes, I added a rewriter rule that says: rewrite any URL of the form “Blog/” + 4 digits + “/” + 1 or 2 digits + “/” to “BlogEntries.aspx?Y=” + the 4 digits + “&M=” + the remaining digits.
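Under the hood, a rule like this is just a regular-expression replace. Here’s a minimal standalone sketch of the idea (the pattern and target strings are illustrative, not the exact rule running on this site):

```csharp
using System;
using System.Text.RegularExpressions;

class RewriteDemo
{
    static void Main()
    {
        // Hypothetical rule: "LookFor" pattern with two captured groups,
        // "SendTo" target that re-inserts them as querystring values.
        string lookFor = "~/Blog/(\\d{4})/(\\d{1,2})/";
        string sendTo  = "~/BlogEntries.aspx?Y=$1&M=$2";

        string requestedPath = "~/Blog/2006/3/";
        string rewrittenPath = Regex.Replace(requestedPath, lookFor, sendTo);

        Console.WriteLine(rewrittenPath); // ~/BlogEntries.aspx?Y=2006&M=3
    }
}
```

The actual Rewriter module does essentially this against every incoming request, then calls RewritePath with the result.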

That’s it…sounds easy enough, right? Not really.

Fortunately, I came across an article by Scott Mitchell on MSDN. So first off, understand that I did not write the core rewriting code. I simply extended it to let you store your rules in the web.config or in a database. The web.config is a great location if the rules rarely change and only use regular expressions. In my case, I’ve had 6 or 7 customers who have new rules added every day; they basically create a rule for every article, blog entry, etc. they publish.

Why should I implement URL Rewriting?

  • Some public search engines, and most site and intranet search engines, will index pages with dynamic URLs, but others will not. And because it's difficult to link to these pages, they will be penalized by engines which use link analysis to improve relevance ranking, such as Google's PageRank algorithm. In Summer 2003, Google had very few dynamic-URL pages in the first 10 pages of results for test searches.
  • Search engine robot writers are concerned about their robot programs getting lost on web sites with infinite listings. Search engine developers call these "spider traps" or "black holes" -- sites where each page has links to many more programmatically-generated pages, without any useful content on them. The classic example is a calendar that keeps going forward through the 21st Century, although it has no events set after this year. This can cause the search engine to waste time and effort, or even crash your server.
  • Readable URLs are good for more than being found by local and webwide search engine robots. Humans feel more comfortable with consistent and intuitive paths, recognizing the date or product name in the URL.
  • By abstracting the public version of the URL, it will not be dependent on the backend software. If your site changes from Perl to Java or from CFM to .Net, the URLs will not change, so all links to your pages will remain live.

Source: http://www.searchtools.com/robots/goodurls.html

So how do I do it?
I will not be going into the details behind the URL Rewriter as Scott Mitchell’s article on MSDN explains it in great detail. But I highly recommend you read the article…it’s very good. This article is for those of you who want to implement it in 10 minutes and be done…and then maybe learn about it later.

Download Code
Download Binaries only

1. Update your web.config
In order for the Rewriter code to execute, you must add an HTTP module entry into the web.config. You can also use an HTTP handler (see MSDN article for explanation).
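A registration along these lines should work; the type and assembly names here are assumptions based on the RewriterDB project name used later in this article, so adjust them to match the downloaded code:

```xml
<configuration>
  <system.web>
    <httpModules>
      <!-- type/assembly names assumed; match them to the RewriterDB project -->
      <add name="ModuleRewriter"
           type="RewriterDB.ModuleRewriter, RewriterDB" />
    </httpModules>
  </system.web>
</configuration>
```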

2. Set up your rules
For URL Rewriting to work, you need a “LookFor” value (the URL being sent over in the browser) and a “SendTo” value (the URL to rewrite to). In the web.config, it’s very easy. First, add a reference to your rules configuration section:
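Something like the following goes in the `<configSections>` element (the section-handler type name is an assumption based on Scott Mitchell's original code; check the downloaded project for the exact class):

```xml
<configSections>
  <!-- handler class name assumed; verify against the RewriterDB source -->
  <section name="RewriterConfig"
           type="RewriterDB.RewriterConfigSerializerSectionHandler, RewriterDB" />
</configSections>
```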

then add your rules:
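A rule block might look like this (the element names follow Scott Mitchell's RewriterConfig format; the pattern itself is just an example). Note the `&amp;` in the SendTo value, which keeps the entry XML compliant:

```xml
<RewriterConfig>
  <Rules>
    <RewriterRule>
      <LookFor>~/Blog/(\d{4})/(\d{2})/</LookFor>
      <SendTo>~/BlogEntries.aspx?Y=$1&amp;M=$2</SendTo>
    </RewriterRule>
  </Rules>
</RewriterConfig>
```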

In the database, it’s also very easy, but requires a little work. First, create a table:
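A minimal table sketch, assuming just the two rule columns plus the RuleID key mentioned later in this article (adjust the types and sizes to taste):

```sql
CREATE TABLE dbo.RewriterRules
(
    RuleID  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    LookFor VARCHAR(255)       NOT NULL,
    SendTo  VARCHAR(255)       NOT NULL
)
```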

Then, write a stored procedure to retrieve all Rewriter Rules:

CREATE PROCEDURE dbo.spRewriterRule_GetAll
AS
SELECT * FROM RewriterRules

Be sure to update the "RewriterUseDB" setting in the web.config based on your decision to use the web.config or database.
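For example (the value semantics here are my assumption; check the downloaded code for the exact key it reads):

```xml
<appSettings>
  <!-- assumed semantics: "true" loads rules from the database,
       "false" loads them from the <RewriterConfig> section -->
  <add key="RewriterUseDB" value="true" />
</appSettings>
```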

A few points about Rules:

  • The “~” is a built-in feature that means “go to the root level of the application”.
  • All entries must be XML compliant, so be sure to escape periods in your regular expressions (\.) and encode ampersands as &amp;.
  • Be sure to take the time to design your rules and folder hierarchy. On brianpautsch.com, I currently have the “LookFor” “~/ShowItems(\d{4})/(\d{2})/(\d{2})\.aspx” going to the “SendTo” “ShowItems.aspx?Year=$1&Month=$2&Day=$3”. I think I’m going to change the “LookFor” to “~/(\d{4})/(\d{2})/(\d{2})/” so people can type in brianpautsch.com/2006/03/01/ to get the entries they’re looking for. Both “LookFor” values are intuitive, but the latter is much better.

Note: If a rewriter rule includes folders, those folders must actually exist! And each folder must contain a default page with a simple “<%@ Page %>” tag.

3. Reference the RewriterDB and ActionlessForm DLLs
I recommend you add the projects to your solution and then add references. Then when you run the code, you can step through the Rewriter code.

4. Create your webpages
Based upon your rules, create the necessary webpages to support them. Be sure to add the ActionlessForm “form” tag at the top of each webpage:
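For example (the `skm` tag prefix and the `ActionlessForm` namespace/assembly names are my assumptions based on Scott Mitchell's article; match them to the project you referenced):

```aspx
<%@ Register TagPrefix="skm" Namespace="ActionlessForm" Assembly="ActionlessForm" %>

<skm:Form id="Form1" method="post" runat="server">
    <!-- your controls here -->
</skm:Form>
```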

And then add a few links to your webpage to test your rules out.

This allows the page to post back to itself without revealing the true path.

A few words on URL Rewriting…
I have implemented this on many websites and it works great. A couple of websites have several thousand rewriter rules, with no sign of performance loss.

  • When possible, use regular expressions over exact matches.
  • When using the database approach, be sure to associate each rewrite rule with the information it is rewriting. For example, if each blog entry has its own rewrite rule, you’ll need to add a column titled “RuleID” to the BlogEntries table. This is necessary so that you can JOIN on the RewriterRules table when displaying the blog entries and also easily find the rule when updating/deleting blog entries.
  • When using the database approach and your “LookFor” has subfolders, be sure you create these! In the code provided, I simply created a folder titled “defaultpage” and whenever I create a rule, I ensure the folder exists and the file exists. Here’s some code I have written for live applications:

//Ensure rewriter path exists
DateTime dteCreatedOn = objDataSvc.CreatedOn;
EnsureDefaultFileExists(Server.MapPath("~/"), //root argument assumed; lost in formatting
    "Blogs\\" + String.Format("{0:yyyy}", dteCreatedOn) + "\\" +
    String.Format("{0:MM}", dteCreatedOn) + "\\" +
    String.Format("{0:dd}", dteCreatedOn) + "\\" +
    txtPageName.Text.Trim() + "\\Default.aspx");

//Clear cached values so they reload on next page click from anyone
HttpContext.Current.Cache.Remove("RewriterConfig");

public static void EnsureDefaultFileExists(string strRoot, string strFilePath)
{
    //Directory missing
    if (!Directory.Exists(Path.GetDirectoryName(strRoot + strFilePath)))
        Directory.CreateDirectory(Path.GetDirectoryName(strRoot + strFilePath));

    //File missing
    if (!File.Exists(strRoot + strFilePath))
    {
        string strDefaultLoc =
            ConfigurationSettings.AppSettings["RewriterDefaultPage"].ToString();
        File.Copy(strRoot + strDefaultLoc, strRoot + strFilePath, true);
    }
}
string strFilePath) 2{ 3//Directory missing4if (!Directory.Exists(Path.GetDirectoryName(strRoot + strFilePath))) 5 Directory.CreateDirectory(Path.GetDirectoryName(strRoot + strFilePath)); 67//File missing8if (!File.Exists(strRoot + strFilePath)) 9 { 10string strDefaultLoc = 11 ConfigurationSettings.AppSettings["RewriterDefaultPage"].ToString(); 12 File.Copy(strRoot + strDefaultLoc, strRoot + strFilePath, true); 13 } 14}