Cleaning Up After Migrating To Drupal
We have just finished migrating a client from an old .NET system to Drupal. The last task was to write some Apache mod_rewrite conditions and rules to deal with the URLs of their old website. This proved to be more trouble than I expected, mainly because I struggled to find examples of how this might work.
Firstly, the groundwork. The URL pattern to be redirected looked like this:
MainArticle.aspx?m=33818&amid=30301119
Here the amid value is the article ID, which we had carried through to Drupal. We then used the Pathauto module to make sure all the new URLs followed this pattern:
story/%amid
So we now have a connection between the old URLs and the new URLs that we can confidently rely on as the basis for a mechanism to make sure both bots and people are directed to the new content correctly.
Now for the tricky bit. To do the redirecting we decided to use Apache's built-in redirection abilities. I started out thinking we would use RedirectMatch, but I quickly found this blog post. It was a good starting point, saying that RedirectMatch would not work, which saved some time:
http://davidherron.com/content/cure-fail-when-using-redirectmatch-clean-...
It also shows how you can achieve the same thing with RewriteRule instead, to provide a 301 redirect (moved permanently), but it only tells half the story. I thought I had perfected my rule, which looked like this:
RewriteRule ^.*amid=([0-9]+).*$ story/$1 [R=301,L]
I tested it here and it worked too:
http://civilolydnad.se/projects/rewriterule/
But no matter where I put it in Drupal's .htaccess file, it did not do a thing!
Eventually I found this comment on Drupal Groups, which made the penny finally drop:
http://groups.drupal.org/node/11361#comment-43246
The rewrite rule tester I had been using has a bug! The *real* mod_rewrite matches a RewriteRule pattern against the URL path only and disregards the query string, so even though my rule seemed to work in the rewrite rule tester above, the query string was simply not available to it on the server. To get at the query string you have to test %{QUERY_STRING} in a RewriteCond instead. This was me not understanding things properly. Once I had that worked out, it was plain sailing.
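To make the difference concrete, here is a minimal Python sketch (not Apache itself) that applies my first pattern to the two pieces of the old URL separately. The host name example.com is a placeholder, and stripping the leading slash mimics what mod_rewrite does in a per-directory .htaccess context.

```python
import re
from urllib.parse import urlsplit

# The pattern from the first attempt at a RewriteRule.
RULE_PATTERN = re.compile(r"^.*amid=([0-9]+).*$")

# example.com is a placeholder host, not the client's site.
old_url = "http://example.com/MainArticle.aspx?m=33818&amid=30301119"
parts = urlsplit(old_url)

# In a .htaccess file, RewriteRule sees only the path (without the
# leading slash); the query string is kept separately.
path_seen_by_rule = parts.path.lstrip("/")
query_string = parts.query

rule_match = RULE_PATTERN.match(path_seen_by_rule)
cond_match = RULE_PATTERN.match(query_string)

print(path_seen_by_rule)      # MainArticle.aspx
print(rule_match)             # None: the path contains no "amid="
print(cond_match.group(1))    # 30301119: found in the query string
```

The pattern itself was fine; it was simply being applied to the wrong part of the request.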
So here's what I finally ended up with, with comments inline:
# Rewrite old-style URLs for .aspx scripts
# First we need to exclude existing files, directories and favicon.ico, just like Drupal does
# Repeat the core conditions here:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
# Now we need to find the value we're after in the querystring:
RewriteCond %{QUERY_STRING} ^.*amid=([0-9]+).*$
# Finally, catch any request for an aspx script and push it to our alias
# Note the R=301, our permanent redirect
# Also note the final ? on the end, to stop the old querystring being
# passed on again:
RewriteRule ^.*\.aspx.*$ /story/%1? [R=301,L]
This *does* work. It catches every request for an aspx script whose query string carries an amid value and pushes it on to the revised URL on the same server as a 301 redirect, using the value captured by the previous condition, where we parsed the query string to get the old amid article ID out.
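If you want to sanity-check the mapping without a running Apache, the logic of the condition and rule above can be mirrored in a few lines of Python. This is only a rough sketch: the function name redirect_target is made up for illustration, and it assumes the request does not correspond to an existing file or directory and is not /favicon.ico, so the first three conditions are taken as already passed.

```python
import re
from typing import Optional

RULE_PATTERN = re.compile(r"^.*\.aspx.*$")        # the RewriteRule pattern
QUERY_COND = re.compile(r"^.*amid=([0-9]+).*$")   # the RewriteCond on %{QUERY_STRING}


def redirect_target(path: str, query: str) -> Optional[str]:
    """Return the 301 target for a request, or None if the rule does not fire.

    Hypothetical helper; the file, directory and favicon conditions are
    assumed to have passed already.
    """
    # mod_rewrite tests the rule pattern against the path first ...
    if RULE_PATTERN.match(path.lstrip("/")) is None:
        return None
    # ... and only then evaluates the conditions attached to the rule.
    cond = QUERY_COND.match(query)
    if cond is None:
        return None
    # %1 is the first group captured by the condition; the trailing "?"
    # in the substitution means the old query string is not appended.
    return "/story/" + cond.group(1)


print(redirect_target("/MainArticle.aspx", "m=33818&amid=30301119"))  # /story/30301119
print(redirect_target("/MainArticle.aspx", "m=33818"))                # None
print(redirect_target("/node/1", "amid=30301119"))                    # None
```

Running a handful of old URLs through something like this before deploying is a cheap way to catch a pattern that is too loose or too strict.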


You can do it all with a module too.
When I did this for HealthConnection.co.nz's migration from .NET to Drupal, I toyed with Apache rewrites but went with a Drupal module instead. We stored the mappings of .NET IDs to Drupal IDs, then implemented a menu item for "Article.asp". Its callback function looks for the old ID in the query string, uses it to retrieve the new ID and issues a drupal_goto() redirect to "node/$new_id". There were several different menu callbacks for the different ASP files.
This might not be quite as scalable as an approach that offloads more of the work to the Path module and .htaccess, but it was fast and easy to implement and debug.
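The lookup step in that callback is simple enough to sketch. The version below is in Python rather than PHP, purely to show the logic; a real module would do this inside the menu callback and finish with drupal_goto(). The mapping contents, the function name legacy_redirect and the query parameter name "id" are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs

# Hypothetical mapping of old .NET article IDs to new Drupal node IDs.
OLD_TO_NEW = {"1001": "12", "1002": "57"}


def legacy_redirect(query_string: str, param: str = "id") -> Optional[str]:
    """Return the Drupal path to redirect to, or None if there is no mapping.

    The parameter name "id" is an assumption; use whatever the old
    ASP pages actually expected.
    """
    values = parse_qs(query_string).get(param)
    if not values:
        return None                      # old ID missing from the query string
    new_id = OLD_TO_NEW.get(values[0])
    if new_id is None:
        return None                      # old ID has no stored mapping
    return "node/" + new_id


print(legacy_redirect("id=1001"))   # node/12
print(legacy_redirect("id=9999"))   # None
```

What to do when no mapping exists (a 404, or a redirect to the front page) is a choice the comment above does not cover.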