July 15, 2011

4 Ideas To Improve IIS & .NET For Technical SEO

In June 2011, I spoke at SMX Advanced about SEO issues that I commonly run into during technical SEO site evaluations. The part of my presentation that dealt with Microsoft’s Internet Information Services (IIS) generated a lot of comments and questions afterward, so this column addresses some of those questions about how to improve technical SEO on the Microsoft stack.
First, a caveat: The majority of my experience has been with Linux- and BSD-based operating systems, starting with SunOS way back at Berkeley, so I’m definitely not an expert on deploying servers on Windows and/or .NET.
I’ve asked Microsoft-stack expert Colin Cochrane to correct anything Windows-related that I have stated incorrectly. (Thank you, Colin. Your link is in the mail.) Any remaining errors in this article are definitely mine, and not his.
After completing technical SEO assessments on numerous sites running on IIS and .NET, I believe that it is a very scalable and production-worthy platform, but I have found that its default settings are far from optimal from a technical SEO point of view.
This article describes the most common issues I’ve seen. Several of these issues cause canonicalization problems, as described in more detail in this article about Google’s parameter handling feature.
Oh, and here is a second caveat: Please be sure to test any changes on a staging server before rolling them out to production. I would hate for something to happen to your website because I made a typo or worded something unclearly.

1. Default Pages (Default.aspx)

The problem
Directory pages are available at two URLs, one with and one without the default page. For example, these two URLs would lead to the same page:
  • http://www.site.com/directory/
  • http://www.site.com/directory/Default.aspx
In this example, the default page is Default.aspx, though it could be configured to be a different name.
Why it is bad
  • Link diffusion. Inbound links to the page could point at either of these two URLs. It would be much better to focus the inbound links on only one URL.
  • Crawl inefficiency. Crawlers have to crawl two URLs to get one page for each directory on the site.
The usual way to deal with duplicate URLs like these is to permanently (with a 301) redirect one URL to the other. However, in this case, it will result in an infinite redirect loop.
The culprit
Redirecting one URL to the other leads to a redirect loop because both of these URLs look exactly the same to the .NET application. For directory URLs, the default page is always appended before the request reaches the application, so the application can’t tell whether it should redirect the URL or not.
Fixing it
The easiest way to fix this is to put a link rel=canonical tag on these pages and point to whichever URL you want to be the canonical. It’s not as good as a permanent redirect, but it will work in a pinch if you don’t want to mess around with your server configuration.
A more permanent fix is to use a third-party URL rewriter, which will redirect the URL before it gets to the .NET application. Some URL rewriters I have seen used successfully on sites are URLRewrite (for IIS7 only), URLRewriter, and ISAPI Rewrite 2.
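As a rough sketch of what such a redirect can look like with URLRewrite, assuming the IIS 7 URL Rewrite module is installed and Default.aspx is the default page (the rule name and the POST exception are illustrative choices, not something your site necessarily needs):

```xml
<!-- Sketch only: goes inside <configuration> in web.config and requires
     the IIS 7 URL Rewrite module. The rule name is illustrative. -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Remove default document" stopProcessing="true">
        <!-- Match any requested path that ends in Default.aspx
             (URL Rewrite patterns are case-insensitive by default). -->
        <match url="(.*?)/?Default\.aspx$" />
        <conditions>
          <!-- Leave form posts alone; redirecting a POST would drop the form data. -->
          <add input="{REQUEST_METHOD}" pattern="^POST$" negate="true" />
        </conditions>
        <!-- Permanently (301) redirect to the directory URL. -->
        <action type="Redirect" url="{R:1}/" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

Because the rewriter sees the URL exactly as it was requested, a request for /directory/ does not match the pattern, which is what avoids the redirect loop.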

2. Case Insensitive URLs

The problem
The path part of the URLs served by IIS is case-insensitive. So any of these URLs will usually lead to the same page:
  • http://www.site.com/directory/default.aspx
  • http://www.site.com/Directory/Default.ASPX
  • http://www.site.com/DIRECTORY/DeFaUlT.aSpX
Why it is bad
  • Crawl inefficiency. Google and Bing will crawl all of the different case variations that they see in links, even though they all lead to the same page.
  • Link diffusion. Inbound links could go to any of the variations of the same URL. I’ve even seen different capitalizations of URLs used in internal links within a website.
  • Robots.txt problems. Because the robots.txt file is case-sensitive and your URLs aren’t, crawlers may be accessing URLs that you thought were blocked.
The culprit
My guess is that it has something to do with the Windows path handling in general, which is also case-insensitive.
Some ideas for fixing it
Similar to the first issue, the easiest way to resolve this is to use a link rel=canonical tag that points to the URL with the correct capitalization.
The URL rewriters listed above are the best option for normalizing the case. They can be configured to permanently redirect a URL to the right capitalization. If you pick an easy method for canonicalizing URLs, like converting everything to lower case, it can be implemented with one general rule.
Here is an example rule that rewrites a URL to all lower case that will work with URLRewrite:
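A minimal sketch of such a rule, assuming the IIS 7 URL Rewrite module is installed and the rule sits under system.webServer in web.config (the rule name and the single exception shown are illustrative):

```xml
<!-- Sketch only: goes inside <configuration> in web.config. -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Lowercase URLs" stopProcessing="true">
        <!-- ignoreCase="false" so the rule fires only when the
             requested path contains an upper-case letter. -->
        <match url="[A-Z]" ignoreCase="false" />
        <conditions>
          <!-- Illustrative exception: leave this mixed-case file name alone. -->
          <add input="{URL}" pattern="BingSiteAuth\.xml$" ignoreCase="false" negate="true" />
        </conditions>
        <!-- Permanently (301) redirect to the lower-case version of the path. -->
        <action type="Redirect" url="{ToLower:{URL}}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```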
If you implement something like this, keep in mind that some URLs may require upper case, such as the Bing authorization file BingSiteAuth.xml. URLs like these need to be added to the rule as exceptions.
Here is a post containing 10 very useful rewriting rules, one of which converts URLs to lowercase.

3. Handling Page Not Found Errors & Internal Server Errors

The problem
In its default configuration, ASP.NET handles errors (like page not found or internal server problems) by redirecting with a 302 temporary redirect to an error page, which usually returns a 200 response.
Why it’s bad
  • Crawl inefficiency. Because a 302 redirect is a temporary redirect, search engines will continue to check that URL often in hopes of one day getting a page at that URL instead of a redirect. And if the target page returns a 200 response, then the search engines will index the initial URL, which means your site might start ranking with URLs that lead searchers to error pages.
This means that pages that are removed from the site or pages that throw an error will continue to be crawled as if they were regular pages, so the crawler spends time on these URLs instead of on actual pages with useful content.
And because the page not found page gets so much traffic and has so many URLs pointing to it, it tends to get crawled pretty frequently, which further reduces crawl efficiency.
  • “Non-graceful” site failure. If your site starts returning an error — due to a temporary database problem, for example — large portions of your site could get de-duplicated out of the index because they are suddenly redirecting to the same URL.
The culprit
This is the default behavior in ASP.NET.
Some ideas for fixing it
Fortunately, this issue has a fix that is pretty straightforward and requires only a minor change to the web.config file.
Here is part of an example web.config file that prevents these redirects:
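A minimal sketch of the relevant section, assuming custom error pages named NotFound.aspx and Error.aspx at the root of the application (both file names are hypothetical placeholders):

```xml
<!-- Sketch only: the error page names below are placeholders. -->
<configuration>
  <system.web>
    <!-- ResponseRewrite serves the error page at the requested URL
         instead of issuing a 302 redirect to it. -->
    <customErrors mode="On"
                  redirectMode="ResponseRewrite"
                  defaultRedirect="~/Error.aspx">
      <error statusCode="404" redirect="~/NotFound.aspx" />
      <error statusCode="500" redirect="~/Error.aspx" />
    </customErrors>
  </system.web>
</configuration>
```

It is worth checking the response headers after making the change, since the error page itself may still need to set the status code (404 or 500) explicitly.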
The attribute redirectMode needs to be set to ResponseRewrite instead of its default value of ResponseRedirect.
redirectMode is not available in all versions of .NET (it was introduced in .NET Framework 3.5 SP1), so you may need to update first. More detail can be found in this article.

4. Browser-dependent code

The problem
.NET has some hooks that make it pretty easy to write code that changes a page depending on the user agent requesting it.
Why it’s bad
  • Cloaking. Pages that change based on the user agent (e.g. Googlebot or Firefox) are dangerous for a lot of reasons, but from an SEO perspective the danger is that they could lead to unintentional cloaking of content, which can result in a severe penalty for your site.
By default, there is nothing user agent-dependent about the code that is served by IIS/.NET. But because the functionality is there, it is possible that browser-dependent code exists in your site.
The culprit
I believe this functionality dates back to the late 1990s/early 2000s, when browsers had widely different support for web standards. If you are feeling nostalgic for those days, here is an old browser compatibility chart that you can look at until the feeling goes away.
Some ideas for fixing it
Chances are there is nothing to fix, but if you want to look at your source code for potential browser-dependent logic, here is an article with sample code that should give you an idea of what to look for.

Conclusion

I hope this article helps you make your IIS installation more search engine-friendly. I have spoken with some very smart Windows developers who initially swore to me that there was no fix for some of the issues in this list, so there is a pretty good chance that your development team isn’t aware of all of these issues or even that these fixes exist.
Of course, these are only a few of the issues that I see with IIS on a regular basis. Others include cacheability of the site, character encoding issues, and URL redirects.
The easiest way to pinpoint these types of issues is by looking at your server logs.

Search Engine Land
