In June 2011, I spoke at SMX Advanced about SEO issues that I commonly run into during technical SEO site evaluations. The part of my presentation that dealt with Microsoft’s Internet Information Services (IIS) generated a lot of comments and questions afterward, so this column addresses some of those questions about how to improve technical SEO on the Microsoft stack.
First, a caveat: The majority of my experience has been with Linux- and BSD-based operating systems, starting with SunOS way back at Berkeley, so I’m definitely not an expert on deploying servers on Windows and/or .NET.
I’ve asked Microsoft-stack expert Colin Cochrane to correct anything Windows-related that I have stated incorrectly. (Thank you, Colin. Your link is in the mail.) Any remaining errors in this article are definitely mine, and not his.
After completing technical SEO assessments on numerous sites running on IIS and .NET, I believe that it is a very scalable and production-worthy platform, but I have found that its default settings are far from optimal from a technical SEO point of view.
This article describes the most common issues I’ve seen. Several of these issues cause canonicalization problems, as described in more detail in this article about Google’s parameter handling feature.
Oh, and here is a second caveat: Please be sure to test any changes on a staging server before rolling them out to production. I would hate for something to happen to your website because I made a typo or worded something unclearly.
1. Default Pages (Default.aspx)
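The problem
With the default settings, the same page is usually served at both of these URLs:
- http://www.site.com/directory/
- http://www.site.com/directory/Default.aspx
Why it’s bad
- Link diffusion. Inbound links to the page could point at either of these two URLs. It would be much better to focus the inbound links on only one URL.
- Crawl inefficiency. Crawlers have to crawl two URLs to get one page for each directory on the site.
The culprit
The default document feature in IIS, which serves Default.aspx when a directory URL is requested.
Some ideas for fixing it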
The easiest way to fix this is to put a link rel=canonical tag on these pages and point to whichever URL you want to be the canonical. It’s not as good as a permanent redirect, but it will work in a pinch if you don’t want to mess around with your server configuration.
A more permanent fix is to use a third-party URL rewriter, which will redirect the URL before the request gets to the .NET application. Some URL rewriters I have seen used successfully on sites are URLRewrite (for IIS7 only), URLRewriter, and ISAPI Rewrite 2.
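As a rough sketch of the redirect approach, assuming the Microsoft URL Rewrite module for IIS 7 is installed, a rule along these lines in web.config permanently redirects requests for Default.aspx to the directory URL. The rule name is arbitrary, and the GET-only condition is my own assumption, added so that form postbacks to Default.aspx are not redirected:

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Permanently redirect /directory/Default.aspx to /directory/ -->
        <rule name="Redirect default document" stopProcessing="true">
          <!-- Matching is case-insensitive by default; {R:1} is the path before the file name -->
          <match url="^(.*?)/?default\.aspx$" />
          <conditions>
            <!-- Assumption: leave POST requests (form postbacks) alone -->
            <add input="{REQUEST_METHOD}" pattern="^GET$" />
          </conditions>
          <action type="Redirect" url="{R:1}/" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Internal links should also be updated to point at the directory URL, so that crawlers rarely need to follow the redirect at all.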
2. Case Insensitive URLs
The problem
The path part of the URLs served by IIS is case-insensitive. So any of these URLs will usually lead to the same page:
- http://www.site.com/directory/default.aspx
- http://www.site.com/Directory/Default.ASPX
- http://www.site.com/DIRECTORY/DeFaUlT.aSpX
Why it’s bad
- Crawl inefficiency. Google and Bing will crawl all of the different case variations that they see in links, even though they all lead to the same page.
- Link diffusion. Inbound links could go to any of the variations of the same URL. I’ve even seen different capitalizations of URLs used in internal links within a website.
- Robots.txt problems. Because the robots.txt file is case-sensitive, if your URLs aren’t, crawlers may be accessing URLs that you thought were blocked.
Some ideas for fixing it
Similar to the first issue, the easiest way to resolve this is to use a link rel=canonical tag that points to the URL with the correct capitalization.
The URL rewriters listed above are the best option for normalizing the case. They can be configured to permanently redirect a URL to the right capitalization. If you pick an easy method for canonicalizing URLs, like converting everything to lower case, it can be implemented with one general rule.
Here is an example rule that rewrites a URL to all lower case that will work with URLRewrite:
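A minimal sketch of such a rule, assuming the Microsoft URL Rewrite module for IIS 7 is installed and the rule is placed in web.config. The rule name is arbitrary, and the rule issues a permanent redirect to the lower-case URL:

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Lowercase URLs" stopProcessing="true">
          <!-- ignoreCase="false" so the rule fires only when the path contains an upper-case letter -->
          <match url="[A-Z]" ignoreCase="false" />
          <conditions>
            <!-- Exception: files that must keep their original capitalization -->
            <add input="{URL}" pattern="BingSiteAuth\.xml$" negate="true" />
          </conditions>
          <!-- ToLower is a built-in function of the module; the query string is left unchanged -->
          <action type="Redirect" url="{ToLower:{URL}}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```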
If you implement something like this, keep in mind that some URLs may require upper case, such as the Bing authorization file BingSiteAuth.xml. URLs like these need to be added to the rule as exceptions.
3. Handling Page Not Found Errors & Internal Server Errors
The problem
In its default configuration, ASP.NET handles errors (like page not found or internal server problems) by redirecting with a 302 temporary redirect to an error page, which usually returns a 200 response.
Why it’s bad
- Crawl inefficiency. Because a 302 redirect is a temporary redirect, search engines will continue to check that URL often in hopes of one day getting a page at that URL instead of a redirect. And if the target page returns a 200 response, then the search engines will index the initial URL, which means your site might start ranking with URLs that lead searchers to error pages.
And because the page not found page gets so much traffic and has so many URLs pointing to it, it tends to get crawled pretty frequently, which further reduces crawl efficiency.
- “Non-graceful” site failure. If your site starts returning an error — due to a temporary database problem, for example — large portions of your site could get de-duplicated out of the index because they are suddenly redirecting to the same URL.
The culprit
This is the default behavior in ASP.NET.
Some ideas for fixing it
Fortunately, this issue has a fix that is pretty straightforward and requires only a minor change to the web.config file.
Here is part of an example web.config file that prevents these redirects:
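A minimal sketch, assuming ASP.NET 3.5 SP1 or later (where the redirectMode attribute is available) and placeholder error pages named NotFound.aspx and Error.aspx. With ResponseRewrite, the error page is rendered at the requested URL instead of being reached through a 302:

```xml
<configuration>
  <system.web>
    <!-- redirectMode="ResponseRewrite" serves the error page at the requested URL instead of issuing a 302 -->
    <customErrors mode="On" redirectMode="ResponseRewrite" defaultRedirect="~/Error.aspx">
      <!-- NotFound.aspx and Error.aspx are placeholder names, not part of any default install -->
      <error statusCode="404" redirect="~/NotFound.aspx" />
    </customErrors>
  </system.web>
</configuration>
```

In my understanding, the error pages themselves typically still need to set the matching status code (404 or 500) in their own code; otherwise they can end up being served with a 200. Check the response headers on a staging server to confirm.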
4. Browser-dependent code
The problem
.NET has some hooks that make it pretty easy to write code that changes a page depending on the user agent requesting it.
Why it’s bad
- Cloaking. Pages that change based on the user agent (e.g., Googlebot or Firefox) are risky for a lot of reasons. From an SEO perspective, the danger is unintentional cloaking of content, which can result in a severe penalty on your site.
The culprit
I believe this functionality dates back to the late 1990s/early 2000s, when browsers had widely different support for web standards. If you are feeling nostalgic for those days, here is an old browser compatibility chart that you can look at until the feeling goes away.
Some ideas for fixing it
Chances are there is nothing to fix, but if you want to look at your source code for potential browser-dependent logic, here is an article with sample code that should give you an idea of what to look for.
Conclusion
I hope this article helps you make your IIS installation more search engine-friendly. I have spoken with some very smart Windows developers who initially swore to me that there was no fix for some of the issues in this list, so there is a pretty good chance that your development team isn’t aware of all of these issues or even that these fixes exist.
Of course, these are only a few of the issues that I see with IIS on a regular basis. Others include cacheability of the site, character encoding issues, and URL redirects.
The easiest way to pinpoint these types of issues is by looking at your server logs.
by Todd Nemet
Source : Search Engine Land