February 11, 2011

What is Canonicalization?

For SEOs, canonicalization refers to normalizing multiple URLs, that is, redirecting them to a single dominant version.

A canonical tag is written by adding rel="canonical" to a link element in the page's head, with the href pointing to the preferred URL.
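As a rough illustration, the tag can be generated with a few lines of Python. The function name canonical_link_tag and the example.com URL are placeholders chosen for this sketch, not part of any particular library.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Return a <link> element declaring preferred_url as the canonical URL.

    Hypothetical helper for illustration only.
    """
    # quote=True also escapes double quotes, so the href attribute stays well-formed.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# The tag belongs inside the <head> of every duplicate version of the page.
print(canonical_link_tag("http://www.example.com/"))
```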



Canonicalization can be a challenging concept to understand (and hard to pronounce - "ca-non-ick-cull-eye-zay-shun"), but it's essential to creating an optimized website. The fundamental problems stem from multiple uses for a single piece of writing - a paragraph or, more often, an entire page of content will appear in multiple locations on a website, or even on multiple websites. For search engines, this presents a conundrum - which version of this content should they show to searchers? SEOs refer to this issue as duplicate content.
To provide the best searcher experience, search engines rarely show multiple, duplicate pieces of content and are thus forced to choose which version is most likely to be the original (or best).

SEO Best Practice


For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem. The following lists show the most common canonicalization errors that can be produced when using the default settings on the two most common web servers:

Apache web server:

  • http://www.example.com/
  • http://www.example.com/index.html
  • http://example.com/
  • http://example.com/index.html

Microsoft Internet Information Services (IIS):

  • http://www.example.com/
  • http://www.example.com/default.asp (or .aspx depending on the version)
  • http://example.com/
  • http://example.com/default.asp (or .aspx)
  • or any combination with different capitalization.
Each of these URLs spreads out the value of inbound links to the homepage. This means that if inbound links point to several of these URLs, the major search engines credit each URL separately rather than combining the value on a single homepage URL.
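To make the idea of a single dominant version concrete, here is a minimal Python sketch that maps the homepage variants from the lists above onto one URL. It assumes www.example.com is the preferred host and that index.html, default.asp and default.aspx are the default documents; the names CANONICAL_HOST, INDEX_FILES and canonicalize are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred host and the default documents.
CANONICAL_HOST = "www.example.com"
BARE_HOST = "example.com"
INDEX_FILES = {"index.html", "default.asp", "default.aspx"}


def canonicalize(url: str) -> str:
    """Map the known homepage variants onto a single canonical URL."""
    parts = urlsplit(url)
    # Host names are case-insensitive, so compare them in lowercase.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"
    # Drop a trailing default document, matching it case-insensitively.
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


for variant in (
    "http://www.example.com/index.html",
    "http://example.com/",
    "http://example.com/Default.aspx",
):
    print(variant, "->", canonicalize(variant))
```

Only the default-document name is compared case-insensitively here; the rest of the path is left alone, since paths can be case-sensitive on some servers.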
Luckily for SEOs, web developers developed methods for redirection so that URLs can be changed and combined. Two primary types of server redirects exist: 301 redirects and 302 redirects:
  • A 301 indicates an HTTP status code of "Moved Permanently."
  • A 302 indicates a temporary redirect (HTTP status code "Found").
Though the difference appears to be merely semantics, the actual results are dramatic. Google does not pass link juice (ranking power) equally between normal links and server redirects. The engineers and SEOs at SEOmoz have done a considerable amount of testing around this subject and concluded that 301 redirects pass between 90 percent and 99 percent of their value, whereas 302 redirects pass almost no value at all.
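The two status codes differ only in the status line the server sends back. The sketch below shows this with a tiny WSGI application that permanently redirects requests for any non-preferred host. The names CANONICAL_HOST, redirect and app are assumptions for this example, and a production setup would more likely do this in the web server configuration.

```python
from http import HTTPStatus

CANONICAL_HOST = "www.example.com"  # assumed preferred host for this sketch


def redirect(location: str, permanent: bool = True):
    """Build the status line and headers for a 301 or 302 redirect."""
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    return f"{status.value} {status.phrase}", [("Location", location)]


def app(environ, start_response):
    """Minimal WSGI app: 301 any request that arrives on a non-canonical host."""
    host = environ.get("HTTP_HOST", "").lower()
    path = environ.get("PATH_INFO", "/") or "/"
    if host != CANONICAL_HOST:
        status_line, headers = redirect(f"http://{CANONICAL_HOST}{path}")
        start_response(status_line, headers)
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"homepage"]
```

Passing permanent=False to redirect would produce a 302 instead, which, according to the testing described above, passes almost no link value.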
Canonicalization is not limited to the inclusion of alphanumeric characters. It also dictates forward slashes in URLs. If a web surfer goes to http://www.google.com, they will automatically be redirected to http://www.google.com/ (notice the trailing forward slash). This happens because, technically, the latter is the correct format for the URL. Although the search engines have largely solved this problem already (they know that www.google.com is intended to mean the same as www.google.com/), it is still worth noting because many servers will automatically 301 redirect from the version without the trailing slash to the correct version. When that happens, a link pointing to the wrong version of the URL loses between 1 percent and 10 percent of its worth due to the 301 redirect. The takeaway is that, whenever possible, it is better to link internally to the version with the trailing slash.
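One way to act on this is to scan a page's HTML for links that point to a bare host with no trailing slash. The following is a small sketch using Python's standard html.parser module; the class and function names are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class BareRootLinkFinder(HTMLParser):
    """Collect <a href> values that name a host but have an empty path."""

    def __init__(self):
        super().__init__()
        self.bare_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        # "http://www.example.com" has a host but no path, not even "/".
        if parts.netloc and parts.path == "":
            self.bare_links.append(href)


def find_bare_root_links(html: str) -> list:
    """Return every link in html that is missing the trailing slash on the root."""
    finder = BareRootLinkFinder()
    finder.feed(html)
    return finder.bare_links


page = (
    '<a href="http://www.example.com">Home</a> '
    '<a href="http://www.example.com/">Home</a> '
    '<a href="/about">About</a>'
)
print(find_bare_root_links(page))
```

Relative links such as /about are ignored because they carry no host of their own.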
One common mistake when implementing canonicalization fixes is to accidentally create an infinite loop between http://www.example.com and http://www.example.com/index.html. The solution to this common glitch is discussed in this post about redirecting an index file to your domain without looping.
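The loop typically appears when the redirect rule is tested against the file the server resolves internally rather than the path the visitor requested. The Python sketch below simulates both behaviours. It is a simplified model under that assumption, not the method from the linked post, and all function names are hypothetical.

```python
INDEX_FILE = "index.html"


def resolve_file(requested_path: str) -> str:
    """Mimic a server's default-document lookup: '/' is served from /index.html."""
    if requested_path.endswith("/"):
        return requested_path + INDEX_FILE
    return requested_path


def naive_redirect(requested_path: str):
    """Looping rule: it inspects the resolved file, so '/' redirects to itself."""
    resolved = resolve_file(requested_path)
    if resolved.endswith("/" + INDEX_FILE):
        return resolved[: -len(INDEX_FILE)]
    return None


def safe_redirect(requested_path: str):
    """Loop-free rule: it inspects only the path the client actually asked for."""
    if requested_path.endswith("/" + INDEX_FILE):
        return requested_path[: -len(INDEX_FILE)]
    return None


def count_hops(redirect_fn, path: str, limit: int = 10) -> int:
    """Follow redirects until the page is served or the hop limit is reached."""
    hops = 0
    while hops < limit:
        target = redirect_fn(path)
        if target is None:
            return hops
        path, hops = target, hops + 1
    return hops  # reaching the limit means the rule loops


print(count_hops(safe_redirect, "/index.html"))
print(count_hops(naive_redirect, "/"))
```

With the safe rule, a request for /index.html is redirected once to / and then served. With the naive rule, every request for / is redirected back to /, so the hop count runs up to the limit.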

Related Resources




Open Site Explorer
Open Site Explorer is a free tool that lets webmasters see up to 10,000 links to any site or page on the web via the Linkscape web index.
Linkscape
A professional-quality inlink tool that uses patent-pending SEOmoz metrics. It reports inlinks, anchor text distribution and more.

External Resources


HTTP Status Codes
The W3C's official documentation for HTTP status codes.
SEO Advice: URL Canonicalization
Advice on canonicalization from Matt Cutts, head of the Webspam Team at Google.


Read the original post here
- Canonicalization | SEOmoz
