I just had a bit of a conundrum with the robots.txt file on a website, and have managed to find a solution which I thought would be interesting and useful for anyone wanting to make sure that they are blocking the right things in their robots.txt file.
The problem
You have a folder on your website which contains some useful pages which you want Google to be able to see (e.g. www.yoursite.co.uk/folder1/useful-page.html), BUT the default page for that folder (e.g. www.yoursite.co.uk/folder1/) is either blank or creates a duplicate version of one of the sub-pages.
In this case, what you want is to tell the search engines NOT to crawl the folder root, but still to crawl all of the sub-pages within the folder.
However, if you add the rule below to the robots.txt file for your website, both the folder and the pages underneath it will be disallowed, and the search engines won't then see your valuable content.
Disallow: /folder1/
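For context, that rule would normally sit in the robots.txt file alongside a User-agent line, something like the example below (the User-agent: * line and the folder name are just illustrative placeholders):

User-agent: *
Disallow: /folder1/

Because Disallow rules are prefix matches, this blocks both www.yoursite.co.uk/folder1/ and www.yoursite.co.uk/folder1/useful-page.html.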
The solution
I wondered if it might be possible to use a $ at the end of the disallow line. In pattern matching, $ anchors the rule to the end of the URL, effectively saying "the URL must end exactly here", which should mean that if the rule below is created then all sub-pages in my folder WILL be visible whilst the folder root isn't.
Disallow: /folder1/$
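In context, the file would look something like this (again, the User-agent: * line is just an illustrative placeholder):

User-agent: *
Disallow: /folder1/$

One caveat: the $ end-of-URL anchor is a pattern-matching extension honoured by the major crawlers such as Googlebot and Bingbot rather than part of the original robots.txt standard, so it is worth checking that any other crawlers you care about support it.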
I've tested my new rule in Google Webmaster Tools against both www.yoursite.co.uk/folder1/ and www.yoursite.co.uk/folder1/useful-page.html, and, as hoped, my folder root is blocked while my useful page is visible.
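If you want to sanity-check the matching logic outside of Google's tools, here is a minimal Python sketch of how a crawler that supports the * and $ pattern-matching extensions would evaluate a single Disallow rule against a URL path. The rule_matches function and the example paths are purely illustrative and not part of any robots.txt library:

import re

def rule_matches(pattern, path):
    # Translate a robots.txt rule into an anchored regular expression:
    # '*' matches any run of characters, '$' anchors the rule to the
    # end of the URL path, and everything else is matched literally.
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

# The blanket rule blocks the folder root AND everything beneath it:
print(rule_matches("/folder1/", "/folder1/"))                   # True  (blocked)
print(rule_matches("/folder1/", "/folder1/useful-page.html"))   # True  (blocked)

# The anchored rule only blocks the folder root itself:
print(rule_matches("/folder1/$", "/folder1/"))                  # True  (blocked)
print(rule_matches("/folder1/$", "/folder1/useful-page.html"))  # False (crawlable)

Real crawlers also take Allow rules and longest-match precedence into account, so treat this only as a rough illustration of prefix matching versus the end-of-URL anchor.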
So, I’ve successfully blocked a page I don’t want Google to see (as it could be seen as duplicate content) whilst allowing Google to see all of the useful content contained inside my folder.
Needless to say, in a geeky way, I was quite excited about this discovery, and I shall definitely be using this solution again to ensure that the SEO of websites I work on is not impacted by Google seeing pages I really don’t want them to see.
by Emily Mace
Source: Robots.txt – blocking the right things | Vertical Leap Blog