Sunday, October 16, 2011

On Page Optimization - Robots Exclusion Protocol

The robots.txt file gives search engines specific instructions when their crawlers come for a visit.
For an SEO campaign, you must have and use a robots.txt file, because it tells the search engines which pages they should crawl and which they shouldn't.

First off, you should exclude anything without search value: script folders, cgi-bin, pages that are accessible only to registered users, pages with duplicate content, and so on. To clear things up, by duplicate content I mean article archives and things like that; when an archive page repeats articles that already appear elsewhere on your site, search engines could see that as duplicate content.
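As a quick sketch, rules like the following could keep crawlers out of such areas (the /scripts/, /members/ and /archives/ paths are only placeholders; substitute whatever folders your own site actually uses):

User-agent: *
# Script and CGI folders have no search value
Disallow: /cgi-bin/
Disallow: /scripts/
# Placeholder path for a members-only area
Disallow: /members/
# Placeholder path for article archives that duplicate existing posts
Disallow: /archives/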

Here is what a basic robots.txt file looks like (of course, there are many other directives and configurations you can use):


# Googlebot may crawl everything (an empty Disallow blocks nothing)
User-agent: Googlebot
Disallow:

# All other crawlers are kept out of /tmp/ and /cgi-bin/
User-agent: *
Disallow: /tmp/
Disallow: /cgi-bin/
Once you have completed your robots.txt file, make sure to put it in the root of your website, so that it can be reached at http://www.yoursite.com/robots.txt.

You may also create a robots.txt file in Google Webmaster Tools. To do so, click the website you wish to add the file to, go to “Site configuration”, click “Crawler access”, then “Generate robots.txt”, and follow the steps they give you.

Once that is complete, you can check whether your robots.txt file works as intended by using the “Test robots.txt” tool in Google Webmaster Tools.
