Robots.txt

Everyone loves it when Google's crawlers index a site within minutes of a new post being published. But there are often cases in which you don't want some part of your website indexed. You might have sensitive data on your site that you do not want the world to see. Or, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk a duplicate content penalty.

Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. Let me make it clear: it is in no way a firewall or a kind of password protection, but rather a request from our side. It's up to the search engine to decide whether to honor it or not. Think of it as a “Please, do not enter” note on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter.
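To make the idea concrete, here is a minimal sketch in Python that uses the standard library's urllib.robotparser to read a set of rules the way a well-behaved crawler might. The sample rules, the /print/ and /private/ paths, and the helper name is_allowed are assumptions made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask every robot to stay out of two directories.
SAMPLE_RULES = """\
User-agent: *
Disallow: /print/
Disallow: /private/
"""


def is_allowed(rules_text: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler would skip the print version but fetch the normal post.
print(is_allowed(SAMPLE_RULES, "Googlebot", "http://XYZ.com/print/report.html"))  # False
print(is_allowed(SAMPLE_RULES, "Googlebot", "http://XYZ.com/posts/hello.html"))   # True
```

Note that nothing here stops a robot from requesting the disallowed pages anyway; the check only tells a cooperative crawler what the site owner has asked for.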
If you do plan to implement this, make a point of placing robots.txt in the main directory, because crawlers do not search the whole site for a file named robots.txt. Instead, they look in the main directory (http://XYZ.com/robots.txt), and if they don’t find it there, they simply assume that the site does not have a robots.txt. So, if you don’t put robots.txt in the right place, do not be surprised when search engines index your whole site.
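The lookup described above can be sketched in a few lines of Python. This is only an illustration under the assumption that a crawler derives the robots.txt address from the scheme and host of whatever page it is about to fetch; the function name robots_url and the example page address are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the only address where a crawler looks for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, discard the page's path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# However deep the page is, the file is expected in the main directory.
print(robots_url("http://XYZ.com/blog/2007/09/robots-txt/"))
# http://XYZ.com/robots.txt
```

A file saved as http://XYZ.com/blog/robots.txt would never be requested under this scheme, which is why its location matters.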

2 Responses to “Robots.txt”


  1. Nathal Reve, September 18, 2007 at 5:09 pm

    Need to know more. I work for an ASP-based sales site. We do have print versions for reports.

  2. Ram, September 24, 2007 at 6:18 pm

    I don't need this… let them index everything I have ;)
