We all love it when Google's crawlers index our site within minutes of publishing a new post. But there are often cases in which you don't want some part of your website indexed. For example, you may happen to have sensitive data on your site that you do not want the world to see. Or, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk a duplicate content penalty.
Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. Let me make it clear: it is in no way a firewall or a kind of password protection, but rather a request from our side. It's up to the search engine to decide whether to honor it or not. Think of it as a "Please, do not enter" note on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter.
If you do plan to implement this, make it a point to place robots.txt in the main directory, because crawlers do not search the whole site for a file named robots.txt. Instead, they look in the main directory (http://XYZ.com/robots.txt), and if they don't find it there, they simply assume that the site does not have a robots.txt. So, if you don't put robots.txt in the right place, do not be surprised when search engines index your whole site.
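To make this concrete, here is a minimal sketch of how a polite crawler might treat the file, using Python's standard `urllib.robotparser` module. The `/private/` and `/print/` paths, the `ExampleBot` user agent, and the page URLs are made up for illustration; real crawlers have their own implementations and may behave differently.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask all robots to skip the
# sensitive area and the print versions of pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /print/
"""


def robots_url(page_url: str) -> str:
    """Return the only place a crawler looks for the file: the site root."""
    return urljoin(page_url, "/robots.txt")


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Whatever page the crawler starts from, it checks the main directory.
print(robots_url("http://XYZ.com/blog/2007/09/some-post.html"))

# A well-behaved robot asks before fetching; nothing forces it to.
print(parser.can_fetch("ExampleBot", "http://XYZ.com/print/report.html"))
print(parser.can_fetch("ExampleBot", "http://XYZ.com/blog/some-post.html"))
```

Note that `can_fetch` only reports what the file requests. Obeying the answer is entirely up to the robot, which is exactly the unlocked-door situation described above.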
Robots.txt
Published September 14, 2007 in General. Tags: Google, Google search, Matt Cutts, Optimization, Robots, SEO, Webmaster Tools


Need to know more. I work for an ASP-based sales site. We do have print versions of reports.
I don't need this… let them index everything I have.