Everyone loves it when Google's crawlers index a site within minutes of a new post being published. But there are often cases in which you don't want some part of your website indexed. For example, you may have sensitive data on your site that you do not want the world to see. Or, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk a duplicate content penalty.
Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. Let me make it clear: it is in no way a firewall or a kind of password protection, but rather a request from our side. It's up to the search engine to decide whether to honor it. Think of it as a "Please, do not enter" note on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. If you ever plan to implement this, make it a point to place robots.txt in the main directory, because crawlers do not search the whole site for a file named robots.txt. Instead, they look in the main directory (http://XYZ.com/robots.txt), and if they don't find it there, they simply assume that the site does not have a robots.txt. So, if you don't put robots.txt in the right place, do not be surprised when search engines index your whole site.
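To make this concrete, here is a small sketch in Python of how a well-behaved crawler might read such a file. The rules and the /private/ and /print/ paths are made up for illustration (they are not from any real site), and the helper names are my own. It uses the standard library's urllib.robotparser, which follows the same "ask first, then decide" logic described above.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: ask all robots to stay out of two folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /print/
"""


def robots_url_for(page_url: str) -> str:
    """Crawlers only look in the main directory, whatever page they start from."""
    return urljoin(page_url, "/robots.txt")


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if a polite robot would consider itself free to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without any network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(robots_url_for("http://XYZ.com/blog/some-post.html"))
    print(is_allowed("http://XYZ.com/index.html"))
    print(is_allowed("http://XYZ.com/print/some-post.html"))
```

Note that nothing here stops a robot that never bothers to call something like is_allowed; the file is only a request, exactly like the note on the unlocked door.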
Archive for September, 2007
Robots.txt
Published September 14, 2007 in General. Tags: Google, Google search, Matt Cutts, Optimization, Robots, SEO, Webmaster Tools
I bet this has happened to every one of you. You buy a brand new 100 GB hard disk and, happily after installation, check the system manager only to realize painfully that it shows 93 GB in total. You get confused! Did the vendor cheat me? Or have they reserved my precious 7 GB for some funky system usage?
You are not alone! The fact we find out the hard way is that there are two ways to define a gigabyte!
When you buy a “100 Gigabyte” hard drive, the vendor defines it using the decimal powers of ten definition of the “Giga” prefix.
100 * 10^9 bytes = 100,000,000,000 bytes = 100 Gigabytes
But the operating system determines the size of the drive using the computer’s binary powers of two definition of the “Giga” prefix:
93 * 2^30 bytes = 99,857,989,632 bytes = 93 Gigabytes
If you’re wondering where 7 Gigabytes of your 100 Gigabyte drive just disappeared to, you have the answer. It’s an old trick by hard drive makers: they intentionally use the official SI definition of the Giga prefix so they can inflate the sizes of their hard drives, at least on paper. This was always an annoyance, but now it’s much more difficult to ignore, as it results in large discrepancies with today’s enormous hard drives. And your Terabyte hard drive? To the operating system it is not a Terabyte; it’s 931 GB.
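If you want to check the arithmetic for your own drive, a few lines of Python will do it. This is just a sketch of the conversion described above; the function name is my own, and it assumes the vendor counts 10^9 bytes per gigabyte while the operating system counts 2^30.

```python
def advertised_gb_to_binary_gb(advertised_gb: float) -> float:
    """Convert a vendor's decimal gigabytes into the binary gigabytes the OS reports."""
    total_bytes = advertised_gb * 10**9  # vendor definition: 1 GB = 10^9 bytes
    return total_bytes / 2**30           # OS definition: 1 GB = 2^30 bytes


if __name__ == "__main__":
    for size in (100, 1000):  # a "100 GB" drive and a "1 TB" drive
        print(f"{size} GB on the box is about {advertised_gb_to_binary_gb(size):.2f} GB on screen")
```

The gap grows with the size of the drive, which is why it is so much harder to ignore today than it used to be.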


