Citing the need to protect its server resources, Google has prevented competing search engines from indexing much of its own Web site, the company confirmed Wednesday.
While Google has amassed an index of over 2 billion Web pages by automatically "spidering" or "crawling" sites all over the Web, the popular search portal has effectively walled off numerous sections of its own site from other search "bots."
By placing a special "robots.txt" file on its server, Google has prohibited other crawlers from indexing 19 areas of its site, including one that offers searches of the company's archive of Usenet newsgroup discussions, as well as an area for exploring its index of graphic images on the Web. Also blocked are a special index of mail-order catalogs and a section that allows searches of news articles at other sites.
As a result, a search for the phrase "LL Bean" performed through a link to Google set up at AltaVista.com produces no results. The same search run directly through Google.com produces 20 pages that include the LL Bean name.
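For readers unfamiliar with how a "robots.txt" file walls off parts of a site, the following is a minimal sketch using Python's standard urllib.robotparser module. The rules, paths, host name and crawler name shown here are hypothetical illustrations, not the contents of Google's actual file, and the sketch only shows how a well-behaved crawler would interpret such rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; these paths are assumptions,
# not the actual entries in Google's robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /news
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer fetch queries."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and www.example.com are placeholder names.
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/catalogs?q=LL+Bean")
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/about.html")

print("catalog search allowed:", blocked)
print("about page allowed:", allowed)
```

A crawler that honors the convention checks each URL this way before requesting it, so any path under a disallowed prefix is skipped. Compliance is voluntary on the crawler's part; the file itself does not enforce anything.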
News source: TechNews.com
View: Meyers posting on RISKS