Wednesday, September 28, 2005

Google Blog Search Searchable

Ouch. It looks like Google made an error by allowing search spiders to crawl the results of its blog search engine. Usually, it’s good practice to disallow search bots from indexing search results themselves. (Usually, only spammers allow their search results to get indexed, creating search result “noise” along the way.) I can’t see any robots.txt* or meta-tags in this case though, and a search for returns 18,700 results at the moment. Yahoo already caught a few hundred result pages, too.

*Google does disallow spidering of in their main robots.txt at, but they forgot their sub-domain “”. Other sub-domains, such as, do have their own robots.txt.
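To illustrate why a missing robots.txt on one sub-domain matters, here is a minimal sketch using Python's standard urllib.robotparser. The host names (www.example.com, sub.example.com) and the "/search" path are hypothetical placeholders, not Google's actual rules; the point is only that robots.txt rules apply per host, so a sub-domain without its own file is treated as fully crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a main host; the "/search" path is an
# assumption for illustration, not a copy of any real file.
HYPOTHETICAL_MAIN_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# The main host has rules; the sub-domain has no rules at all,
# modeled here as an empty robots.txt.
main_host = build_parser(HYPOTHETICAL_MAIN_ROBOTS_TXT)
sub_host = build_parser("")

# Rules are looked up per host, so the main host's Disallow line
# does nothing for the sub-domain.
main_allowed = main_host.can_fetch("*", "http://www.example.com/search?q=test")
sub_allowed = sub_host.can_fetch("*", "http://sub.example.com/search?q=test")

print("main host search page crawlable:", main_allowed)
print("sub-domain search page crawlable:", sub_allowed)
```

A well-behaved crawler fetches robots.txt separately for every host it visits, which is why each sub-domain needs its own file.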

How, you may ask, can Google crawl these search results if there’s no one linking to them? The answer is simple: everyone who has some form of PageRank-checking Google toolbar installed in their browser leaves a URL trace that Google is able to see and, later, crawl. (In other instances, people also did link to the blog search results, of course.)

Here are some of the searches people performed on Google Blog Search:

There is a chance that by the time you read this, some hours from now, Google will have fixed their robots.txt. I informed them about the situation. It may also be that, to avoid privacy issues and finger-pointing, Google quickly “bans” their blog search from their web search listings or finds another way to quickly clear the index.


