Google Blog Search Searchable

Wednesday, September 28, 2005


Ouch. It looks like Google made an error by allowing search spiders to crawl the results of its blog search engine. Usually, it’s good practice to disallow searchbots from indexing search results themselves (because usually only spammers allow their search results to get indexed, creating search result “noise” along the way). I can’t see any robots.txt* or robots meta tags in this case though, and a search for site:blogsearch.google.com returns 18,700 results at the moment. Yahoo has already picked up a few hundred result pages, too.

*Google does disallow spidering of Google.com/blogsearch? in their main robots.txt at Google.com, but they forgot their sub-domain “blogsearch.google.com”. Other sub-domains, such as groups.google.com, do have their own robots.txt.
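To make the footnote concrete: a crawler looks up robots.txt per host, so a rule on the main domain says nothing about a sub-domain. Here is a minimal sketch of that lookup using Python's standard urllib.robotparser. The robots.txt bodies, the "ExampleBot" user agent, the can_crawl helper and the "fixed" file are all assumptions made up for illustration, not Google's actual files, and a missing robots.txt is treated as "everything allowed", which is how crawlers commonly behave.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by host. The main-domain rule mirrors
# the one described in the footnote; the sub-domain serves no file at all.
ROBOTS_AS_DESCRIBED = {
    "google.com": "User-agent: *\nDisallow: /blogsearch?\n",
    "blogsearch.google.com": None,
}

# One possible fix (an assumption, not Google's real file): give the
# sub-domain its own robots.txt that blocks everything.
ROBOTS_WITH_FIX = dict(
    ROBOTS_AS_DESCRIBED,
    **{"blogsearch.google.com": "User-agent: *\nDisallow: /\n"},
)


def can_crawl(url, robots_by_host, user_agent="ExampleBot"):
    """Return True if the robots.txt of the URL's own host permits the fetch."""
    host = urlsplit(url).netloc
    body = robots_by_host.get(host) or ""  # no file: no rules to apply
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, table in (("as described", ROBOTS_AS_DESCRIBED),
                         ("with fix", ROBOTS_WITH_FIX)):
        for url in ("http://google.com/blogsearch?q=ajax",
                    "http://blogsearch.google.com/blogsearch?q=ajax"):
            print(label, url, can_crawl(url, table))
```

The point is that the lookup is keyed on the host name alone, so the sub-domain stays open until it gets a file of its own.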

How, you may ask, can Google crawl these search results if no one is linking to them? The answer is simple: everyone who has some form of PageRank-checking Google toolbar installed in their browser leaves a URL trace that Google is able to see and, later, crawl. (In other instances, people did also link to the blog search results, of course.)

Here are some of the searches people performed on Google Blog Search:

“Edward Tufte” OR “Edward R Tufte” “Tom Smith”
internet
iTunes
“Jonathan Grudin” “Beth Mazur”
“Danah Boyd” “Tom Erickson”
Montreal
Nafcom
Speed of thought
Integral sucks
“Christina Wodtke” “Brenda Laurel”
Redesign
France
Teapots
Doom
Ajax
inposttitle:slashdot inblogtitle:boakes
link:kbcafe.com
China
Rodent
“Nathan Shedroff” Morville
link:www.rocketboom.com
catholic
“genetic algorithms”

There is a chance that by the time you read this, some hours from now, Google will have fixed their robots.txt; I informed them about the situation. It may also be that, to avoid privacy issues and finger-pointing, Google quickly “bans” their blog search from their web search listings or finds another way to quickly clear the index.

by Philipp Lenssen
