Google Blogoscoped

Thursday, June 22, 2006

Google’s Site Operator Was Broken

Googler Adam Lasnik writes at Digg about an error with Google’s site: operator. The operator is supposed to show the approximate number of pages indexed for a given domain (e.g. when you search for site:yahoo.com, Google says there are “about 317,000,000 from yahoo.com”). Adam – well, or someone posing as him – says:

Our engineers recently noticed that our site: queries (number of results listed for a search) were showing bizarre results. This has turned out to be tied to a bad data push, and we’re fixing this right now.

And:

[T]he number in “about [x billion]” is currently incorrect. We haven’t indexed anywhere close to as many pages of these sites as is currently suggested. It’s a significant results estimation error, thankfully limited in scope but clearly pretty stark when it appears.

Meanwhile at John Battelle’s blog, Adam comments:

[W]e noticed that lots of subdomains got indexed last week – and sometimes listed in search results – that shouldn’t have been*. Compounding the issue, our result count estimates in these contexts was MANY orders of magnitude off. For example, the one site that supposedly had 5.5 billion pages in the index actually had under 1/100,000th of that.

So how did this happen? We pushed some corrupted data with our index. Once we diagnosed the problem, we started rolling the data back and pushed something better... and we’ve been putting in place checks so that this kind of thing doesn’t happen again.

*See the previous post on how a spammer got his many subdomains indexed; this issue has been fixed by now.
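
To put Adam's figures in perspective, here is a quick back-of-the-envelope check. It is only a sketch: the variable names are made up for illustration, and the only inputs are the two numbers quoted in his comment. It shows that "under 1/100,000th" of 5.5 billion means fewer than 55,000 pages, so the estimate was off by at least five orders of magnitude.

```python
import math

# Figures taken from Adam's comment quoted above.
reported_pages = 5_500_000_000  # the "5.5 billion pages" estimate
divisor = 100_000               # "under 1/100,000th of that"

# Upper bound on the real page count implied by the comment.
actual_upper_bound = reported_pages // divisor

# Minimum overestimation, expressed in orders of magnitude (powers of ten).
orders_of_magnitude = math.log10(reported_pages / actual_upper_bound)

print(f"Real page count was under {actual_upper_bound:,}")
print(f"Estimate was off by at least {orders_of_magnitude:.0f} orders of magnitude")
```

Since the comment only gives an upper bound ("under"), the true error may well have been larger than this.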

[Thanks Google Operating System blog and Search-Engines-Web.]

