Google Blogoscoped

Forum

strange google search results

DPic [PersonRank 10]

Tuesday, June 12, 2007
17 years ago, 2,304 views

http://www.google.com/search?q=site:www.google.com%20-notebook%20-intl%20-top%20-calendar&hl=en&safe=off&rlz=1B3GGGL_enUS220US220&pwst=1&start=990&sa=N

this is similar to what was posted at http://blogoscoped.com/forum/98546.html except that here, each time you hit the previous button, it says it's on the last page

James Xuan [PersonRank 10]

17 years ago #

Works for me

David Hetfield [PersonRank 10]

17 years ago #

Strange thing Danny.. :/
You should talk to your friend from Google. Let him be aware of that.

:)

Zim [PersonRank 10]

17 years ago #

interesting bug...

Rohit Srivastwa [PersonRank 10]

17 years ago #

Working for me, no issue at all.
When I click the previous button, the next button shows up as well.

Roger Browne [PersonRank 10]

17 years ago #

It's not so much a bug, as a technical limitation.

When you search for a single keyword, Google can consult the index for that keyword and get a count of how many entries exist for that keyword.

When you search for multiple keywords, Google can get the index count for each individual keyword, but it can't easily know whether the multiple keywords tend to occur together on the same documents, or whether they tend to occur separately on distinct documents. The indexes required to store that information would be incredibly big. It's just not practical.

So, Google must look at the individual index counts, and calculate an estimate for the number of documents that contain all of the keywords. For example, if "foo", "bar" and "baz" each occur on one-thousandth of the pages on the web, and Google treats the keywords as statistically independent, it might estimate that "foo bar baz" occurs on one-thousand-millionth of the pages on the web (1/1000 × 1/1000 × 1/1000).
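To make that arithmetic concrete, here is a minimal sketch of the estimate, assuming the keywords are statistically independent. The function names and the index size in the usage note are made up for illustration; nothing here is known to reflect how Google actually computes its counts.

```python
from fractions import Fraction


def estimate_joint_fraction(term_fractions):
    """Estimate the fraction of documents containing ALL terms.

    Assumes the terms occur independently of one another, so the
    per-term fractions can simply be multiplied together.
    """
    result = Fraction(1)
    for fraction in term_fractions:
        result *= fraction
    return result


def estimate_result_count(total_docs, term_counts):
    """Turn per-keyword index counts into an estimated hit count."""
    fractions = [Fraction(count, total_docs) for count in term_counts]
    return round(total_docs * estimate_joint_fraction(fractions))
```

With a hypothetical index of 10,000,000,000 pages where each keyword appears on 10,000,000 of them, `estimate_result_count(10**10, [10**7] * 3)` gives 10. If the keywords actually tend to appear together, the true count would be far higher, which is why the displayed total is only an estimate.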

When Google actually displays your search results, it must scan the contents of its cache to determine which documents contain all three keywords. (Presumably Google caches every document it crawls; we assume that the noarchive directive just means that Google won't expose the cache publicly.)

This is computationally feasible because most searches only ever display one of the first few pages of results, so Google only needs to check enough documents to fill those pages. But when you ask for something like the 80th page of results, the content jumps around a bit. Perhaps it depends on which of the networked computers returns the document list for its keyword first (because presumably the checking of content starts with the first available documents).
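A rough sketch of that "only check enough documents to fill the page" idea, assuming documents are plain strings and a match means every keyword appears as a whole word. The function `fetch_page` and its parameters are hypothetical, just to show why deep pages cost more than early ones.

```python
def fetch_page(docs, keywords, page, page_size=10):
    """Return (results for the requested page, number of docs examined).

    Scans candidate documents in order and stops as soon as enough
    matches have been found to fill every page up to the one requested.
    """
    needed = page * page_size
    matches = []
    examined = 0
    for doc in docs:
        examined += 1
        words = set(doc.split())
        if all(keyword in words for keyword in keywords):
            matches.append(doc)
            if len(matches) == needed:
                break  # no need to look at the remaining candidates
    start = (page - 1) * page_size
    return matches[start:needed], examined
```

If every second candidate matches, page 1 means examining about 19 documents, while page 80 means examining about 1,599. If the candidates arrive in a different order on each request, the deep pages are where the differences would show up.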

In the case where the keywords are expressed in the negative (as in this example), it must be even harder to calculate accurate estimates. But, I can still imagine ways to get this behaviour. The cache for all of the "site:www.google.com" pages must be split over many networked machines, and when you get this far down the results it must be affected by which of those machines return results in which order, and probably also by how many of those machines can return results within a certain time.
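Purely as an illustration of that speculation, here is a toy merge step in which each machine reports a latency and a list of document ids, and only responses arriving before a deadline are used. The function name, the latencies and the deadline are all invented; this is not a description of Google's real serving system.

```python
def merge_shard_results(shard_responses, deadline_ms):
    """Merge per-machine results in arrival order.

    shard_responses is a list of (latency_ms, doc_ids) pairs. Responses
    slower than deadline_ms are dropped, and the rest are concatenated
    in the order they arrived.
    """
    on_time = [resp for resp in shard_responses if resp[0] <= deadline_ms]
    on_time.sort(key=lambda resp: resp[0])  # earliest response first
    merged = []
    for _latency, doc_ids in on_time:
        merged.extend(doc_ids)
    return merged
```

For the same three machines, latencies of 10, 50 and 30 ms against a 40 ms deadline yield the first and third machines' documents, while latencies of 45, 20 and 30 ms yield the second and third. Both the number of results and their order change from one request to the next, which would be enough to make "last page" move around.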

All of this is just speculation based on my own programming experience retrieving database entries with multiple indexes. But however Google does it, I'm sure this is not a bug, but simply a rare case where to get the "perfect" behaviour would be computationally impractical.

DPic [PersonRank 10]

17 years ago #

I can agree that it probably isn't a bug, but I'm not sure that this happens for the reasons you speculated.
