Google Blogoscoped

Forum

Checking Google's Index Size

Utills [PersonRank 10]

Wednesday, December 27, 2006
17 years ago | 17,316 views

The query [a *] returns 19,400,000,000.

http://www.google.com/search?hl=en&lr=&q=a+*+&btnG=Search

I'm in the UK.

Patrick Kempf [PersonRank 1]

17 years ago #

http://www.google.com/search?q=*a*

[*a*] returns 25,270,000,000.

Utills [PersonRank 10]

17 years ago #

Use this URL to get results from different data centres:

http://www.seo-contests.com/cgi-local/google.cgi?search=*+a+*

zmarties [PersonRank 10]

17 years ago #

I was amused to find that

site:*

returns just one result!

Philipp Lenssen [PersonRank 10]

17 years ago #

Weird – [site:"] returns only IPs.
http://www.google.com/search?hl=en&lr=&q=site%3A%22&btnG=Search
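The odd-looking `q=site%3A%22` in that link is just the query [site:"] after URL encoding. A minimal sketch of building such a link, assuming the parameter names (`hl`, `lr`, `q`, `btnG`) copied from the URLs quoted in this thread; whether Google still honours them is not assumed:

```python
from urllib.parse import urlencode

def google_search_url(query: str, lang: str = "en") -> str:
    """Build a search URL with the same parameters as the links in this thread."""
    params = {"hl": lang, "lr": "", "q": query, "btnG": "Search"}
    # urlencode() escapes values with quote_plus(): ' ' -> '+', ':' -> '%3A', '"' -> '%22'
    return "http://www.google.com/search?" + urlencode(params)

print(google_search_url('site:"'))
# http://www.google.com/search?hl=en&lr=&q=site%3A%22&btnG=Search
```

Note that Python also escapes `*` as `%2A`, whereas browsers typically leave it literal, so links such as `q=a+*+` are not reproduced character for character.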

Philipp Lenssen [PersonRank 10]

17 years ago #

16,490,000,000 results here for Patrick's query [*a*]...

20,030,000,000 for ["_" | -_- * * *].

Tim [PersonRank 0]

17 years ago #

Well, to begin with, those numbers are not accurate. Check out this post by Danny Sullivan. See item #1:

http://blog.searchenginewatch.com/blog/060313-161500

JohnMu [PersonRank 10]

17 years ago #

You can't trust the "about" count to be anywhere close to realistic. It is generated separately from the actual index.

Jack [PersonRank 0]

17 years ago #

http://www.google.com/search?hl=en&lr=&q=%2B*a*

Results 1 – 10 of about 22,010,000,000 for +*a*. (0.10 seconds)

TimW [PersonRank 1]

17 years ago #

The search actually maxes out at 100 pages (1,000 results), though. On the 100th page I get:

Results 991 – 1000 of about 18,740,000,000 for *-"a. (0.27 seconds)

My site is probably somewhere around 17,596,000,001 :(

Elias Kai [PersonRank 10]

17 years ago #

Another weird thing, Philipp:
These queries don't return anything: @, "+", "*", /
I also tested this on MSN, where they showed only 10 result pages for some queries.

On Google, try [site:"] or any other query: the deeper you go into the result pages, the longer each page takes. For example, page 20 took 0.25 seconds and page 45 took 0.48 seconds.

The results end at page 82:
Results 811 – 814 of about 558,000 for site:". (0.55 seconds)

So the listing stops after 82 pages (814 results) out of "about 558,000". How did they calculate this?
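Both observations fit simple pagination arithmetic that is independent of the "about" figure: a fixed page size plus a ceiling on how many results are actually listed. A minimal sketch, assuming the default 10 results per page and the 1,000-result cap reported in this thread (neither taken from Google documentation); `available`, the number of results the engine will really list, is a hypothetical input:

```python
from typing import Optional, Tuple

PAGE_SIZE = 10      # assumed default results per page ("Results 1 - 10 ...")
MAX_SERVED = 1000   # assumed cap: 100 pages x 10 results, as reported above

def result_range(page: int, available: int) -> Optional[Tuple[int, int]]:
    """(first, last) result numbers shown on 1-based `page`, or None if the page is empty.

    `available` is how many results the engine will actually list, which can be far
    smaller than the "about" estimate shown in the header.
    """
    served = min(available, MAX_SERVED)
    first = (page - 1) * PAGE_SIZE + 1
    if page < 1 or first > served:
        return None
    return first, min(page * PAGE_SIZE, served)

print(result_range(100, 18_740_000_000))  # (991, 1000): the cap applies however many exist
print(result_range(82, 814))              # (811, 814): the [site:"] query's last page
print(result_range(101, 18_740_000_000))  # None: nothing past result 1,000
```

Under this model the "about 558,000" header and the 814 listed results are simply two different numbers; nothing forces them to agree.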

Chrjstoph [PersonRank 0]

17 years ago #

http://www.google.com/search?q=*+AND+*+OR+*

gives 16,600,000,000 here in Switzerland, more than the others mentioned here.

Elias Kai [PersonRank 10]

17 years ago #

Results 1 – 10 of about 21,990,000,000 for "after" | -"after". (0.28 seconds)

Results 1 – 10 of about 20,260,000,000 for "again" | -"again". (0.16 seconds)

Results 1 – 10 of about 20,960,000,000 for "air" | -"air". (0.20 seconds)

query: "yes" | -"no"
Results 1 – 10 of about 17,460,000,000 for "yes" | -"no". (0.04 seconds)

"11" | -"11"
Results 1 – 10 of about 16,640,000,000 for "11" | -"11". (0.34 seconds)

Results 1 – 10 of about 15,190,000,000 for "about" | -"about". (0.27 seconds)
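Each `"w" | -"w"` query is logically a match-everything query: every page either contains the word or it doesn't. If the "about" figure were an exact count, all of these queries would report the same number, so the spread above (15.19 to 21.99 billion) is itself evidence that the figure is an estimate. A toy boolean model makes the logic concrete; the corpus below is hypothetical:

```python
def match_all_count(corpus: list[set[str]], word: str) -> int:
    """Exact count for the query  "word" | -"word"  over a toy corpus."""
    with_word = {i for i, doc in enumerate(corpus) if word in doc}
    without_word = {i for i, doc in enumerate(corpus) if word not in doc}
    return len(with_word | without_word)  # X OR NOT X covers every document

corpus = [{"after", "air"}, {"again"}, {"why", "air"}, set()]
counts = {w: match_all_count(corpus, w) for w in ["after", "again", "air", "why", "11"]}
print(counts)  # every value equals len(corpus), i.e. 4
```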

Scott [PersonRank 0]

17 years ago #

I've gotten as high as 25,270,000,000 with:

"a" | -a- * * * | "b" | -b-* * * | "0" | -0- * * * | "1" | -1-* * *

No matter how I add to or subtract from this search, it seems to hit a limit of 25,270,000,000. I find it intriguing that Patrick came up with the same number.

Elias Kai [PersonRank 10]

17 years ago #

Results 1 – 10 of about 22,310,000,000 for "along" | -"along". (0.18 seconds)

Results 1 – 10 of about 25,510,000,000 for "why" | -"why". (0.24 seconds)

Ionut Alex. Chitu [PersonRank 10]

17 years ago #

If you refresh or try another data center, you might get different values. These are just some estimates anyway.

Elias Kai [PersonRank 10]

17 years ago #

You did not specify a language. I tried some English queries, and here is a Chinese one, but I bet a Chinese speaker can beat me with a better query:

Results 1 – 10 of about 25,270,000,000 for "天" | -"天". (0.19 seconds)

JohnMu [PersonRank 10]

17 years ago #

Seriously guys, those numbers say nothing about the index size :-).

I've seen sites with a real indexed (and "physically" available) page count of <500 with over 500 million results in the "about" count.

42

dnl2ba [PersonRank 1]

17 years ago #

Huh, "inurl:http" only gives 16.5 billion.

Stephen Tordoff [PersonRank 10]

17 years ago #

[inurl:w] returns fewer results than [inurl:www].

Juha-Matti Laurio [PersonRank 10]

17 years ago #

Using [inurl:h] gives "only" 33,500,000 results.

Ionut Alex. Chitu [PersonRank 10]

17 years ago #

Google tries to split domain names and URLs into words (or things that look like words). Compare:

inurl:presid
inurl:preside
inurl:presiden
inurl:president
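If inurl: matches whole "words" extracted from the URL rather than arbitrary substrings, that would also explain why [inurl:w] returns fewer results than [inurl:www]. A toy model of that behaviour, assuming (not confirmed by Google) that a URL is split on every non-alphanumeric character; the example URL is hypothetical:

```python
import re

def url_tokens(url: str) -> set[str]:
    """Split a URL into lowercase alphanumeric runs (a guess at what counts as a 'word')."""
    return set(re.findall(r"[a-z0-9]+", url.lower()))

def inurl_matches(term: str, url: str) -> bool:
    """Toy model: inurl:term matches only a whole token, never a prefix or substring."""
    return term.lower() in url_tokens(url)

url = "http://www.example.com/president/bio.html"
for term in ["presid", "preside", "presiden", "president", "w", "www"]:
    print(term, inurl_matches(term, url))
# only "president" and "www" match; the prefixes and the lone "w" do not
```

Real tokenization is surely more involved (Google may also split on case changes or known words), so treat this only as an illustration of the whole-token idea.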

Stephen Tordoff [PersonRank 10]

17 years ago #

Didn't know that, thanks

Mitul Bid [PersonRank 1]

17 years ago #

[html] returns 3,540,000,000 (queried from India).

Mitul Bid [PersonRank 1]

17 years ago #

Oh, that is much less :-) I thought it was 35 billion and that I had hit the highest count.

TheRaveN 2.0 [PersonRank 1]

17 years ago #

Try "-"

I get nothing from Google for this one, nothing from Live.com, and an error from Yahoo.

TheRaveN 2.0

C Ramesh [PersonRank 1]

17 years ago #

a * fetches 19,690,000,000

an * fetches 11,110,000,000

the * fetches 16,090,000,000

(all from Chennai, India).

C Ramesh [PersonRank 1]

17 years ago #

5,660,000,000 for .com

Philipp Lenssen [PersonRank 10]

17 years ago #

I agree that these numbers may not reflect reality... still, this may be the only way of probing Google's index size that we have. It's interesting to see that Patrick's query [*a*] returns 25,270,000,000 pages, the exact same number a search for [* *] returned at the beginning of this year, and the highest number mentioned in this thread. Is this an artificial upper limit beyond which Google doesn't share page counts?

Drew [PersonRank 0]

17 years ago #

This query:

+html | +php | +asp | +cgi | +cfm | *-"a

... sometimes returns 35 billion or so, sometimes "only" 17 billion or so. It can change literally from search to search. The highest I got was 35,710,000,000. This is from USA.

Elias Kai [PersonRank 10]

17 years ago #

Results 1 – 10 of about 25,270,000,000 for "天" | -"天". (0.32 seconds)

Weird. I think that's the limit.

Danny Sullivan [PersonRank 2]

17 years ago #

I don't know, Philipp. I begged Google to pull down those figures so we wouldn't have all the silliness of figuring out what they mean, who is biggest, how do you count pages and all that. I'd prefer not to poke at it.

zupolgec [PersonRank 0]

17 years ago #

Results 1 – 100 of about 25,270,000,000 for "to google" | -"to google". (0.12 seconds)

John K [PersonRank 2]

17 years ago #

This is not the only way we have of estimating Google's index size.

Google had an interesting tech talk which shows how outsiders could estimate index size accurately using random sampling.

http://video.google.com/videoplay?docid=5482133123165021987&q=google+tech+talks+august+2006

The paper (PDF) is here:

http://www.ee.technion.ac.il/people/zivby/papers/se/se.techreport.pdf
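Given random samples of pages from two indexes, one classic way to turn them into a size comparison is the overlap technique of Bharat and Broder (1998); this is not necessarily the estimator used in the paper above. Check what fraction of each engine's sample the other engine also contains; the unknown overlap cancels in the ratio. A minimal sketch, assuming uniform samples and an exact membership test; `in_other` is a hypothetical callback (in practice, a query for the sampled URL):

```python
from typing import Callable, Iterable

def containment(sample: Iterable[str], in_other: Callable[[str], bool]) -> float:
    """Fraction of pages sampled from one index that the other index also contains."""
    sample = list(sample)
    if not sample:
        raise ValueError("need a non-empty sample")
    return sum(1 for page in sample if in_other(page)) / len(sample)

def relative_size(frac_b_in_a: float, frac_a_in_b: float) -> float:
    """Estimate |A| / |B|.

    frac_b_in_a estimates |A ∩ B| / |B|; frac_a_in_b estimates |A ∩ B| / |A|.
    Their ratio cancels the unknown overlap |A ∩ B|.
    """
    return frac_b_in_a / frac_a_in_b

# Toy indexes (hypothetical): A has 600 pages, B has 300, and they share 200.
index_a = {f"u{i}" for i in range(600)}
index_b = {f"u{i}" for i in range(400, 700)}
# Using each whole index as its own "sample" stands in for perfect uniform sampling.
b_in_a = containment(index_b, lambda p: p in index_a)
a_in_b = containment(index_a, lambda p: p in index_b)
print(relative_size(b_in_a, a_in_b))  # 2.0, i.e. |A| / |B| = 600 / 300
```

The hard part in practice is the sampling itself: a biased sample (for example, one skewed toward popular pages) skews both fractions.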

bta [PersonRank 0]

17 years ago #

I have to agree that 25,270,000,000 is (probably) the maximum value (I tried [a * *]). I have been trying many other combinations but never got above this number.

Johan Terpstra [PersonRank 1]

17 years ago #

I get just over 24 billion on .co.uk for [the | -the | +on | -on | a | -a]. Note the +on instead of a plain on: it returns more for some reason.

Interestingly, when you try this:

http://www.google.co.uk/search?hl=en&q=*******************************************a&btnG=Search&meta=

If you keep refreshing the search, you get a new result set every time, with different snippets bolded and different total counts (all high, many billions).

Hong Xiaowan [PersonRank 10]

17 years ago #

Giving an exact number for a machine with a database as huge as Google's is impossible; counting would bring the machine down. No one can do this.
The best way is to estimate the number in real time on each server. The method is the same everywhere, but the database is always changing and the servers cannot all change at the same time, so the number is always changing.

Also, I think too many of the pages are not useful. We cannot search for a keyword and read all the results; that would be crazy. I only check the first 100 results and pick 3 to read.

Juha-Matti Laurio [PersonRank 10]

17 years ago #

BTW: Are Yahoo!, MSN etc. publishing their numbers somewhere?

Juha-Matti Laurio [PersonRank 10]

17 years ago #

The English-language Wikipedia entry still lists this non-working '* *' method...

"In 2006, Google has indexed over 25 billion web pages..."
http://en.wikipedia.org/wiki/Google_search
-> see Search products

Chris Borrowdale [PersonRank 1]

17 years ago #

Not really interested in size, but just out of interest I tried [.]

Interesting results page...?!?

No message at all, just the search box and the footer?

Anyone get anything at all or understand why this is?

(Search performed in the UK)

Philipp Lenssen [PersonRank 10]

17 years ago #

> BTW: Are Yahoo!, MSN etc. publishing their numbers somewhere?

I think Google stopped publishing theirs when Yahoo overtook them :)
Or maybe they just realized most users don't care, so it would only help the competition...

Ionut Alex. Chitu [PersonRank 10]

17 years ago #

Reality check:

"Otherworldly escapism"

G: 255
Y: 168
Ask: 54
Live: 19

Ionut Alex. Chitu [PersonRank 10]

17 years ago #

"saddam's death is"

G: 17,600
Y: 6,070
Ask: 83
Live: 1,641

Tony Ruscoe [PersonRank 10]

17 years ago #

<< I think Google stopped publishing theirs when Yahoo overtook them :) >>

Didn't Google release some kind of "size doesn't really matter; it's all about relevancy" statement when Yahoo did that? I seem to remember something like that...

Tony Ruscoe [PersonRank 10]

17 years ago #

Also see:

Google's index nearly doubles:
<< So 8 billion pages is a milestone worth noting, but it's not the end of the road. The real test is how well we do in finding what you want from within those pages. We'll keep improving that too. >>
From: http://googleblog.blogspot.com/2004/11/googles-index-nearly-doubles.html

We wanted something special for our birthday…
<< ... this latest expansion of our index, which makes Google more than 3 times larger than any other search engine ... >>
From: http://googleblog.blogspot.com/2005/09/we-wanted-something-special-for-our.html

And:

<< So how big is Google's index?
Search engines' published metrics for index size measurement vary greatly and are no longer easily comparable. Often, for instance, web crawlers retrieve duplicate entries for one page or links to documents that they haven't crawled, and whose content thus isn't in the index. At Google we believe the essential quality of an index isn't the total number of documents, but its comprehensiveness – which unique documents are in the index. So we don't count duplicate or uncrawled pages. According to our internal testing, our newly expanded search index is more than three times larger than that of any other search engine. >>

From: http://www.google.com/help/indexsize.html

This site unofficially covers Google™ and more with some rights reserved. Join our forum!