Google Blogoscoped

How Many Google Books Pages?

jon [PersonRank 0]

Tuesday, December 6, 2005
14 years ago

well actually the idea was that you could see all the pages (as opposed to just two or three) in a book by searching with that query. kind of like a hack!

Philipp Lenssen [PersonRank 10]

14 years ago #

But that part didn't work for me... I could only see the first three pages in a particular book, at least when I tried scrolling through it with the Next button Google offers... then it would give me a little note on "Why this is copyrighted..."

jon [PersonRank 0]

14 years ago #

well i just searched for a book
and then clicked on a book
in that page i would type that query and hit "search this book". so it's not searching for books with that query but searching inside a book with that query, which will give you all the pages. hope they (google people) don't find out about this :)

viggen [PersonRank 1]

14 years ago #

adding the italian si gives me 140 million
a | the | and | but | of | from | der | die | das | le | si | la |

cheers
viggen

jake [PersonRank 0]

14 years ago #

(0 OR 1 OR 3)
156 mil

viggen [PersonRank 1]

14 years ago #

1|0| a | the | and | but | of | from | der | die | das | le | la
191000000 pages

hehe that is fun... :)

cheers
viggen

jake [PersonRank 0]

14 years ago #

1|0|3| a | the | and | but | of | from | der | die | das | le | si | la

226 mil
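
The queries in this thread all follow one pattern: a list of very common words or digits joined with `|`, which Google treats as OR. A minimal Python sketch of that pattern follows. It only builds the query string and assumes nothing about Google's side; `build_or_query` is a hypothetical helper name, not something Google provides.

```python
def build_or_query(terms):
    """Join search terms with ' | ' (OR), skipping blanks and duplicates."""
    seen = set()
    unique = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            unique.append(term)
    return " | ".join(unique)


# Terms taken from the query posted above.
stop_words = ["1", "0", "3", "a", "the", "and", "but", "of", "from",
              "der", "die", "das", "le", "si", "la"]
query = build_or_query(stop_words)
print(query)
```

Dropping duplicates keeps the query short when word lists from several languages overlap (for example "de" or "la").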

none [PersonRank 1]

14 years ago #

in google.com you can put something like [-amkmwienawwwawewwnjwi] but this doesn't work with print.google.com

lramirez [PersonRank 1]

14 years ago #

They want my Google account. They'll probably charge me for the book if I exceed the 3 pages lol

mrnibz [PersonRank 1]

14 years ago #

a | the | and | but | of | from | der | die | das | le | la | an | do | sa | de | si | 0 | 1 | 2 | go

a | the | an | but | of | from | der | die | das | le | la | . | , | sa | de | si | 0 | 1 | 2 | go |!

produces anywhere from 219 million to 235 million... it isn't consistent though.

jake [PersonRank 0]

14 years ago #

by|provided|http|1|0|3| a | the | and | but | of | from | der | die | das | le | si | la

236 mil

James Bradbury [PersonRank 5]

14 years ago #

Huh. Adding the Spanish word el actually makes the total decrease from 236 million to 230 million. This shows an error in the OR algorithm.
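
The reasoning behind that observation can be sketched with plain sets: if the counts were exact, an OR query would be a set union, and adding a term to a union can never shrink it. The toy index below is made-up data for illustration only, and `pages_matching_any` is a hypothetical name; nothing here reflects how Google actually computes its totals.

```python
def pages_matching_any(index, terms):
    """Return the set of page ids that contain at least one of the terms."""
    matched = set()
    for term in terms:
        matched |= index.get(term, set())
    return matched


# Toy inverted index: term -> set of page ids (invented for this sketch).
toy_index = {
    "the": {1, 2, 3},
    "la": {3, 4},
    "el": {4, 5},
}

before = pages_matching_any(toy_index, ["the", "la"])
after = pages_matching_any(toy_index, ["the", "la", "el"])

# With exact counting, adding an OR term can only keep or grow the result set.
assert before <= after
print(len(before), len(after))  # 4 5
```

So a total that drops after adding a term points to the reported number being an estimate rather than an exact union count.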

Brian Mingus [PersonRank 10]

14 years ago #

247,000,000 with date:0000-9999|a|http
see: flickr.com/photos/breflection/ ...
I noticed GBS (can we call it that from now on? I hate typing it out) returns pretty different results almost every time, and also varies the number of results it reports by quite a lot. Also, the 'oldest' book in GBS: date:0000-1012

Brian Mingus [PersonRank 10]

14 years ago #

I'm pretty sure that date:1012-2020 will capture all the pages in GBS (from the `oldest' to `newest'). I've gotten ~237 million with that query. Note that at this point all we are doing is fiddling with the parameters of Google's result estimation algorithms.

Another neat one is: -date:1012-2020 a
Google won't let you search for -date... on its own because it is too general, and it won't include a in your search because it is a stop word. But put them together and no doubt you reveal all the results (I've gotten ~242 million with that query).
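
For anyone who wants to try other year ranges, the two query shapes above can be generated with a small Python sketch. The `date:START-END` format is copied from the queries in this thread; `date_range` and `negate` are hypothetical helper names, and the four-digit zero padding is an assumption based on the date:0000-9999 example.

```python
def date_range(start_year, end_year):
    """Format a date: operator as written in this thread, e.g. date:0000-9999."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return f"date:{start_year:04d}-{end_year:04d}"


def negate(term):
    """Prefix a term with '-' to exclude it."""
    return f"-{term}"


full_range = date_range(1012, 2020)
print(full_range)                 # date:1012-2020
print(f"{negate(full_range)} a")  # -date:1012-2020 a
print(date_range(0, 9999))        # date:0000-9999
```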

jake [PersonRank 0]

14 years ago #

date:0000-9999|by|provided|http|1|0|3| a | the | and | but | of | from | der | die | das | le | si | la

~ 262-272 mil

Brian Mingus [PersonRank 10]

14 years ago #

I haven't gotten over 256 mil with that query (likely just different data centers). I got 257 mil by adding a French and a Chinese stop word to the end. Could have been random, didn't hurt:

date:0000-9999|by|provided|http|1|0|3| a | the | and | but | of | from | der | die | das | le | si | la | de | 了

Adding stop words based on OCLC's distribution of languages in GBS might squeeze a lil' more out: dlib.org/dlib/september05/lavo ...

jake [PersonRank 0]

14 years ago #

I get 265 mil with your date:0000-9999|a|http , so that is most elegant at this point

Brian Mingus [PersonRank 10]

14 years ago #

I get the most with date:1012-2020|date:0000-9999

olivier ertzscheid [PersonRank 1]

14 years ago #

Have a look here:
affordance.typepad.com/mon_web ...
A French post demonstrating that Google's counts are totally faked.

Brian Mingus [PersonRank 10]

14 years ago #

Looks like they fixed it so you can't search for two date ranges :)
