Google Blogoscoped

Friday, September 5, 2003

Humble Google With 5 Billion Pages

Microdoc ran a “+the” search on Google and found 5,240,000,000 pages. That reveals a considerably bigger index than the one Google advertises on its front page, and those are mostly English pages only*.

*Not exclusively English, though: Google also assigns the “the” keyword to a page when other pages link to it using that word. So if a German page that doesn’t contain “the” gets a link reading “read the German version”, it is assigned the keyword as well.
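Anchor-text indexing is easy to model in miniature. The sketch below is a toy inverted index, not Google’s actual implementation: the page URLs, the `links` list and the `build_index` function are all hypothetical. Words from a link’s text are credited to the page the link points to, so the German page ends up listed under “the” even though its own text never uses the word.

```python
from collections import defaultdict

# Hypothetical toy pages: page id -> body text (not real Google data).
pages = {
    "de.example/seite": "Willkommen auf der deutschen Seite",
    "en.example/page": "Read the German version here",
}

# Hypothetical links: (source page, target page, anchor text).
links = [
    ("en.example/page", "de.example/seite", "read the German version"),
]


def build_index(pages, links):
    """Map each lowercase term to the set of pages it is indexed under."""
    index = defaultdict(set)
    # Terms from a page's own body point to that page.
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    # Anchor text is credited to the *target* page, not the linking page.
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index


index = build_index(pages, links)
print(sorted(index["the"]))  # ['de.example/seite', 'en.example/page']
```

In this toy model a “+the” query would count the German page too, which is why the 5.24 billion figure is not a pure count of English pages.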

Does size matter when it comes to the search database? Yes and no. If the search engine has ways of filtering out obvious spam, a small index doesn’t necessarily mean a lack of relevancy. What is important is a large database covering all the pages that count, especially very recent content from the world of news and blogging. The key is to have all the best pages, and the means to show the best ones first.

So why does Google hide its Database size?

“Everything Google does is understated. The front-page is a simple, minimalist design hiding a highly complex array of algorithms, data-organization, robot devices and more. Dig under the surface of Google and you are digging for days and months visiting the complexity that is the Google set of search tools. In keeping with that theme of understating what it is that is Google, it is little wonder that Google Inc understate the size of their database. While AllTheWeb (Overture) are running hard to increase the size of their database, Google simply increases the number on the front page to be more than the competitors. Why state you have 5 billion webpages databased when all you need is 3.3 billion to beat your nearest rival? Smart strategy.”
– Microdoc, Google Understates the Size of Its Database, 09/02/2003

Room for growth?

OK, nobody needs more spammy pages, of course. Maybe Google has a lot of pages up its sleeve that it would rather not show off too proudly. As you may know, you can’t get past the 1000th result anyway (not that there’d be much of interest after 999 results). And here’s my utopian wishlist for what Google should index:

In fact, in a few decades, I want to dump my whole brain structure on the web. You’d be able to search for any thought that ever popped up in a head on this planet. A human brain reportedly has about 50,000 thoughts per day. That’s 18,250,000 per year (365 days). Multiply this by an approximate 6 billion people in the world and you get 109,500,000,000,000,000 little pages. And that’s still much less than a googol (that’s a 1 followed by 100 zeros*; just consult the Google Calculator).
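The back-of-the-envelope arithmetic can be checked with a few lines of Python, using the post’s own estimates (50,000 thoughts a day, 6 billion people) and assuming a 365-day year. Python integers have arbitrary precision, so the comparison with a googol is exact rather than a floating-point approximation.

```python
thoughts_per_day = 50_000        # the post's estimate
days_per_year = 365              # assumption: ignore leap years
population = 6_000_000_000       # approximate world population in 2003

thoughts_per_year = thoughts_per_day * days_per_year
total_pages = thoughts_per_year * population
googol = 10 ** 100

print(f"{thoughts_per_year:,}")  # 18,250,000
print(f"{total_pages:,}")        # 109,500,000,000,000,000
print(total_pages < googol)      # True
```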

Google Calculator understands "Googol"

*Like this: 10 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
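To confirm that the footnote really shows 100 zeros, a short Python sketch can generate the googol itself and group its digits in threes. The space-separated grouping is simply copied from the footnote; Python’s `,` format option does the grouping and the commas are then swapped for spaces.

```python
googol = 10 ** 100

# "," groups the digits in threes from the right; swap commas for spaces.
spaced = f"{googol:,}".replace(",", " ")
print(spaced)

# Every digit after the leading 1 is a zero.
print(str(googol).count("0"))  # 100
```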

googol and googolplex

“A googol is 10 to the 100th power (which is 1 followed by 100 zeros). The term was invented by Milton Sirotta, the 9-year nephew of mathematician Edward Kasner, who had asked his nephew what he thought such a large number should be called. Such a number, Milton apparently replied after a short thought, could only be called something as silly as...a googol! A googol is larger than the number of elementary particles in the universe, which amount to only 10 to the 80th power.

Later, another mathematician devised the term googolplex for 10 to the power of googol – that is, 1 followed by 10 to the power of 100 zeros. Frank Pilhofer has determined that, given Moore’s Law (which is that computer processor power doubles about every 1 to 2 years), it would make no sense to try to print out a googolplex for another 524 years – since all earlier attempts to print a googolplex out would be overtaken by the faster processor.”
– Googol (A WhatIs definition), Apr 22, 2003
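Both quoted comparisons can be checked with exact integer arithmetic. The sketch below confirms the factor of 10^20 between a googol and the quoted 10^80 elementary particles, then shows why nobody prints a googolplex: it has a googol plus one digits. The printer speed of a billion digits per second is a hypothetical figure chosen only for illustration, and this plain rate estimate ignores Moore’s Law, so it is not a reconstruction of Pilhofer’s 524-year calculation.

```python
import math

googol = 10 ** 100
particles = 10 ** 80  # the quoted estimate for elementary particles

# A googol exceeds the particle estimate by a factor of 10**20.
print(googol // particles == 10 ** 20)  # True

# A googolplex is 10**googol: a 1 followed by a googol zeros.
# Never compute it directly; just count its digits.
googolplex_digits = googol + 1

# Hypothetical printer speed, for illustration only.
digits_per_second = 10 ** 9
seconds_per_year = 365 * 24 * 60 * 60

years = googolplex_digits / (digits_per_second * seconds_per_year)
print(f"about 10^{math.floor(math.log10(years))} years")  # about 10^83 years
```

Even a printer a trillion times faster would only knock 12 off that exponent, which is the intuition behind waiting for faster hardware before starting at all.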


Oh yeah: if a googolplex is still not big enough for you, try Graham’s number, which is “far larger than most people’s conception of infinity”.


This site unofficially covers Google™ and more with some rights reserved. Join our forum!