Google Blogoscoped


Caffeine: Google's new search index

mbegin [PersonRank 10]

Wednesday, June 9, 2010
14 years ago | 4,155 views

<< Today, we're announcing the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than our last index, and it's the largest collection of web content we've offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before...

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before, no matter when or where it was published.

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles. >>
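The storage comparison in the quote can be sanity-checked with a little arithmetic. The snippet below is only a rough sketch: the constants are the rounded figures from the quote ("nearly 100 million gigabytes", "625,000 of the largest iPods", "more than 40 miles"), and the function names are made up for illustration. It works out to 160 GB and roughly 4 inches per device, which would be consistent with a 160 GB iPod classic, assuming that is the model Google had in mind.

```python
# Back-of-the-envelope check of the figures quoted above.
# All constants are the rounded numbers from the quote, not exact values.
INDEX_GB = 100_000_000     # "nearly 100 million gigabytes"
IPOD_COUNT = 625_000       # "625,000 of the largest iPods"
STACK_MILES = 40           # "more than 40 miles" stacked end-to-end

INCHES_PER_MILE = 5_280 * 12   # 5,280 feet per mile, 12 inches per foot


def gb_per_ipod(index_gb=INDEX_GB, ipods=IPOD_COUNT):
    """Storage each iPod would have to hold for the comparison to work."""
    return index_gb / ipods


def inches_per_ipod(miles=STACK_MILES, ipods=IPOD_COUNT):
    """Length of each iPod implied by the end-to-end stack."""
    return miles * INCHES_PER_MILE / ipods


if __name__ == "__main__":
    print(f"{gb_per_ipod():.0f} GB per iPod")
    print(f"{inches_per_ipod():.2f} inches per iPod")
```

Since the quote says "nearly" and "more than", the results are approximations of the implied capacity and length, not exact device specifications.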

Juha-Matti Laurio [PersonRank 10]

14 years ago #


is not accessible any more.

George R [PersonRank 10]

14 years ago #

Does anyone (e.g. Matt) know if PageRank is also updated at the same time? If not, how do new pages become visible on a timely basis?

Is Google's public page cache updated at the same time?

Do Google's internal data caches (not the web page cache) invalidate every few seconds?
