Google Blogoscoped


Yahoo on Hadoop [PersonRank 10]

Wednesday, February 20, 2008
12 years ago2,179 views ...

Some big news in the world of Hadoop comes out of Yahoo! today. We believe we're now running the world's largest Hadoop application, a 10,000 core Linux cluster producing data used by the Yahoo! Search Webmap.

As you can see from the announcement on the Hadoop Blog:

   The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

   Some Webmap size data:

   * Number of links between pages in the index: roughly 1 trillion links
   * Size of output: over 300 TB, compressed!
   * Number of cores used to run a single Map-Reduce job: over 10,000
   * Raw disk used in the production cluster: over 5 Petabytes ...

This thread is locked as it's old... but you can create a new thread in the forum. 

Forum home


Blog  |  Forum     more >> Archive | Feed | Google's blogs | About


This site unofficially covers Google™ and more with some rights reserved. Join our forum!