Google Blogoscoped

Forum

Yahoo on Hadoop

Search-Engines-Web.com [PersonRank 10]

Wednesday, February 20, 2008

http://developer.yahoo.net/blog/archives/2008/02/hadoop_production_yahoo_search_webmap.html

Some big news in the world of Hadoop comes out of Yahoo! today. We believe we're now running the world's largest Hadoop application, a 10,000-core Linux cluster producing data used by the Yahoo! Search Webmap.

As you can see from the announcement on the Hadoop Blog:

   The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

   Some Webmap size data:

   * Number of links between pages in the index: roughly 1 trillion links
   * Size of output: over 300 TB, compressed!
   * Number of cores used to run a single Map-Reduce job: over 10,000
   * Raw disk used in the production cluster: over 5 Petabytes
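The announcement does not describe the Webmap's individual jobs. The sketch below is therefore only an illustration, and it makes several assumptions. It shows the general Map-Reduce pattern applied to link data in a single Python process rather than on Hadoop. The map step emits a key for every link target, a shuffle step groups values by key (the framework does this between map and reduce), and the reduce step sums the values. The result is a count of inbound links per page and per site. The input format (source/target URL pairs), the `page:`/`site:` key prefixes, and all function names are hypothetical. None of them are part of Hadoop's API.

```python
from collections import defaultdict
from typing import Iterable, Iterator
from urllib.parse import urlparse

# Hypothetical input record: (source_url, target_url), standing in for crawl output.
Link = tuple[str, str]


def map_phase(links: Iterable[Link]) -> Iterator[tuple[str, int]]:
    """Emit (key, 1) for the target page and for the target's host (site)."""
    for _source, target in links:
        yield ("page:" + target, 1)
        yield ("site:" + urlparse(target).netloc, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the values for each key, giving an inbound-link count."""
    return {key: sum(values) for key, values in groups.items()}


def inlink_counts(links: Iterable[Link]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over an in-memory list of links."""
    return reduce_phase(shuffle(map_phase(links)))


if __name__ == "__main__":
    sample = [
        ("http://a.com/", "http://b.com/x"),
        ("http://c.com/", "http://b.com/x"),
        ("http://a.com/", "http://b.com/y"),
    ]
    for key, count in sorted(inlink_counts(sample).items()):
        print(key, count)
```

On a real cluster the map and reduce functions run in parallel across many machines, and the framework handles the shuffle, so each reducer receives all values for its keys. That split is what allows a job like this to spread over thousands of cores.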

http://www.ysearchblog.com/archives/000521.html
