http://developer.yahoo.net/blog/archives/2008/02/hadoop_production_yahoo_search_webmap.html
Some big news in the world of Hadoop comes out of Yahoo! today. We believe we're now running the world's largest Hadoop application: a 10,000-core Linux cluster producing data used by the Yahoo! Search Webmap.
As you can see from the announcement on the Hadoop Blog:
The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.
Some Webmap size data:
* Number of links between pages in the index: roughly 1 trillion
* Size of output: over 300 TB, compressed!
* Number of cores used to run a single Map-Reduce job: over 10,000
* Raw disk used in the production cluster: over 5 petabytes
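To give a feel for the kind of Map-Reduce computation involved, here is a minimal, hypothetical sketch in Python: counting outgoing links per site from crawl records, with the map, shuffle, and reduce phases simulated in-process. The record format and names are illustrative assumptions, not Yahoo!'s actual Webmap schema, and a real Hadoop job would distribute these phases across thousands of cores.

```python
from collections import defaultdict

# Illustrative mini Map-Reduce job: count outgoing links per site from
# crawl records of the form "source_url<TAB>target_url". This record
# format is an assumption for the example, not Yahoo!'s actual schema.

def mapper(record):
    """Emit (site, 1) for each crawled link record."""
    source, _target = record.split("\t")
    site = source.split("/")[2]  # host part of http://host/path
    yield site, 1

def reducer(site, counts):
    """Sum the per-site counts produced by the mappers."""
    yield site, sum(counts)

def run_job(records):
    """Simulate Hadoop's map -> shuffle -> reduce phases in one process."""
    shuffled = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            shuffled[key].append(value)
    results = {}
    for key in sorted(shuffled):           # shuffle groups by key; reduce phase
        for site, total in reducer(key, shuffled[key]):
            results[site] = total
    return results

crawl = [
    "http://a.example/page1\thttp://b.example/x",
    "http://a.example/page2\thttp://c.example/y",
    "http://b.example/home\thttp://a.example/z",
]
print(run_job(crawl))  # {'a.example': 2, 'b.example': 1}
```

The same mapper/reducer pair could run unchanged under Hadoop Streaming; the framework, not the user code, handles the distributed shuffle and fault tolerance that make a trillion-link input tractable.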
http://www.ysearchblog.com/archives/000521.html