Yahoo on Hadoop - Google Blogoscoped Forum

Search-Engines-Web.com, Wednesday, February 20, 2008
http://developer.yahoo.net/blog/archives/2008/02/hadoop_production_yahoo_search_webmap.html

Some big news in the world of Hadoop comes out of Yahoo! today. We believe we're now running the world's largest Hadoop application, a 10,000 core Linux cluster producing data used by the Yahoo! Search Webmap. As you can see from the announcement on the Hadoop Blog:

The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

Some Webmap size data:

* Number of links between pages in the index: roughly 1 trillion links
* Size of output: over 300 TB, compressed!
* Number of cores used to run a single Map-Reduce job: over 10,000
* Raw disk used in the production cluster: over 5 Petabytes

http://www.ysearchblog.com/archives/000521.html
