Google Blogoscoped

Tuesday, July 8, 2008

Behind the Scenes of a Google Query
By Brian Ussery

Many people have used Google, but few have any idea of the complexity and computing power needed to answer billions of queries, each with millions of results, in under a second. This isn’t an accident, an oversight by Google or even a matter of security: Google believes search should not seem complicated to users. This notion is perhaps best illustrated by Google’s “Technology Overview”, which uses just 5 images and 4 sentences to explain the “Life of a Google Query”.

Despite Google’s simple interface, search is complicated. For example, when a user in San Francisco enters a query like google.com/search?q=blogoscoped, the user’s browser first completes a DNS lookup that maps www.google.com to a specific IP address. At this stage, Google’s DNS load balancer determines which cluster of computers, at which of Google’s 36+ data centers, will process the query.

If the nearest data center isn’t available to process the query, the query is passed on to the nearest available one. (For this example, the nearest known data centers might be Mountain View, CA; Pleasanton, CA; San Jose, CA; Los Angeles, CA; Palo Alto, CA; Seattle, WA; Portland, OR; and/or The Dalles, OR.) Once a data center has been chosen, the query is transmitted via HTTP to that data center and to an individual cluster of 1,800 or more servers.
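The “nearest available data center” rule can be sketched in a few lines of Python. This is only an illustration of the idea, not Google’s actual DNS load balancer: the `DataCenter` class, the `pick_data_center` function and the distances are all made up for the example, and a real system would weigh load and network latency rather than simple distance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    name: str
    distance_km: float  # rough, illustrative distance from the user (not measured)
    available: bool


def pick_data_center(centers: List[DataCenter]) -> Optional[DataCenter]:
    """Return the closest data center that is currently available, if any."""
    candidates = [c for c in centers if c.available]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.distance_km)


if __name__ == "__main__":
    # Hypothetical view from a user in San Francisco.
    centers = [
        DataCenter("Palo Alto, CA", 50, available=False),
        DataCenter("Mountain View, CA", 55, available=True),
        DataCenter("Los Angeles, CA", 560, available=True),
        DataCenter("Seattle, WA", 1100, available=True),
    ]
    chosen = pick_data_center(centers)
    print(chosen.name if chosen else "no data center available")
```

With Palo Alto marked unavailable, the query falls through to the next-closest center in the list.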

Upon arrival at the data center cluster, each query is greeted by Google’s second load balancer. This hardware load balancer consists of 10 to 15 machines and determines which machines are available to process the query. It then earmarks the query and hands it off to a Google Mixer. The “Google Mixer” software will later combine all of the elements of Universal search results with the right blend of ads. The Mixer queries a number of Google Web Servers (GWS) and selects an available one to execute the query.

The query is then executed, simultaneously hitting 300 to 400 back-end machines representing Google’s verticals, advertising and spell check, among others. At this point the best results are gathered and the query data returns to the Google Mixer. The Mixer takes this data and blends Universal elements with ads while placing results in order of relevance. The ordered results then go back to the GWS for HTML coding. Once the HTML is complete and the pages are formatted, the search engine results are marked “done” by the load balancer and returned to the user as search engine results pages (SERPs). The entire process takes about 3 centiseconds (0.03 seconds), or three times as long as it takes a lightning bolt traveling at 186,000 miles per second to strike.
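The fan-out and blending step described above can be sketched roughly as follows. Everything here is hypothetical: the four back-end functions stand in for the hundreds of real machines, the relevance scores are invented, and sorting ads by the same score as ordinary results is a simplification of whatever blending the Mixer really does.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical back ends. Each returns a list of (source, title, score) tuples.
def web_index(query):
    return [("web", f"{query} home page", 0.95), ("web", f"{query} forum", 0.70)]

def image_index(query):
    return [("images", f"{query} logo", 0.80)]

def spell_check(query):
    return []  # nothing to correct for this query

def ads(query):
    return [("ad", f"Sponsored link for {query}", 0.60)]

BACKENDS = [web_index, image_index, spell_check, ads]


def run_query(query, backends=BACKENDS):
    """Send the query to every back end at once, then merge by score."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = [pool.submit(backend, query) for backend in backends]
        partials = [future.result() for future in futures]
    hits = [hit for partial in partials for hit in partial]
    # Highest score first, mimicking "results in order of relevance".
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    for source, title, score in run_query("blogoscoped"):
        print(f"{score:.2f}  {source:<6}  {title}")
```

The thread pool is just one convenient way to show the back ends being queried simultaneously; it says nothing about how Google implements the real thing.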


Google search engine results 1-10 of about 517,000 for “blogoscoped” returned in 0.03 seconds.
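To put the 0.03-second figure in perspective, here is a quick back-of-the-envelope calculation. It uses only the 186,000 miles per second figure quoted in the article; the variable names are illustrative.

```python
SPEED_MILES_PER_SECOND = 186_000  # figure quoted in the article

elapsed_seconds = 3 / 100  # 3 centiseconds
elapsed_milliseconds = elapsed_seconds * 1000
distance_miles = SPEED_MILES_PER_SECOND * elapsed_seconds

print(f"{elapsed_milliseconds:.0f} ms")
print(f"{distance_miles:,.0f} miles covered at that speed")
```

In other words, 3 centiseconds is 30 milliseconds, enough time for something moving at that speed to cover roughly 5,580 miles.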

Today it’s estimated that Google queries travel across 700-1000* machines, a figure that has nearly doubled since 2006, perhaps due in part to the introduction of Google Universal. Either way, it’s something to think about the next time you google yourself!

*Check this MP3, video and PDF for reference.
