Sunday, June 3, 2007
Behind the Scenes of Google Rankings
The New York Times has a mighty interesting inside look at Google’s top search algo engineers, most prominently 39-year-old Amit Singhal, and their methodologies. Some highlights:
- Users expect more: Amit says search moved from “Give me what I typed” to “Give me what I want.” Users expect more and more these days and take good results for granted... if something they’re looking for is not in the top 3 results, something feels broken.
- Balancing changes: The engineers don’t make quick changes and often need to weigh the pros and cons of a particular ranking algorithm tweak, because a change that affects some search queries positively may affect others negatively (a toy side-by-side evaluation sketch follows this list). (There is no word on the Google evaluation laboratory, details of which leaked to the public in 2005; this lab is a place where human quality raters gave, or still give, feedback on ranking tweaks.)
- How search problems are escalated: To alert Amit’s team to ranking problems, Google’s 10,000 employees have a reporting tool called Buganizer, through which around 100 search issues are filed on a given day. When Amit receives query issues he “treasures” them, ranks them by importance, and tries to fix them, or analyzes whether an individual issue might be part of a larger, more complex ranking problem. One of the tools Amit uses for this analysis is called Debug, which shows how Google’s computers evaluate each search query and each web page. (OK, I want my copy!)
- Ensuring fresh pages don’t get lost: Google introduced a value called QDF, for “query deserves freshness,” because one recent and more general issue turned out to be that some new web pages were vastly under-valued in rankings. However, the algo couldn’t simply be tweaked to emphasize fresher pages across the board, because that would’ve harmed the placement of older authority pages in cases where the user doesn’t want the new page. So Amit and his team looked into different ways to tackle the issue, QDF being one of the measurements that tries to determine just what the searcher wants, e.g. by looking at how much a specific topic is currently discussed on blogs (or by checking how often the topic is searched for in Google at the time, similar to what Google shows us as Hot Trends, I guess; a toy freshness calculation follows this list).
- Ranking pages through signals, classifiers & topicality: Web pages are evaluated by some 200 so-called “signals” (the famous PageRank algorithm being one of them). Another signal may be, say, the historical data of how a web page changed over time, or the user’s personal search history (Google also takes into account which results users click on in order to determine ranking, as a Google engineer recently told a group of us). On top of signals, Google uses “classifiers” to determine what category the search query belongs to, e.g. whether the searcher is looking for information on a place, wants to buy a product, or googles the name of a non-celebrity. Then, through something Google calls topicality – “a measure of how the topic of a page relates to the broad category of the user’s query,” as the NYT puts it – the overall relevancy score for a page for a given query is calculated, and another “diversity” tweak ensures the top 10 is varied enough if that’s not already the case (see the scoring sketch below; a discussion of the final top 10 tweak, censorship in e.g. Germany, France or China, is omitted).
- Searchers often use ambiguous queries: While it seems easy to have the word “bio” also return pages containing the word “biography,” the word “apples,” for instance, ought not to result in a match for “Apple”... so it’s not as trivial as it may seem (a toy query expansion sketch closes out the examples below).
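
Here’s a minimal sketch of what such a side-by-side evaluation of a ranking tweak might look like, in Python. Everything in it is hypothetical; the `judge` callback stands in for whatever score a human quality rater (or metric) provides, as the NYT article doesn’t describe Google’s actual tooling:

```python
def compare_rankings(queries, old_ranker, new_ranker, judge):
    """Toy side-by-side evaluation of a ranking tweak.

    For each query, `judge` (think: a human quality rater) scores the
    results from the old and the new algorithm; the tweak only looks
    shippable if the wins clearly outweigh the losses."""
    wins = losses = ties = 0
    for query in queries:
        delta = judge(query, new_ranker(query)) - judge(query, old_ranker(query))
        if delta > 0:
            wins += 1
        elif delta < 0:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties
```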
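
And here’s one way a “query deserves freshness” measurement could work in principle: compare a topic’s recent mention rate (e.g. in blog posts) to its longer-term baseline, and boost fresh pages when the topic is spiking. The function, window sizes and numbers are my invention, not Google’s:

```python
def qdf_score(topic_mentions_by_day, window_days=3, baseline_days=30):
    """Toy 'query deserves freshness' estimate: ratio of the recent
    mention rate to the longer-term baseline rate. Values well above
    1.0 suggest the topic is spiking and fresh pages deserve a boost.
    `topic_mentions_by_day` is a list of daily counts, newest last."""
    recent = topic_mentions_by_day[-window_days:]
    baseline = topic_mentions_by_day[-baseline_days:]
    recent_rate = sum(recent) / max(len(recent), 1)
    baseline_rate = sum(baseline) / max(len(baseline), 1)
    return recent_rate / max(baseline_rate, 1e-9)

# Example: a topic that jumped from ~5 to ~50 blog mentions per day.
counts = [5] * 27 + [40, 55, 60]
print(round(qdf_score(counts), 2))  # ~5.34, so freshness matters here
```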
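
To illustrate the “many signals, one score” idea plus the diversity tweak, here’s another toy Python sketch. The signal names, weights and the per-site cap are all made up; Google’s 200 signals and their weighting are secret, naturally:

```python
def relevancy(page_signals, weights):
    """Toy relevancy score: a weighted sum over per-page signal values
    (PageRank, topicality, freshness and so on)."""
    return sum(weights[name] * value for name, value in page_signals.items())

def diversify(ranked_pages, max_per_site=2, k=10):
    """Toy diversity pass: walk the ranked list and cap how many
    results any single site may contribute to the top k."""
    per_site, top = {}, []
    for page in ranked_pages:
        site = page["site"]
        if per_site.get(site, 0) < max_per_site:
            top.append(page)
            per_site[site] = per_site.get(site, 0) + 1
        if len(top) == k:
            break
    return top

# Made-up signal values for one page and one query:
weights = {"pagerank": 0.5, "topicality": 0.4, "freshness": 0.1}
page = {"pagerank": 0.8, "topicality": 0.9, "freshness": 0.2}
print(round(relevancy(page, weights), 2))  # 0.78
```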
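
Finally, a toy sketch of the query expansion problem from the last item. The expansion table and the “don’t expand” list are purely illustrative, as the article doesn’t reveal how Google actually handles this:

```python
# Hypothetical expansion table and stoplist; Google's real rules are unknown.
EXPANSIONS = {"bio": ["bio", "biography"]}
NO_EXPAND = {"apples"}  # words whose variants would change the meaning

def expand_term(term):
    """Expand a query term to its known variants, unless expanding it
    is known to change the meaning (the "apples" vs. "Apple" problem)."""
    if term in NO_EXPAND:
        return [term]
    return EXPANSIONS.get(term, [term])

print(expand_term("bio"))     # ['bio', 'biography']
print(expand_term("apples"))  # ['apples']; no match against "Apple"
```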
[Thanks Robert Birming, Anu Garg and George R.!]