Google Blogoscoped

Friday, April 25, 2008

Diverse Google Results Are Good

Google results which are diverse – approaching your query from different angles – are good, and those which aren’t diverse can be a barrier to finding what you’re after.

Take for instance the query [google blog], which I’ve been monitoring every now and then for some years now. It’s ambiguous input which can mainly mean any of three things: 1) the user wants to see the official blog by Google Inc., 2) the user heard about Google’s blogging platform and wants to find it, or 3) the user is looking for an independent blog covering Google. There may be even more cases than these.

Some search engines return related search suggestions when you enter an ambiguous query. Google does so too, at the end of the results; they link to searches for [unofficial google blog], [google employee blog], [create blog] and more. But the best thing a search engine can do is to cover all possible answers among the top 10 results, so that the searcher has the chance to lazily jump to any of the results without further query refinement.

In the search for [google blog] (see screenshot), what we’re seeing at this time is roughly the following result – I’m coloring the different types:

1. Official Google Blog
2. ... same blog as above ...
3. Google’s Blog Search
4. Google’s blogging platform
5. Independent blog about Google
6. ... another independent blog about Google ...
7. Another official Google blog
8. ... another official Google blog ...
9. ... another official Google blog ...
10. ... another official Google blog ...

This is an excellent example of satisfying the different search cases, as there’s a lot of variety in result types. However, there were times in the past when all the official Google blogs would start to push independent Google blogs (like this one) further down the page. As Google has so many blogs by now, they can be thought of as a powerful link network helping its member blogs (not consciously, perhaps; they also generously link out to independent blogs). This network effect seems to have been mostly corrected in the latest results for this search, though. (There are still problems with this result – one independent blog shown in the result is not updated anymore, yet there are more interesting new independent blogs about Google which the result does not feature... perhaps we can think of this as a link heritage problem, which may favor old stuff over new stuff.)

In other searches, the network effect is still strong, and sometimes arguably too strong. Take a search for [amsterdam hotel], for instance. I would think in this case, searchers are mainly looking to rent a room in Amsterdam for an upcoming trip. What the Google top 10 now shows are 9 hotel booking sites with offers for Amsterdam, as well as one hotel which is called “Hotel Amsterdam” in position 7. This is another good result, because while not very diverse, it’s likely diverse enough. Here’s a color chart for the search – we can assume that all but one player in this result set are in it for the commission money of referring searchers to a specific hotel:

1. General hotel booking site
2. General hotel booking site
3. General hotel booking site
4. General hotel booking site
5. General hotel booking site
6. General hotel booking site
7. Specific hotel by that name
8. General hotel booking site
9. General hotel booking site
10. General hotel booking site

However, once you’ve gone to a specific booking site and found a hotel which you might want to book, you also might want to find out more about that particular hotel. Often, the hotel’s homepage will show galleries, detailed contact info and more, data which the booking site may not have. But these single homepages will frequently not be as heavily search engine optimized and networked as the larger, more general hotel booking sites, which also have gotten their entries for individual hotels indexed in Google!

Consequently, the search for a particular hotel – [marnix hotel amsterdam] – will show this homogeneous result set (sometimes with photos and good reviews but not always):

1. General hotel booking site entry for Marnix Hotel
2. General hotel booking site entry for Marnix Hotel
3. General hotel booking site entry for Marnix Hotel
4. General hotel booking site entry for Marnix Hotel
5. General hotel booking site entry for Marnix Hotel
6. General hotel booking site entry for Marnix Hotel
7. General hotel booking site entry for Marnix Hotel
8. General hotel booking site entry for Marnix Hotel
9. General hotel booking site entry for Marnix Hotel
10. General hotel booking site entry for Marnix Hotel

People usually don’t look beyond the top 10 results, but the actual hotel’s official homepage is not to be found in results 11 to 20 either. What you can do now is search Google for [] or [inurl:marnixhotel] and similar queries, but this is only a little better than going straight to the browser address bar to try your luck. (You may also head over to Google Maps.) In other words, the Google result when querying for a specific hotel is not diverse enough to be as useful as possible, because Google favors larger networks.
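To put a rough number on "diverse enough", here is a minimal sketch that compares the three result sets listed in this post. It assumes each result can be reduced to a single type label (the labels are my own shorthand for the color coding, and all official Google blogs are lumped into one type), and it uses Shannon entropy as the diversity measure, which is my choice and not anything Google is known to use.

```python
from collections import Counter
from math import log2


def type_entropy(result_types):
    """Shannon entropy (in bits) of the result-type distribution.

    0 means every result is of the same type; higher means more variety.
    """
    counts = Counter(result_types)
    total = len(result_types)
    return sum((n / total) * log2(total / n) for n in counts.values())


# Type labels are hypothetical shorthand for the three top-10 lists in the post.
google_blog = (
    ["official"] * 2
    + ["blog search", "platform"]
    + ["independent"] * 2
    + ["official"] * 4
)
amsterdam_hotel = ["booking"] * 6 + ["hotel"] + ["booking"] * 3
marnix_hotel = ["booking"] * 10

for name, types in [
    ("google blog", google_blog),
    ("amsterdam hotel", amsterdam_hotel),
    ("marnix hotel amsterdam", marnix_hotel),
]:
    print(f"[{name}]: {len(set(types))} types, {type_entropy(types):.2f} bits")
```

The ordering comes out as you would expect from eyeballing the lists: [google blog] scores highest, [amsterdam hotel] is low but not zero, and [marnix hotel amsterdam] is exactly zero. A number like this says nothing about whether the missing type is one the searcher actually wants, of course.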

In this specific case, one may think it would be easy to come up with a workable adjustment to the approach; say, why doesn’t Google favor sites which have the words “marnix” and “hotel” right in the second-level domain? On the other hand, what may work for this query may not work for another query, and Google normally favors scalable solutions. (And in fact, when searching for the more specific [marnix hotel homepage] – a stronger indicator you’re after the official thing – the domain will be shown, but it’s apparently a domain reserved by the network.)
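The second-level domain idea can be sketched in a few lines. This is purely illustrative: the function names and the example URLs (all on the reserved .example TLD) are made up, the domain extraction naively assumes a single-label TLD, and nothing here reflects how Google actually scores results.

```python
import re
from urllib.parse import urlsplit


def second_level_label(url):
    """Return the label just left of the TLD, e.g. 'marnixhotel'.

    Naive assumption: the TLD is a single label. Suffixes like 'co.uk'
    would need a public suffix list, which this sketch skips.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def domain_match_score(query, url):
    """Fraction of query words that appear inside the second-level label."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return 0.0
    label = second_level_label(url).lower()
    return sum(word in label for word in words) / len(words)


query = "marnix hotel amsterdam"
# Hypothetical URLs for illustration only.
for url in [
    "http://www.marnixhotel.example/",
    "http://www.big-booking-network.example/amsterdam/marnix-hotel.html",
    "http://marnix-hotel-amsterdam-deals.example/",
]:
    print(f"{domain_match_score(query, url):.2f}  {url}")
```

The third URL shows why this does not scale: anyone can register a domain stuffed with the query words, so a boost like this would reward exactly the kind of optimization it was meant to counter.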

For diversity, Google may analyze links and then look at larger site clusters which are still distant from each other (while trying to penalize clusters which are a little too large and interconnected, perhaps, as that may be a spam farm)... but it may be very hard to identify a fitting but not well-backlinked site. I know (or rather, I guess) that is the official site, but I figured this out by looking at the domain name, the site’s layout, the introductory text and imagery and so on.
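As a toy version of the cluster idea, the sketch below groups sites into clusters by treating any link between two sites as putting them in the same network (connected components via union-find), then re-ranks so that no cluster takes more than a fixed number of top slots. Everything here is an assumption for illustration: the site names, the link data, and the cap of two per cluster are invented, and real link graphs would need a far softer notion of "cluster" than connected components.

```python
from collections import defaultdict


def link_clusters(sites, links):
    """Map each site to a cluster id (connected components of the link graph).

    `links` is a list of (site_a, site_b) pairs; both must be in `sites`.
    """
    parent = {site: site for site in sites}

    def find(site):
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    for a, b in links:
        parent[find(a)] = find(b)
    return {site: find(site) for site in sites}


def diversify(ranked_sites, cluster_of, max_per_cluster=2):
    """Keep the ranking order, but move results beyond the per-cluster cap to the end."""
    seen = defaultdict(int)
    kept, deferred = [], []
    for site in ranked_sites:
        cluster = cluster_of[site]
        if seen[cluster] < max_per_cluster:
            seen[cluster] += 1
            kept.append(site)
        else:
            deferred.append(site)
    return kept + deferred


# Hypothetical ranking: four interlinked booking sites and one island homepage.
ranked = ["booking-a", "booking-b", "booking-c", "booking-d", "hotel-home"]
links = [("booking-a", "booking-b"), ("booking-b", "booking-c"), ("booking-c", "booking-d")]

print(diversify(ranked, link_clusters(ranked, links)))
# ['booking-a', 'booking-b', 'hotel-home', 'booking-c', 'booking-d']
```

Note that the sketch only helps once the island page is already among the candidates. It does nothing for the harder problem described above, which is recognizing a fitting but poorly backlinked site in the first place.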

So the problem remains: heavily optimized, backlinked and well-networked sites are doing better than non-optimized island pages. Most hotel or small business owners may not have the same resources to invest in optimization, or they may simply not realize there’s a need for optimization, or in the case of hotel booking networks, they may be perfectly happy if those other sites handle the booking. Only the Google searcher misses out on the site they were potentially looking for due to a lack of result diversity.*

*However, while diverse results are good, this doesn’t mean ambiguous search queries should always result in an equal share of all different types of results. And sometimes, cases of ambiguity could be constructed which may not exist as such in real life due to actual, common search patterns. Take the query [apple], for instance, which could mean you’re looking for information about apple the fruit, or Apple the company. However, the Google top 10 result at this time is filled with Mac-related information exclusively; only in result 11 (which, again, most people won’t see, as they’d rather adjust their search query and search again) will there be a pointer to Wikipedia’s entry on the fruit. While this is a bit of a lack of diversity, and one might think apple-the-fruit may fare well in something like position 9 or 10, I would also think that many people looking for apple-the-fruit would enter more specific queries than that, such as [apple recipes] or [apple images], so the result is not all that bad.

