I'd really like a post from Philipp about the different kinds of searches we all do with the same instrument, Google web search. E.g. sometimes I search for a name (to find out about that person), sometimes for a quotation (to find out where it is from), etc.
Let's see if Philipp will satisfy my request... ;) |
I think it's an interesting question.
Sidenote: the type of search engine we're searching with changes what kind of search queries we enter. Right now we're adapting our search queries to what works in Google, Yahoo and others. Asking complete questions often doesn't really work, so we adapt and give hints, circling around the location where we suspect our answer to be. For example, to find the opening hours of my local bank, I cannot enter "opening hours MyBankName stuttgart". To get to it, I need to enter "MyBankName stuttgart" and then continue the search on their page, by manually clicking around. Right now, being too precise is punished by Google, because there's not enough intelligence on the search engine's side, and there's not enough data on the web. I think our search queries are often more ambiguous than how we'd phrase them if we were asking a friend for advice. I wouldn't ask a friend "MyBankName stuttgart". I'd ask a friend: "What are the opening hours for the MyBankName Stuttgart?" So I think this sidenote is important when we want to look at search results, because we can't think of these queries as necessarily "natural" or the kind of queries "a search engine in general must tackle".
Now, here's an interesting bit. This is straight from the leaked "General Guidelines on Random Query-Evaluation" by Google, from 2003. [PDF] http://www.searchbistro.com/guide.pdf
<<The Query Types
While there is no simple way to categorize all searches into a neatly organized system, three major categories have been used by analysts of web search to draw a distinction between navigational queries, informational queries, and transactional queries. This classification, as it turns out, allows for some useful generalizations in the context of query-result evaluation.
A *navigational* query is one that normally has only one satisfactory result: the user types in the name of an entity ("United Airlines") and expects to be taken to the homepage of that entity.
An *informational* query can have many or few appropriate results, with varying degrees of relevance/informativeness/utility and varying degrees of authority. The user types in a topic ("renaissance paintings", "aging disease"), sometimes in the form of an actual question ("What is a quark?", "How do I...?"), and expects to be provided with information on this topic.
A *transactional* query as well can have many or few appropriate results, of varying quality. However, in this case the user is not requesting information – or at least not only information – but instead has the primary goal of carrying out a transaction. Typically, the transaction consists of the acquisition – for money or free – of a product or service. Some transactions can be fully carried out on the web (think furniture clipart download), some come to fruition offline (think furniture to put in a house).
Again, not every query can be clearly classified. Since products include information products, the line between informational and transactional queries is sometimes hard to draw. Similarly, because the ulterior motive for a navigational query often is to locate a site for potential transactions, there is a grey zone between navigational and transactional queries. To the extent the classification is helpful, use it, but do not attempt to fit any query that comes your way into one of the three boxes: always trying to decide in favor of one or another will only lead to frustration. It may be more helpful to think of different aspects of a query: for instance, the query [thomas the tank engine] can have (a) a navigational aspect – take me to Thomas' homepage, (b) an informational aspect – tell me the history of Thomas creation, and finally, (c) a transactional aspect – I want to buy a book or a toy engine from the Thomas collection.>>
I'm sure there are many other ways to categorize queries. I wonder what categorization you would consider?
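Just to make the "aspects" idea from the guidelines a bit more concrete, here's a rough Python sketch that tags a query with zero or more of the three aspects instead of forcing it into one box. The cue word lists, the domain pattern and the function name query_aspects are all my own assumptions for illustration; nothing here comes from the Google guidelines, and a real classifier would need far more than keyword matching.

```python
import re

# Hypothetical cue lists, made up for illustration only.
TRANSACTIONAL_CUES = {"buy", "download", "order", "price", "cheap", "coupon"}
INFORMATIONAL_CUES = {"what", "how", "why", "which", "who", "when", "history", "lyrics"}

# Assumption: a query that looks like a domain name is a navigational hint.
DOMAIN_PATTERN = re.compile(r"\b[\w-]+\.(com|org|net)\b")


def query_aspects(query):
    """Return the set of aspects a query might have (possibly empty, possibly several)."""
    lowered = query.lower()
    words = set(re.findall(r"[a-z0-9']+", lowered))
    aspects = set()
    if DOMAIN_PATTERN.search(lowered) or "homepage" in words:
        aspects.add("navigational")
    if words & TRANSACTIONAL_CUES:
        aspects.add("transactional")
    if words & INFORMATIONAL_CUES:
        aspects.add("informational")
    return aspects


if __name__ == "__main__":
    samples = [
        "www.google.com",
        "wnmu homepage",
        "which is the best air cleaner",
        "oxtail soup",
    ]
    for sample in samples:
        # sorted() only makes the printed order stable.
        print(sample, "->", sorted(query_aspects(sample)))
```

Note that a query like "oxtail soup" gets no tag at all here, which fits the guidelines' warning that not every query can be clearly classified: keywords alone rarely reveal the intention behind a short query.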
Whatever form of categorization you come up with, one of the best places to start with finding some real live search queries to test on is either your Google search history, if enabled (please feel free to post samples here :)), or the AOL data leak (http://blogoscoped.com/archive/2006-08-07-n22.html). I'm keeping a copy of all the logs here with a small processing script on top. Here are some samples from what John Battelle calls the "database of intentions" (http://battellemedia.com/archives/000063.php):
---------------------- the childs wonderland company the child's wonderland company grand rapids michigan konig wheels uslandrecords.com google wnmu homepage fuel additives check engine light bare minerals make up pergola pergola house entrance once upon a child boppy covers lily pads breast dog eats uncooked pasta inducing dog vomiting jesse mccartney jessemccartney party city avis mapquest kids cake designs butterfly party peep hole peephole poison control hydocortisone lyrics to dreamer bethany dillon 100.3 richmond jimmy carter's election jones beach concerts 2 year old and lazy eye turns outwards ap satistics rishi rich jay sean ms. new booty hazy visions spanish accent indian jokes pinnacle's armor hips don't lie wayne wonder music and emotional response tajmahal restaurant brooklyn ny x-clusity.com careers.com coon rapids herald hollywood babe teen miss conejo valley preschool worksheets pageant hair extentions how tos submarine swimwear kids www.google.com biology sols which is the best air cleaner staucks starbucks dishwashers soap stone stoves oxtail soup ---------------------- |
Great (as usual). I think *you* could get something out of these data, and add other categories to the three mentioned.
Sometimes, however, only the person who types a query knows their own intentions: e.g. I often type something like "larry page" "sergey brin" "eric schmidt" not because I want to know about them or about what they have in common, but to find out if there is someone else who belongs to the same category (a category that I don't know how to name, or that can be named in many different ways, or in which many people wouldn't include the same items as me*).
Naturally, the next question will be: is one search engine and one algorithm really the best for so many types of search? Google is going in the direction of unification: see http://googlesystem.blogspot.com/2006/11/google-universal-search.html
*E.g. I wouldn't type "best rock albums", but "fresh fruit for rotten vegetables" "come on pilgrim" |
After some research starting from Philipp's citations, I discovered that...
...the distinction among informational, navigational and transactional was created by Broder from IBM: http://www.acm.org/sigs/sigir/forum/F2002/broder.pdf
...Broder's taxonomy was developed further by Rose and Levinson from Yahoo: http://www2004.org/proceedings/docs/1p13.pdf
...like everything else, the more you read, the more you haven't yet read. |