Google Blogoscoped

Forum

Google's Question Answering System being gamed

Brian M. [PersonRank 10]

Tuesday, December 12, 2006
17 years ago | 2,907 views

Google's Question Answering System is bad, and being gamed

Question answering systems are very difficult (by which I mean messy) to program, and that's why Yahoo is featuring Yahoo! Answers so prominently. All the major search engines realize that the long tail of search is queries that are more than 1-2 words, and the longer the query gets, the higher the likelihood that it is a question. Yahoo doesn't want a specialized system to answer those questions, whereas Google does.

Microsoft's Q&A system simply rewrites your question into various queries and tries them out. The philosophy is that hey, what we need is a million monkeys on typewriters, and that's exactly what the Internet is.
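To make the rewriting idea concrete, here is a minimal sketch in Python. This is not Microsoft's actual code; the function name rewrite_question and the single "what's the X of Y" pattern are assumptions made up for illustration. It only shows the general shape of turning a question into several candidate queries that could each be tried against a search index.

```python
import re

# Hypothetical pattern for questions shaped like "what's the X of Y".
WHAT_OF = re.compile(r"what(?:'s| is) the (.+?) of (.+)$", re.IGNORECASE)

def rewrite_question(question):
    """Return a list of candidate search queries for a question."""
    q = question.strip().rstrip("?").strip()
    candidates = [q]  # always try the question itself first
    m = WHAT_OF.match(q)
    if m:
        attribute, entity = m.group(1), m.group(2)
        # Declarative phrase that an answer sentence might contain.
        candidates.append(f'"the {attribute} of {entity} is"')
        # Exact phrase and loose keyword variants.
        candidates.append(f'"{entity} {attribute}"')
        candidates.append(f"{entity} {attribute}")
    return candidates

for query in rewrite_question("What's the population of Boulder, Colorado?"):
    print(query)
```

Each candidate would then be run as an ordinary search, and the results mined for a sentence that looks like an answer.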

Now, I'm not sure what Google's Q&A algorithm is like, but it clearly doesn't work. Here's how Ask, Microsoft and Google each answer the same question:

Query: What's the population of Boulder, Colorado?
Link: http://www.ask.com/web?q=what%27s+the+population+of+boulder%3F&qsrc=0&o=333&l=dir
Answer: "Boulders population is almost exactly 100,000, of whom one-fourth are University of Colorado students. "
Reference: www.mendosa.com
Comment: That's a fantastic answer.

Link: http://search.live.com/results.aspx?q=what%27s+the+population+of+boulder%3F&mkt=en-us&FORM=LVSP&go.x=0&go.y=0&go=Search
Answer: "Boulder, Colorado Population, total: 92,196"
Reference: "2004 estimate. US Census Bureau"
Comment: That is a correct answer. However, it's clear that Microsoft "hard coded" a mapping from census data to population answers, since census data isn't published in a form that a Q&A parser could pick up on its own.

Link: http://www.google.com/search?hl=en&lr=&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=xvs&q=what%27s+the+population+of+boulder%3F&btnG=Search
Answer: "Boulder — Population: 4,417,714"
Reference: stopaddiction.com/states/colorado_drug_rehab_info~Boulder.html
Comment: Uhh, Google? That's the population of Colorado.

Google is being gamed. They are clearly taking PageRank into account in their answering system, because the site they reference, stopaddiction.com, is employing a duplicate content/linking scheme: it has a page for every city in Colorado, each with almost identical content.
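For what "almost identical content" means in practice, here is a rough Python sketch that scores two pages by the overlap of their word triples (Jaccard similarity over shingles). The template text is invented for the example and is not copied from the actual site, and I'm not claiming any search engine detects duplicates this way.

```python
def shingles(text, k=3):
    """Set of all k-word sequences in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Shared shingles divided by total distinct shingles."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0  # two empty texts count as identical
    return len(sa & sb) / len(sa | sb)

# Hypothetical page template; only the city name changes per page.
TEMPLATE = (
    "Drug rehab information for {city} Colorado. "
    "State facts and treatment options near {city} are listed below."
)
page_a = TEMPLATE.format(city="Boulder")
page_b = TEMPLATE.format(city="Evergreen")

print(jaccard(page_a, page_b))
```

On real pages, where the shared template is far longer than the swapped-in city name, the score for two such pages would sit much closer to 1.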

Let me give you an idea of how badly their system performs due to being gamed (all proper names are cities in Colorado, or Colorado):

Query: What's the population of Colorado?
Link: http://www.google.com/search?q=What%27s+the+population+of+Colorado%3F&ie=utf-8&oe=utf-8&rls=com.ubuntu:en-US:official&client=firefox-a
Answer: "Colorado — Population: 4,301,261 ; 24th, 12/00"
Reference: 50states.com/colorado.htm
Comment: Correct!

Query: What's the population of <Insert name of city in Colorado here>?
Link: http://www.google.com/search?q=what%27s+the+population+of+evergreen%3F&ie=utf-8&oe=utf-8&rls=com.ubuntu:en-US:official&client=firefox-a
Answer: "<name of city in Colorado> — Population: 4,464,356"
Reference: cocaine-addiction.ca/alabama_cocaine_rehab/Evergreen_cocaine_addiction_treatment.htm
Comment: The reference site is a different one this time, but it is gaming Google in exactly the same way.

Local answers might be triggered by your locale, because I was unable to get Google to give population data for other cities.

[I unlinked some of your reference links, just to make sure there's no outgoing spam links here... -Philipp]

Philipp Lenssen [PersonRank 10]

17 years ago

Do they consciously game the Q&A? It looks spammy but is it spam directed at Google in general or at this onebox? E.g. one of the pages has a PageRank of 0 but contains this snippet at the bottom:

<<State Facts
   Population: 4,464,356
   Law Enforcement Officers: 11,378
   State Prison Population: 37,300
   Probation Population: 39,697
   Violent Crime Rate
   National Ranking: 21>>
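If the onebox just looks for a "Population:" label anywhere on a page about a city, a snippet like this would be enough to produce the wrong answer without anyone targeting it. Here is a minimal Python sketch of that failure. It's a guess at the kind of extraction involved, not Google's actual method; the first line of PAGE_TEXT and both function names are made up for illustration.

```python
import re

# First line is a hypothetical page title; the rest follows the snippet above.
PAGE_TEXT = """Evergreen drug rehab information
State Facts
   Population: 4,464,356
   Law Enforcement Officers: 11,378
   State Prison Population: 37,300
"""

def naive_population(page_text):
    """Grab the first 'Population: N' on the page, ignoring its context."""
    m = re.search(r"Population:\s*([\d,]+)", page_text)
    return int(m.group(1).replace(",", "")) if m else None

def population_with_scope(page_text):
    """Return (nearest heading, value) so the caller can see what it describes."""
    heading = None
    for line in page_text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            heading = line.strip()  # unindented lines are treated as headings
            continue
        m = re.match(r"\s*Population:\s*([\d,]+)", line)
        if m:
            return heading, int(m.group(1).replace(",", ""))
    return None

print(naive_population(PAGE_TEXT))       # 4464356
print(population_with_scope(PAGE_TEXT))  # ('State Facts', 4464356)
```

The naive version hands back the state figure as if it belonged to the city in the page title. The scoped version returns the same number, but the "State Facts" heading makes it obvious the figure shouldn't be attributed to Evergreen.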
