Google's Question Answering System is bad, and being gamed
Question answering systems are very difficult (by which I mean messy) to program, and that's why Yahoo is featuring Yahoo! Answers so prominently. All the major search engines realize that the long tail of search is queries that are more than 1-2 words, and the longer the query gets, the higher the likelihood that it is a question. Yahoo doesn't want a specialized system to answer those questions, whereas Google does.
Microsoft's Q&A system simply rewrites your question into various queries and tries them out. The philosophy is that hey, what we need is a million monkeys on typewriters, and that's exactly what the Internet is.
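To make the rewriting idea concrete, here is a minimal sketch of turning one question into several candidate queries. The function name `rewrite_question` and the patterns are my own illustration, not Microsoft's actual rules, which I haven't seen; a real system would presumably have many more patterns and would score the results that come back.

```python
import re
from typing import List

def rewrite_question(question: str) -> List[str]:
    """Return candidate search queries for a question (hypothetical rules)."""
    # Drop surrounding whitespace and the trailing question mark.
    q = question.strip().rstrip("?").strip()
    rewrites = [q]  # always try the question as typed
    # Assumed pattern: "What's/What is the <attribute> of <entity>"
    m = re.match(r"(?i)^what(?:'s| is) the (.+?) of (.+)$", q)
    if m:
        attribute, entity = m.group(1), m.group(2)
        # Phrase query shaped like the start of a declarative answer.
        rewrites.append(f'"the {attribute} of {entity} is"')
        # Plain keyword query.
        rewrites.append(f"{entity} {attribute}")
    return rewrites

for query in rewrite_question("What's the population of Boulder?"):
    print(query)
```

Each rewrite would then be sent to the ordinary search index, on the bet that somebody on the web has already written the answer in one of those forms.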
Now, I'm not sure what Google's Q&A algorithm is like, but it clearly doesn't work. Here are a few results for the same question from Ask, Microsoft and Google:
Query: What's the population of Boulder, Colorado? Link: http://www.ask.com/web?q=what%27s+the+population+of+boulder%3F&qsrc=0&o=333&l=dir Answer: "Boulders population is almost exactly 100,000, of whom one-fourth are University of Colorado students. " Reference: www.mendosa.com Comment: That's a fantastic answer.
Link: http://search.live.com/results.aspx?q=what%27s+the+population+of+boulder%3F&mkt=en-us&FORM=LVSP&go.x=0&go.y=0&go=Search Answer: "Boulder, Colorado Population, total: 92,196" Reference: "2004 estimate. US Census Bureau" Comment: That is a correct answer. However, it's clear that Microsoft "hard-coded" census data as the answer to population questions, since census data is not published in a parseable Q&A style.
Link: http://www.google.com/search?hl=en&lr=&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=xvs&q=what%27s+the+population+of+boulder%3F&btnG=Search Answer: "Boulder — Population: 4,417,714" Reference: stopaddiction.com/states/colorado_ drug_rehab_info~Boulder.html Comment: Uhh, Google? That's the population of Colorado.
Google is being gamed. They are clearly taking PageRank into account in their answering system, because the site they reference, stopaddiction.com, is running a duplicate content and linking scheme: it has a page for every city in Colorado, each with almost identical content.
Let me give you an idea of how badly their system performs as a result of being gamed (every place name below is either Colorado itself or a city in Colorado):
Query: What's the population of Colorado? Link: http://www.google.com/search?q=What%27s+the+population+of+Colorado%3F&ie=utf-8&oe=utf-8&rls=com.ubuntu:en-US:official&client=firefox-a Answer: "Colorado — Population: 4,301,261 ; 24th, 12/00" Reference: 50states.com/colorado.htm Comment: Correct!
Query: What's the population of <Insert name of city in Colorado here>? Link: http://www.google.com/search?q=what%27s+the+population+of+evergreen%3F&ie=utf-8&oe=utf-8&rls=com.ubuntu:en-US:official&client=firefox-a Answer: "<name of city in Colorado> — Population: 4,464,356" Reference: cocaine-addiction.ca/alabama_cocaine_ rehab/Evergreen_cocaine_addiction_treatment.htm Comment: The reference site listed is different than before, but is also gaming Google in the exact same way.
Local answers might be triggered by your locale, because I was unable to get Google to give population data for other cities.
[I unlinked some of your reference links, just to make sure there's no outgoing spam links here... -Philipp]
Do they consciously game the Q&A? It looks spammy but is it spam directed at Google in general or at this onebox? E.g. one of the pages has a PageRank of 0 but contains this snippet at the bottom:
<<State Facts Population: 4,464,356 Law Enforcement Officers: 11,378 State Prison Population: 37,300 Probation Population: 39,697 Violent Crime Rate National Ranking: 21>>
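A snippet like that could fool a naive fact extractor without any deliberate gaming. The sketch below is only a guess at the failure mode, not Google's actual method: the function `naive_population` is hypothetical, and it assumes the extractor takes the city name from the page title and then grabs the first "Population:" figure it finds anywhere on the page.

```python
import re
from typing import Optional, Tuple

# The state-level facts block quoted above, as it appears on a city page.
SNIPPET = (
    "State Facts Population: 4,464,356 Law Enforcement Officers: 11,378 "
    "State Prison Population: 37,300 Probation Population: 39,697 "
    "Violent Crime Rate National Ranking: 21"
)

def naive_population(page_entity: str, page_text: str) -> Optional[Tuple[str, int]]:
    """Pair the page's entity with the first 'Population: N' on the page.

    Assumption: page_entity comes from the page title (e.g. the city name),
    and nothing checks whether the number really describes that entity.
    """
    match = re.search(r"\bPopulation:\s*([\d,]+)", page_text)
    if match is None:
        return None
    return page_entity, int(match.group(1).replace(",", ""))

print(naive_population("Evergreen", SNIPPET))
```

Under those assumptions the state's figure gets attached to the city, which is the same kind of wrong answer shown in the examples above.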