Google Blogoscoped Forum

Notes from Peter Norvig's talk at the University of Colorado at Boulder on Oct 3, '06

Brian M. [PersonRank 10]

Wednesday, October 4, 2006
18 years ago · 9,079 views

Director of Google Research
http://en.wikipedia.org/wiki/Peter_Norvig

Quote: "It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts. " – Sherlock Holmes

More data vs. better algorithms. AI/ML research has typically focused on optimizing algorithms on small data sets. Researchers should instead collect more data, run the algorithms on it again, and see which ones come out on top. With enough data, many algorithms (think neural networks, SVMs, etc.) become fairly equal. References Banko and Brill (2001) of Microsoft Research. See http://research.microsoft.com/users/mbanko/Trec2001QA.pdf
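
A minimal sketch (not from the talk) of a Banko-and-Brill-style experiment: train several off-the-shelf classifiers on growing slices of the same corpus and watch the accuracy gap between them shrink. The dataset and models here are illustrative stand-ins, not what Banko and Brill actually used.

```python
# Compare several standard classifiers as the training set grows.
# Dataset (20 Newsgroups) and models are stand-ins for illustration only.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
X_train_text, X_test_text, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0)

# One shared bag-of-words representation for all models.
vectorizer = TfidfVectorizer(max_features=50000)
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

models = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
}

# Refit every model on growing slices of the training data; the point is
# that the spread between models usually narrows as n goes up.
for n in (500, 2000, 8000, X_train.shape[0]):
    for name, model in models.items():
        model.fit(X_train[:n], y_train[:n])
        acc = model.score(X_test, y_test)
        print(f"n={n:6d}  {name:20s}  accuracy={acc:.3f}")
```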

Linux humor: Enter "cat" and "dog" into Google Sets. Now enter "cat" and "more". http://labs.google.com/sets

Uses Windows XP on an IBM Thinkpad.

Google Translate (GT) is based on the principles of Statistical Machine Translation. See Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. A statistical approach to machine translation. Computational Linguistics, 16(2):79–85, June 1990.
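
The core of that paper is the noisy-channel formulation: to translate a foreign sentence f, pick the English sentence e that maximizes a language model P(e) times a translation model P(f|e). A minimal LaTeX rendering of that equation:

```latex
\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator*{\argmax}{arg\,max}
\begin{document}
% Noisy-channel SMT (Brown et al., 1990): by Bayes' rule, the best English
% translation of a foreign sentence f maximizes language model times
% translation model.
\[
  \hat{e} \;=\; \argmax_{e} P(e \mid f) \;=\; \argmax_{e} P(e)\, P(f \mid e)
\]
\end{document}
```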

GT now gets 55% accuracy on English to Arabic. Agreement among human translators on the same text is only 60%, so past that point they'd have no standard by which to measure their progress!

GT got bett resu by prun all engl word to 4 lett. Also saved a lot of spac. (In plain English: GT got better results by pruning all English words to 4 letters; it also saved a lot of space.)

http://google.com/gadgetawards

Wolfgang Pauli [PersonRank 0]

18 years ago

That was a strange talk. The take-home message was that current algorithms do better if you train them on more data. Hello? What is the point of that?

GT seems to do OK, but how exciting is it to use an algorithm that does not build on the structure of language? It would be so much better to train a model to speak two languages and then let it translate. Of course it wouldn't work right away, but that would be nice scientific work.

Concerning the ceiling effect: reading the literature should tell them that a common criterion is to translate something forward and backward between two languages until the performance degrades.
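
A minimal sketch of what such a round-trip test could look like. The translate() function here is a hypothetical stand-in for any real MT system (an identity stub so the sketch runs), and the similarity score is a crude surface measure, not BLEU.

```python
import difflib

def translate(text, src, dst):
    # Hypothetical stand-in for a real MT system; replace with an actual
    # API call. The identity function just keeps the sketch runnable.
    return text

def similarity(a, b):
    # Crude surface similarity in [0, 1]; a real evaluation would use a
    # proper MT metric instead.
    return difflib.SequenceMatcher(None, a, b).ratio()

def round_trip_rounds(sentence, src="en", pivot="de",
                      max_rounds=10, threshold=0.5):
    """Translate src->pivot->src repeatedly; return how many round trips
    the text survives before drifting below the similarity threshold."""
    current = sentence
    for round_no in range(1, max_rounds + 1):
        there = translate(current, src, pivot)
        current = translate(there, pivot, src)
        if similarity(sentence, current) < threshold:
            return round_no
    return max_rounds

print(round_trip_rounds("It is a capital mistake to theorize before one has data."))
```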

Poor Peter Norvig: he has to come to Boulder and give a boring talk to hire people who ask questions like: "Is Google into hardware?"

Philipp Lenssen [PersonRank 10]

18 years ago

> The take-home message was that current algorithms do
> better if you train them on more data. Hello? What is
> the point of that?

I suppose it's a matter of focus – focusing on getting more data rather than worrying about finding the perfect algorithm.

> GT seems to do OK, but how exciting is it to use an
> algorithm that does not build on the structure of language?

I think Google is very pragmatic – what works, works, and the user's happy.

> Concerning the ceiling effect: reading the literature
> should tell them that a common criterion is to translate
> something forward and backward between two languages
> until the performance degrades.

I wonder how well human translators fare in this test.
