Google Blogoscoped

Forum

Notes from Peter Norvig's talk at the University of Colorado at Boulder on Oct 3, '06

Brian Mingus [PersonRank 10]

Wednesday, October 4, 2006

Director of Google Research
en.wikipedia.org/wiki/Peter_No ...

Quote: "It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." – Sherlock Holmes

More data vs. better algorithms. AI/ML research has typically focused on optimizing algorithms on small data sets. Researchers should instead go collect more data, run the algorithms on it again, and see which ones come out on top. With enough data, many algorithms (think neural networks, SVMs, etc.) become fairly equal. References Banko and Brill (2001) of Microsoft Research.
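A toy sketch of the point (not Banko and Brill's actual experiment, which used confusion-set disambiguation on corpora up to a billion words): train two simple, from-scratch classifiers on growing amounts of synthetic data and watch the gap between them shrink. All names and parameters here are illustrative.

```python
# Toy illustration: two simple classifiers trained on growing amounts of
# synthetic data. The point is that the accuracy gap between different
# algorithms tends to shrink as the training set grows.
import math
import random

random.seed(0)

def sample(n):
    """n labeled points drawn from two overlapping 2-D Gaussians."""
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        mu = (0.0, 0.0) if label == 0 else (2.0, 2.0)
        data.append(((random.gauss(mu[0], 1.0), random.gauss(mu[1], 1.0)), label))
    return data

def centroid_fit(train):
    """Nearest-centroid 'model': mean point of each class."""
    cents = {}
    for lab in (0, 1):
        pts = [x for x, l in train if l == lab]
        cents[lab] = (sum(p[0] for p in pts) / len(pts),
                      sum(p[1] for p in pts) / len(pts))
    return cents

def centroid_predict(cents, x):
    return min(cents, key=lambda lab: math.dist(x, cents[lab]))

def knn_predict(train, x):
    """1-nearest-neighbour: label of the closest training point."""
    return min(train, key=lambda t: math.dist(x, t[0]))[1]

test_set = sample(500)

for n in (10, 100, 1000):
    train = sample(n)
    cents = centroid_fit(train)
    acc_c = sum(centroid_predict(cents, x) == l for x, l in test_set) / len(test_set)
    acc_k = sum(knn_predict(train, x) == l for x, l in test_set) / len(test_set)
    print(f"n={n:5d}  centroid={acc_c:.2f}  1-NN={acc_k:.2f}")
```

With tiny training sets the two methods can differ noticeably; with more data both approach the accuracy ceiling the overlapping classes allow, which is the shape of the curves Banko and Brill reported.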




Linux humor: Enter "cat" and "dog" into Google Sets. Now enter "cat" and "more". labs.google.com/sets

Uses Windows XP on an IBM Thinkpad.

Google Translate (GT) is based on the principles of Statistical Machine Translation. See Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. A statistical approach to machine translation. Computational Linguistics, 16(2):79–85, June 1990.
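The Brown et al. paper cited above frames translation as a noisy-channel problem: to translate a foreign sentence f, pick the English sentence e maximizing

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} \, P(e)\, P(f \mid e)
```

where P(e) is a language model (does e look like fluent English?) and P(f | e) is a translation model (is f a plausible rendering of e?) — both estimated from data rather than hand-written linguistic rules, which is exactly the "more data" theme of the talk.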

GT now gets 55% accuracy on English to Arabic. Human agreement on human translations is 60%. After this point they have no standard by which to measure their progress!

GT got bett resu by prun all engl word to 4 lett. Also saved a lot of spac.

google.com/gadgetawards

Wolfgang Pauli [PersonRank 0]


That was a strange talk. The take-home message was that current algorithms are better if you train them on more data. Hello? What is the point of that?

GT seems to do OK, but how exciting is it to use an algorithm that does not build on the structure of language? It would be so much better to train a model to speak two languages and then let it translate. Of course it won't work right away, but that would be nice science work.

Concerning the ceiling effect, reading the literature about this should tell them that a common criterion is to translate something forward and backward between two languages until the performance goes down.
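[For readers unfamiliar with the criterion mentioned here, a minimal sketch: translate a sentence forward and backward repeatedly and watch a quality proxy degrade. The "translator" below is a deliberately lossy word-for-word dictionary, not a real MT system, and the overlap score is a crude stand-in for a real quality metric — everything here is hypothetical.]

```python
# Sketch of the forward/backward ("round-trip") evaluation criterion:
# iterate en -> fr -> en and watch a similarity score against the original
# sentence degrade. The dictionaries are intentionally asymmetric so the
# round trip loses information, as real translation systems do.
EN_FR = {"the": "le", "cat": "chat", "sat": "assis", "on": "sur", "mat": "tapis"}
FR_EN = {"le": "the", "chat": "cat", "assis": "seated", "sur": "on", "tapis": "rug"}

def translate(sentence, table):
    # Unknown words pass through untranslated (a common lossy fallback).
    return " ".join(table.get(w, w) for w in sentence.split())

def overlap(a, b):
    """Crude quality proxy: Jaccard overlap of word sets."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

original = "the cat sat on the mat"
current = original
for i in range(3):
    current = translate(translate(current, EN_FR), FR_EN)
    print(f"round trip {i + 1}: {current!r}  overlap={overlap(original, current):.2f}")
```

After the first round trip the sentence drifts ("sat" comes back as "seated", "mat" as "rug") and the overlap score drops below 1.0; once the drift stabilizes, that plateau is the kind of stopping point the criterion describes.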

Poor Peter Norvig: he has to come to Boulder and give a boring talk to hire people who ask questions like "Is Google into hardware?"

Philipp Lenssen [PersonRank 10]


> The take-home message was that current algorithms are
> better if you train them on more data. Hello? What is the
> point of that?

I suppose it's a matter of focus – focus on getting more data than worrying about finding the perfect algorithm.

> GT seems to do OK, but how exciting is it to use
> an algorithm that does not build on the structure of language?

I think Google is very pragmatic – what works works, and the user's happy.

> Concerning the ceiling effect, reading the literature
> about this should tell them that a common criterion is
> to translate something forward and backward between
> two languages until the performance goes down.

I wonder how well human translators fare in this test.
