Google Blogoscoped

Googleshare Translations

Philipp Lenssen [PersonRank 10]

Monday, February 5, 2007
17 years ago, 5,478 views

Edit: Some edits ~10 minutes after posting; changed Chinese to Arabic on top, added Systran caveat, cleaned up some HTML.

Brian M. [PersonRank 10]

17 years ago #

Don't be fooled by their name into thinking that they are extracting "meaning" from the text. They aren't. On top of that, English <-> Arabic translation is more difficult than Spanish -> English. Google is probably still outsourcing that effort to SYSTRAN [2]. The only languages Google is doing in-house are Arabic, Chinese, Japanese, Korean, and Russian. Those languages don't even use the same alphabet as English!

[1] http://blogoscoped.com/forum/70262.html
[2] http://en.wikipedia.org/wiki/Systran

telcogod [PersonRank 1]

17 years ago #

No need to screen-scrape; use the Google search API!

http://code.google.com/apis/ajaxsearch/

Wouter Schut [PersonRank 10]

17 years ago #

I have used Google to learn English using exactly the Googleshare approach mentioned :P, both to find spelling errors and to find common phrases to use.

:D :D
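A minimal sketch of that kind of spelling check, assuming the hit counts for each candidate have already been looked up by hand. The counts below are made-up placeholders, and `pick_most_common` is a hypothetical helper, not part of any Google API:

```python
def pick_most_common(hit_counts):
    """Return the candidate with the most hits and its share of all hits."""
    total = sum(hit_counts.values())
    if total == 0:
        raise ValueError("no hits for any candidate")
    best = max(hit_counts, key=hit_counts.get)
    return best, hit_counts[best] / total


# Placeholder counts for illustration only, not real search results.
example_counts = {"definitely": 900, "definately": 100}

best, share = pick_most_common(example_counts)
print(f"{best} ({share:.0%} of hits)")
```

A high share only says which spelling is more common on the web, so a widespread misspelling can still score well.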

Dave [PersonRank 0]

17 years ago #

I'd be very surprised if Google wasn't attempting something very like this themselves. It seems one of the most logical uses of their massive database, and closely related to the artificial intelligence and understanding they're well known to be researching.

Martin [PersonRank 0]

17 years ago #

I remember reading about an effort to use the text corpus of the EU administration as a Rosetta Stone, because all documents are translated into all official EU languages by human translators.

With that database and some algorithms, it produced quite good automated translations of new text from the same domain.
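To illustrate the general idea of learning from human-translated sentence pairs, here is a toy sketch. It is not the method of the EU project or of Google; it simply scores word pairs by how often they co-occur in aligned sentences (the Dice coefficient), and the four-sentence corpus and function names are invented for the example:

```python
from collections import Counter
from itertools import product


def dice_table(sentence_pairs):
    """Score (source word, target word) pairs by co-occurrence in aligned sentences."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    # Dice coefficient: 2 * joint count / (source count + target count)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def best_translation(word, table):
    """Return the highest-scoring target word for `word`, or None if unseen."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Invented toy corpus of aligned English/German sentences.
toy_corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
    ("a book", "ein buch"),
]

table = dice_table(toy_corpus)
print(best_translation("house", table))
```

Real statistical systems use far larger corpora and proper alignment models; this only shows why text from the same domain as the training documents tends to translate best, since unseen words get no candidates at all.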

Tony Ruscoe [PersonRank 10]

17 years ago #

Interesting articles (both this one and the one on Wired). As the Wired article says, the performance of statistical machine translation is a real problem. 10 seconds per word is just about bearable on a 20-word sentence but pretty much not an option on a 10,000-word white paper. Even the one-second-per-word target Meaningful Machines are trying to achieve in the next year would be painfully slow when compared to rule-based machine translation systems.
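To put those figures in perspective, a quick back-of-the-envelope calculation using only the numbers quoted above (the helper names are just for this sketch):

```python
def translation_time_seconds(word_count, seconds_per_word):
    """Total translation time, assuming a constant per-word cost."""
    return word_count * seconds_per_word


def format_duration(seconds):
    """Format a number of seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


for words in (20, 10_000):
    for rate in (10, 1):
        total = translation_time_seconds(words, rate)
        print(f"{words} words at {rate} s/word: {format_duration(total)}")
```

At 10 seconds per word, the 20-word sentence takes 3 minutes 20 seconds and the 10,000-word white paper takes 27 hours 46 minutes 40 seconds; at one second per word the white paper still takes 2 hours 46 minutes 40 seconds.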

Google seems to be doing pretty well so far though. I'd be interested to know how many words per minute they can translate at peak times. I'm guessing that one reason they're holding out on releasing some more common languages (like Spanish or French) is that they'd get more people using it, which would then cause (i) speed issues and (ii) more people spotting how poor the quality is...

Hong Xiaowan [PersonRank 10]

17 years ago #

1. Google could add a translation function to Gtalk. For example, I write Chinese and my girlfriend writes English. When her message arrives, it shows both the English original and a Chinese translation, and I can edit the Chinese translation into a reasonable result.

2. In this scenario, Google Translate is my girlfriend. She sends me lots of messages in both English and Chinese, so I can perfect the Chinese translations, and that helps my girlfriend update her database. Just like Philipp Lenssen's dream of AI-level Google Search, my girlfriend will become more and more perfect.

3. Right now Google Translate shows its result as a Chinese translation with English popup windows. It only needs a small change: make the Chinese editable sentence by sentence. After an edit, don't let the English popup window close automatically; let me close it by hand and move on to the next sentence.

4. This function should not be open to just anyone. My girlfriend should have a way to select many super boyfriends. I don't care. Haha.

Saibot [PersonRank 1]

17 years ago #

"I find it hard to imagine that a larger corpus hurts the feasibility of this"

I would think the web would be a lower quality corpus, however, considering the copious amounts of feedback, spamming, flaming, and (dare I write it) blogging containing improper spelling or grammar.

On the other hand, it would likely better represent the vernacular usage of many words not found in the various dictionaries, and would thus be more likely to catch "heut" vs "heute" as well as dialectal variations.

milivella [PersonRank 10]

17 years ago #

What about word order? Different languages have different constructions...

Philipp Lenssen [PersonRank 10]

17 years ago #

Maybe create permutations of not only all possible words, but all possible word orders? Like an anagram. I'm not saying this is feasible :)
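A small sketch of why trying every word order gets out of hand quickly: a sentence of n distinct words has n! orderings. The example sentence and the `word_orders` helper are made up for illustration:

```python
from itertools import permutations
from math import factorial


def word_orders(sentence):
    """Return every ordering of the words in `sentence` (n! strings for n words)."""
    words = sentence.split()
    return [" ".join(order) for order in permutations(words)]


orders = word_orders("who invented the telephone")
print(len(orders))   # 4 words give 4! = 24 orderings
print(orders[:3])

# The number of orderings grows factorially with sentence length.
for n in (5, 10, 20):
    print(n, factorial(n))
```

Each ordering would also have to be combined with every candidate translation of every word, so exhaustive search is only plausible for very short phrases. Repeated words produce duplicate orderings in this sketch.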

Somewhat related – turning questions into answer sentences by shuffling the word order:
http://blogoscoped.com/archive/2004_03_30_index.html

Christian Matschke [PersonRank 1]

17 years ago #

As far as German is concerned, it would be interesting to see what Google could make of this project / resource of the University of Leipzig: http://wortschatz.uni-leipzig.de/ – maybe they should collaborate?

Brian M. [PersonRank 10]

17 years ago #

If you guys are really interested in the methods that Google is using, I strongly suggest picking up the book "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition" by Daniel Jurafsky and James H. Martin [1]

It is in the same series as Peter Norvig's (Director of Research at Google) classic Artificial Intelligence textbook. Norvig is listed as the editor of Martin's book, in fact. (Disclaimer: I am taking this class w/ Martin)

[1] http://www.amazon.com/Speech-Language-Processing-Introduction-Computational/dp/0130950696

Brian M. [PersonRank 10]

17 years ago #

In fact, take a look at the first customer review on Amazon for a little surprise :)

Jonathan Roberts [PersonRank 0]

17 years ago #

Fine if you want to translate into the most commonly used phrase. But what if the text actually means something more obscure? Then you would want the actual meaning, and to translate that you would need a database of actual phrase conversions.
