Google Blogoscoped

Google's New Translation

Freiddie [PersonRank 7]

Tuesday, October 23, 2007
16 years ago, 21,162 views

Is that better?

translator guy [PersonRank 0]

16 years ago #

how do you know that they switched on a new system?

Philipp Lenssen [PersonRank 10]

16 years ago #

> how do you know that they switched on
> a new system?

Because the results are not the old ones anymore. For example, a while ago, for this article http://blogoscoped.com/archive/2007-10-03-n51.html everything was still in sync with Systran. If you check these idioms now, they return different results (differently wrong results, I might add, at least for a couple I checked).

But as I wrote, I don't know if it's using Google's in-house translation efforts now. Could also be a new third-party provider, or an update to the Systran version which Altavista doesn't yet have, or...

Martin Porcheron [PersonRank 10]

16 years ago #

Well, seeing as the "Suggest a better translation" link is available for all languages now, I presume that it is Google's new technology.

Ryan [PersonRank 0]

16 years ago #

I think the biggest error in the translation is still the fact that typing "espanol" and translating from Spanish to English returns "english".

Espanol means Spanish, not English.

Tony Ruscoe [PersonRank 10]

16 years ago #

Ryan, I guess that's one of the downfalls of statistical machine translation. For example, if some of the text they analyzed had links to an English version from the Spanish text and links to the Spanish version from the English text, using "espanol" and "English" as the link texts, that could explain the error.
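
To make Tony's guess concrete, here is a minimal sketch of how a purely count-based alignment could pick up that mapping. It is a toy, not Google's system: the snippets, the function names, and the assumption that "espanol" sits on the source side while "english" sits on the target side are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical aligned snippets (source side, target side). Each pair mimics a
# language-switcher link that reads "espanol" on one page and "english" on its
# counterpart. This pairing is an assumption for illustration only.
ALIGNED_SNIPPETS = [
    ("inicio espanol", "home english"),
    ("contacto espanol", "contact english"),
    ("ayuda espanol", "help english"),
]


def build_cooccurrence(pairs):
    """Count how often each source word shares a snippet with each target word."""
    table = defaultdict(Counter)
    for source, target in pairs:
        for s_word in source.split():
            for t_word in target.split():
                table[s_word][t_word] += 1
    return table


def most_likely(table, word):
    """Return the target word that co-occurs most often with `word`."""
    return table[word].most_common(1)[0][0]


table = build_cooccurrence(ALIGNED_SNIPPETS)
print(most_likely(table, "espanol"))  # the link text wins: "english"
```

Because the link text shows up in every snippet pair while the real content words vary, the counts favour "english" over any genuine translation.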

TOMHTML [PersonRank 10]

16 years ago #

I'm writing an article about huge bugs. Translate "sarkozy, blair, chirac" from French into English...

Christian Langreiter [PersonRank 1]

16 years ago #

If it were Google's statistical translation system, I doubt some language pairs would still be marked as BETA.

Christian Langreiter [PersonRank 1]

16 years ago #

OMG. A comment editing facility would be really nice ;-)

Philipp Lenssen [PersonRank 10]

16 years ago #

Christian, the drop-down box might not have been updated yet. By the way, the ones marked as BETA are/were the officially in-house, statistical ones.

(If you have any edit, just write it in a comment and one of us will get to it)

INFORMANT [PersonRank 1]

16 years ago #

So, I've been on here before talking about Google's SMT (Statistical Machine Translation) research under their resident genius, Franz Och. In this case, German is particularly thorny, as it is rumored that Google is only at a 7 to 9 n-gram model – German would require 16 to 20. For example, in German a verb may appear at the end of a long sentence, and this destroys the current model. My gut says that Google is experimenting with an alternate approach and is incorporating some other linguistics-oriented approach, perhaps modifying with human intervention on the core German corpora.
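
A rough sketch of why the window size matters, assuming a plain n-gram model that only sees the n-1 words before the current one. The sentence and the helper name are my own illustration; nothing here confirms the rumored numbers.

```python
def ngram_context(tokens, position, n):
    """Return the n-1 tokens that an order-n model sees before tokens[position]."""
    start = max(0, position - (n - 1))
    return tokens[start:position]


# Illustrative German sentence: the auxiliary "habe" is the 2nd word,
# the participle "gemacht" is the 16th and last word.
sentence = (
    "Ich habe gestern im Park mit meinem alten Freund aus Berlin "
    "einen sehr langen Spaziergang gemacht"
).split()
verb_position = sentence.index("gemacht")

for n in (7, 9, 16):
    context = ngram_context(sentence, verb_position, n)
    print(n, "habe" in context)
```

With n = 7 or n = 9 the auxiliary falls outside the window when the model reaches "gemacht"; with n = 16 it is inside.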

James Xuan [PersonRank 10]

16 years ago #

@Tom
"Blair, blair, Moss"
lollll!

TOMHTML [PersonRank 10]

16 years ago #

"sarkozy sarkozy sarkozy" [FR]
=
"Blair defends Bush" [EN]

...

Ionut Alex. Chitu [PersonRank 10]

16 years ago #

> how do you know that they switched on
> a new system?

I realized that the dialog for suggestions appears for every language. Google uses it for feedback. Obviously, you need feedback only for things you can control, not for 3rd party systems.

Philipp Lenssen [PersonRank 10]

16 years ago #

[Edit: changed the post's title from "Google's New Translation" to "Google's New Translations".]

Philipp Lenssen [PersonRank 10]

16 years ago #

I asked Google for a statement, here's their reply: "We are now using our own machine translation technology for the language pairs on Google Translate."

Domas Mituzas [PersonRank 0]

16 years ago #

The interesting problem with this translation is that names get translated into various funny versions. I wrote about it at http://dammit.lt/2007/10/23/google-translate-glitch/ – some of the results were really strange :)

Philipp Lenssen [PersonRank 10]

16 years ago #

You need to spell the names correctly though for a fair test. While [sarkozy sarkozy sarkozy] translates to [Blair defends Bush], the correct spelling [Sarkozy Sarkozy Sarkozy] translates to [Sarkozy Sarkozy Sarkozy]. I still think this is an extremely weird thing Tom hit on...

Stephen Tordoff [PersonRank 10]

16 years ago #

Weird bug, or 'easter egg'?

I'd have thought that it should be case insensitive anyway.

James Xuan [PersonRank 10]

16 years ago #

Yeah

Philipp Lenssen [PersonRank 10]

16 years ago #

Enter "sarkozy is chirac" (including the quotes) into the translator. Translate from French to English.

(I dropped Tom's find into Reddit, this one was discovered there: http://reddit.com/info/5yyr4/comments/)

James Xuan [PersonRank 10]

16 years ago #

@Phil
HAHAHA!

hebbet [PersonRank 10]

16 years ago #

Google still has a lot of work to do here. Another funny mistake shows up when you translate "hello Eva hermann" from English to German.

Matt Cutts [PersonRank 10]

16 years ago #

I've been playing with the French translations on e.g. news on voila.fr, and it looks pretty good to me..

Leif [PersonRank 0]

16 years ago #

Yeah. Try this too.

"Tom has a ball." > English -> French > French -> German > German -> English > "Tom has a rifle bullet."

Somewhere the ball turned small and hard.

Veky [PersonRank 10]

16 years ago #

This is ridiculous. Say "sarkozy sarkozy sarkozy" to anyone without a context, and he'll probably think you insane. It's not a sentence. It's not even a sentence fragment (as opposed to "Buffalo buffalo buffalo" ;-) in English, but notice the capital "B"). Why would somebody expect a reasonable translation of it? And what would the reasonable translation be, anyway?

Google Translation has many problems, of course (and it will have them forever, or at least until our sentences become much more uniform and predictable, at which point we'll have bigger problems ;). But when the translation ("Blair defends Bush") makes more sense than the original text, I'd start to think about what we are trying to accomplish.

Tony [PersonRank 0]

16 years ago #

tried it ..
sarkozy => Blair
sarkozy sarkozy => Blair defends
sarkozy sarkozy sarkozy => Blair defends Bush
sarkozy sarkozy sarkozy sarkozy => Blair defends Bush defends
sarkozy sarkozy sarkozy sarkozy sarkozy => Blair defends Bush defends Bush
sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy => Blair defends Bush Bush Bush Bush
sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy => Blair defends Bush Bush Bush Bush Bush Bush

Tony [PersonRank 0]

16 years ago #

sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy sarkozy =>

Blair defends Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush blair defends Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush Bush

Aaron Bassett [PersonRank 1]

16 years ago #

"sarkozy sarkozy sarkozy's merde" is slightly better ;)

Philipp Lenssen [PersonRank 10]

16 years ago #

Tom found another weird one: "bush irak israel" (French -> English)

Reto Meier [PersonRank 10]

16 years ago #

Veky makes a good point. We're testing a machine learning engine by typing in nonsense. The translator is already smart enough to recognise proper nouns when they are correctly capitalised (Sarkozy Sarkozy Sarkozy works as expected).

Obviously Google's translator is influenced by context for word selection in translation (important, as very often context can change a word's meaning completely – "According to this bill of rights at my right, I am right!"). For nonsensically arranged words, context delivers a more obvious fail. A good translation should try to get across the meaning of the translated text rather than perform a word-for-word substitution. What does "sarkozy sarkozy sarkozy" mean?

Reto Meier [PersonRank 10]

16 years ago #

*For the record:
"According to this bill of rights at my right, I am right"

English --> French --> English

Systran: According to the declaration of the rights on my line, I am right.
Google: According to the bill of rights to my right, I am right.

English --> German --> English

Systran: According to this condition at my right, I am right
Google: According to the law, the rights at my right, I am right

Ionut Alex. Chitu [PersonRank 10]

16 years ago #

You should never do that with automatic translations. Never use as an input the output of an automatic translation.

Reto Meier [PersonRank 10]

16 years ago #

Indeed you shouldn't. At least not with current implementations of automatic translators.

But if you can create a translator which effectively translates meaning based on context, rather than word for word, this should work better. Which this new version of the Google translator seems to do.

Reto Meier [PersonRank 10]

16 years ago #

...interesting. The Google translator does markedly better at round trips.

"Now is the time for all good men to come to the aid of their country."

ST: Is now the hour for all the good men to come using their country.
G: Now is the time for all good men to come to the aid of their country.

Eng-French-German-Eng

ST: Is now the hour for all good men to come by means of its country
G: Now is the time for all good people come to their country

I wonder if this is influenced by the commonness of the initial term in English (it's typical 'filler text'). More experiments I think...

Arthur [PersonRank 0]

16 years ago #

Also nice:
"bush irak israel" (French -> English)
Bush allies with israel

"bush sarkozy israel" (French -> English):
Bush defends israel

Obviously, France is for Bush more a partner when it comes to defense, whereas Great-Britain is a real ally :-)

Arthur [PersonRank 0]

16 years ago #

Eeehm Great-Britain -> Irak ... makes it even more weird..

Reto Meier [PersonRank 10]

16 years ago #

Arthur: What does the sentence 'bush irak israel' *mean* in French? My French is rusty but as far as I can see it means nothing at all. If the translation came back as 'dog eats cat' it would make just as much sense. In fact, the translation makes more sense than the initial 'sentence'.

The reason we got words like 'allies' and 'defends' just suggests that these words often appear together in the source material. Given Google's most likely use for contextual machine translation (News!) it would be no surprise to learn that's the material used to develop the algorithms.

Marc-O [PersonRank 1]

16 years ago #

I agree with you Reto

Most machine learning algorithms, especially in beta, can be fooled easily when faced with intentionally noisy data. The Google translator seemingly expects phrases, and probably wasn't trained to react to a random sequence of words. In fact, who would really use a translator for such a purpose? If you want to input random words (not in a sentence), putting them on separate lines WILL work correctly.

Also, automatic translators should NEVER be trusted completely, especially when it comes to political texts. Such systems are not foolproof, and are known to sometimes drop a negation, which will likely cause confusion. Those systems are also bad with people's names, many of which have a real-world meaning other than a person. The Google translator at least seems to see that capitalized names are names, while if the first letter is not a capital, it treats the word as a common noun.

Also, since these systems are trained using already translated texts, bad translations can result from noise in the source data, i.e. the original texts in French and English not being correctly translated and cut phrase by phrase by humans. I guess the "suggest correction" button will help make things better.

I guess the whole problem comes from people expecting the system to be robust against bad data. I guess Google will have to adapt somewhat to realize that bush is almost always equal to Bush, irak to Irak, and israel to Israel (in the user's head at least).
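
One way such an adaptation could look, sketched as a pre-processing step that restores capitalization for known names before the text reaches the translator. The name list and the function are hypothetical; I have no idea whether Google does anything like this.

```python
# Hypothetical lookup of lowercase spellings to their usual capitalized form.
KNOWN_NAMES = {
    "bush": "Bush",
    "irak": "Irak",
    "israel": "Israel",
    "sarkozy": "Sarkozy",
}


def restore_case(text):
    """Replace known names with their capitalized form; leave other words alone."""
    return " ".join(KNOWN_NAMES.get(token.lower(), token) for token in text.split())


print(restore_case("bush irak israel"))         # Bush Irak Israel
print(restore_case("sarkozy sarkozy sarkozy"))  # Sarkozy Sarkozy Sarkozy
```

A fixed list like this would of course misfire on words that are both names and common nouns, so a real system would need something smarter than a dictionary.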

TOMHTML [PersonRank 10]

16 years ago #

Matt, the translator is good on average, but not all the time. And sometimes it's really strange, and a non-French reader can't spot it.

Nicolas Sarkozy biography on French Wikipedia :
"Sa famille possédait des terres et un petit château dans le village d'Alattyán (près de Szolnok)"

*MY* translation in English:
"His family owned land and a small castle in the village named Alattyán (near Szolnok)" [in Hungary].

Google's translation:
"His family owned land and a small castle in the village of Chirac (near Szolnok)".

Do you spot the bug? :-/

Arthur [PersonRank 0]

16 years ago #

Yes, it was a joke :-), but considering that these machine translations often are trained by co-occurrence of words, it might indicate that bush, allies and israel co-occur more with French texts where also Irak is mentioned, whereas bush, defends, and israel, co-occur more with French texts mentioning sarkozy..

So, although, indeed you may not conclude anything from it, the different translation does say something about the underlying corpus.

Thomas David Baker [PersonRank 0]

16 years ago #

The human translation turns "Pong" into "Point" but should not.

Philipp Lenssen [PersonRank 10]

16 years ago #

I wonder if some of the "hiccups" of Google's translation can be explained by the fact that translations of some fictional works adapt not only the words, but also the setting of e.g. the novel. If the hero in the US original lives in New York and gets into a fight with Ronald Reagan during the novel's climax (completely random example), maybe in the German translation the protagonist lives in Berlin and opposes Helmut Kohl. These types of translations are "artistic freedom" I suppose. I'm not sure every human translator considers this to be correct in spirit (?), but I definitely saw it happen. Then Google, when comparing corpus A & B, may determine after a while that the string "Ronald Reagan" is translated to "Helmut Kohl"... though only if they have a low threshold of matches and don't have a big corpus yet containing these words (because if they had, more translations would turn "Ronald Reagan" into "Ronald Reagan" even in the translation)...
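
The threshold idea can be sketched like this, assuming the system simply picks the most frequent observed translation and only accepts it above a minimum count. The function, the counts, and the threshold values are invented for illustration.

```python
from collections import Counter


def pick_translation(observed, min_count):
    """Return the most frequent observed translation, or None if it is too rare."""
    candidate, count = Counter(observed).most_common(1)[0]
    return candidate if count >= min_count else None


# Tiny corpus: the only aligned example comes from the "adapted" novel.
small_corpus = ["Helmut Kohl"]
# Larger corpus: the adapted novel is outvoted by literal translations.
large_corpus = ["Ronald Reagan"] * 20 + ["Helmut Kohl"]

print(pick_translation(small_corpus, min_count=1))  # Helmut Kohl
print(pick_translation(small_corpus, min_count=3))  # None
print(pick_translation(large_corpus, min_count=3))  # Ronald Reagan
```

So the odd mapping only survives when both conditions hold at once: a low threshold and too little data to outvote it.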

Tony Ruscoe [PersonRank 10]

16 years ago #

Philipp, you could be right there. I read something once that said Google was using translations used by the UN. If each translation was specific to the country the language was intended for, I guess things like country and language names could well have got mixed up.

Philipp Lenssen [PersonRank 10]

16 years ago #

RightReading sent in another translation comparison:

http://www.rightreading.com/blog/2007/10/28/google-translate-no-longer-using-systran-software-goes-head-to-head-with-yahoos-babelfish/

TOMHTML [PersonRank 10]

16 years ago #

At first sight, the "sarkozy" problem has been fixed.

Kaitch [PersonRank 1]

16 years ago #

Seems so, though "bush irak israel" still works.

Maybe people suggested a better translation.
