Google Blogoscoped

Language, Google and Translation Difficulties

Tony Ruscoe [PersonRank 10]

Thursday, October 25, 2007
16 years ago, 4,449 views

Well, I think that's a good effort. It's probably on a similar level to some poor human translations I've seen (for example, some non-English bloggers who post in English) and I definitely got the gist of what was being said.

But would I have still got the gist had this gone through Systran or another machine translation service? I'd be interested to download the original German text to compare results.

Reto Meier [PersonRank 10]

16 years ago #

Interesting experiment, Philipp. I agree with Tony: I got the gist of most of it, with only some of the specific examples failing.

What's interesting to me is that much of what's written online is grammatically atrocious; I imagine this system works better on text written 'correctly', so would probably return better results against sources like newspapers rather than blog entries.

Tony Ruscoe [PersonRank 10]

16 years ago #

Reto, you're absolutely correct. One of the biggest problems for machine translation is that it's the most cost-effective solution for text that has a limited shelf life – since it's not really worth paying a professional translator to translate something which will only be around for a short time. This includes news articles, blog posts, emails, SMS, IM, chat, etc.

All of those (perhaps apart from news articles and some blogs) are likely to have poor grammatical structure and include idioms and abbreviations, which means any rule-based system may really struggle.

Statistical machine translation would be much better for this purpose, but the problem is obtaining good quality aligned "training" texts in multiple languages that have all these quirks.

milivella [PersonRank 10]

16 years ago #

1. Interesting! And Tony Ruscoe is right: it would also be interesting to read the translation of the same text by Systran.

2. My impression too is: the overall sense is clear, even if almost no individual phrase is correct. Something like the meme "Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer..."
http://www.mrc-cbu.cam.ac.uk/~mattd/Cmabrigde/

3. (The thought about cultural misunderstandings) As usual, you are a step ahead of everybody, Philipp!

4. Could you write something specific about Google's translation method? Wikipedia
http://en.wikipedia.org/wiki/Google_Translate
speaks of "find[ing] patterns it then uses to build rules for translating between those languages." I previously thought it was simply something like "take the first word, find the longest n-gram starting with that word in the corpus, replace it with the translation, go to the next word", but I also thought I was missing something.
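
The naive approach guessed at in point 4 can be sketched roughly as follows. This is only an illustration of greedy longest-match lookup, not a description of how Google Translate works. The phrase table entries and the function name `greedy_translate` are made up for the example, and the sketch assumes that "go to the next word" means continuing after the matched n-gram.

```python
# Toy phrase table: source n-grams (as word tuples) mapped to translations.
# All entries are hypothetical and exist only for this illustration.
PHRASE_TABLE = {
    ("guten", "morgen"): "good morning",
    ("guten",): "good",
    ("morgen",): "tomorrow",
    ("wie", "geht", "es", "dir"): "how are you",
    ("welt",): "world",
}


def greedy_translate(sentence, table=PHRASE_TABLE):
    """Translate left to right, always taking the longest known n-gram."""
    words = sentence.lower().split()
    max_len = max((len(key) for key in table), default=0)
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in table:
                output.append(table[key])
                i += n  # continue after the matched n-gram
                break
        else:
            # No entry starts here: pass the word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(greedy_translate("Guten Morgen Welt"))
```

Because the lookup never reorders words or weighs alternative translations, it handles "guten morgen" as a unit but always renders a lone "morgen" as "tomorrow", which hints at what such a scheme would be missing.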

milivella

David Mulder [PersonRank 10]

16 years ago #

Although I understood most of it, I guess it would have been harder for me if I didn't know a bit of German.

Matt Cutts [PersonRank 10]

16 years ago #

I understood quite a bit of it. My high school classes were French, so that's what I looked at, and the results looked pretty good to me. I'd agree with your point that German is one of the more challenging languages.

Yaught [PersonRank 1]

16 years ago #

The question is: Who owns the copyright on this text dump? Does Philipp own it, because he wrote it? Or does Google own it, because they produced it?

Google won't let you reproduce their ranked list ordering, from a query that you wrote, right? I.e., when Google "translates" your query into a list of 10 web pages, it still claims ownership over that "translation", right?

So that must mean Google is still claiming ownership over the above translation, too. Philipp, did you secure permission from Google to republish that work in its entirety?

Yaught [PersonRank 1]

16 years ago #

@milivella:

Regarding your 4th question: I think the closest thing to how Google translate works is hinted at in this paper:

http://www.fjoch.com/smorgasbord.pdf

INFORMANT [PersonRank 1]

16 years ago #

Actually, if you *really* want to know where Google is heading with SMT, then follow the trail introduced by this clue (hint: this is Franz Och's closest friend in the world of SMT and a long time friendly rival) – http://www.isi.edu/~knight/ (find his firm, find his colleagues, find his clients and so on)

Philipp Lenssen [PersonRank 10]

16 years ago #

> So that must mean Google is still claiming ownership
> over the above translation, too. Philipp, did you secure
> permission from Google to republish that work in its entirety?

I'm no copyright lawyer, but the reverse may be true as well – that Google doesn't have the right to "republish" a translation of my source text as I didn't give it permission (OK, let's assume for the sake of argument that it wasn't me who pasted in the text!). For instance, by common copyright laws AFAIK you can't take Book X which is only available in English and create a local translation in your native tongue without securing the right with the author/publisher of Book X first. Then again, Google doesn't make the republication available anywhere outside my session, and it certainly doesn't make it indexable in search engines. (And certainly I think it's a great feature and should be allowed etc.)

I'm not sure if it's related, but according to certain copyright laws (US ones? global ones?) you also can't claim ownership of e.g. a straight photo of some piece of art if you don't add any artistic value. You can't print & sell a poster of a photo that shows nothing but a painting (if the painting didn't yet fall into the public domain), because you didn't add anything to it, it's just basically a reproduction of the source work. Maybe the case with an automated translation is somewhat similar?

Reto Meier [PersonRank 10]

16 years ago #

As far as the reverse goes:

"11.1 You retain copyright and any other rights you already hold in Content which you submit, [...] on the Services. By submitting, posting or displaying the content you give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive licence to [...] translate, publish, publicly perform, publicly display and distribute any Content which you submit, post or display on or through, the Services. [..]"

So by submitting the text you give Google the right to translate it and publish it without giving up your own rights.

Philipp Lenssen [PersonRank 10]

16 years ago #

> So by submitting the text you give Google the right to
> translate it and publish it without giving up your own rights.

Reto, but as I said, for the sake of argument I wanted to assume that it wasn't me (the copyright holder) who submitted the text into the translator, but someone else... someone who doesn't own the copyright in the first place! Hence, I would think no outside contract can override my copyright (if I'm the content owner).

milivella [PersonRank 10]

16 years ago #

Thank you, Yaught :)

Lara [PersonRank 0]

16 years ago #

ohmegod. I am not sure I understand the English text, but I am positive that I can reconstruct the original German....

To me, this does not look a whole lot better than what babelfish or altavista have had available for years?

I guess, for the time being, I'll continue tri-lingual blogging... (e/d/f)

MichaelLJ [PersonRank 0]

16 years ago #

Has anyone already mentioned that it's "aufzäumen", and not "aufzäunen"?

Philipp Lenssen [PersonRank 10]

16 years ago #

Michael, you are right, my error. On the other hand, Google also doesn't translate "aufzäumen" :)
