Google Blogoscoped

Forum

Google Translator: The Universal Language

Bil Munsil [PersonRank 0]

Sunday, May 22, 2005
12 years ago

Esperanto is alive and well, thank you very much, and is in daily use somewhere around the world.

It is spoken in over 120 countries, and over 80 countries have national, regional or local Esperanto groups.

Vilchjo de Mesao Arizono, Usono

e3ashig [PersonRank 1]

12 years ago #

Wow, that is amazing in the extreme. Out of all Google Labs products – I love this the most. I am a native Arabic speaker and have always found the translation services online appalling. This solves the problem AI people always faced with language – I truly am impressed!

Way to go, Google. Do no evil.

Fajro [PersonRank 1]

12 years ago #

Saluton al Cxiuj!!! (Hello, everyone!!!)

Cxu gxi povas traduki Esperanto <=> Angla ??????

Can it translate Esperanto <=> English ??????

Google in Esperanto:
google.com/intl/eo/

:-D

----------------------------
www.google-watch.org
----------------------------

James Kilfiger [PersonRank 0]

12 years ago #

Hmm, we shall see. Translation isn't easy and Google aren't the first to try their hand at it. I'll wait to see if Google can even comprehensibly translate a variety of languages and a variety of registers, and not just formal English

'cause if it aint good at doing slang and that like. . . .

Brodie Field [PersonRank 0]

12 years ago #

Statistics based translation has been tried before. It works okay for some things and not so well for others. It isn't a panacea. Additionally, no translation software can deal satisfactorily with badly formed, grammatically incorrect or poorly spelled input. Garbage in, garbage out.

In this example, it appears to me that the main problem is a deficient dictionary. If White House and Bin Laden were included in the dictionary for the first translation engine, I'm sure that the results would be better.

Machine translation will not replace human translation for the foreseeable future. Language is too complex. It is easy enough to get a computer to translate simple sentences like "The cat sat on the mat.", but once you start using real text with all of the complexities, ambiguities and implied world-knowledge that it contains, the accuracy drops right off.

Current MT should be considered a means of understanding the gist of what a document is about. You can then decide whether to battle on in a desire to save money and try to understand the meaning of the document from the MT output, or pony up the cash to get it translated properly.

Philipp Lenssen [PersonRank 10]

12 years ago #

Bil, I never said Esperanto is dead. But you have to agree it's not a universal language, especially not online. I don't understand Esperanto and my favorite news sources don't offer it either way. Though I don't have any statistics, I *suspect* English is much more popular in terms of how many people understand it. And yet, my point was that a true universal language would be your *native* language with the Google Translator.

Brodie, I agree it won't be a perfect translation. I was more thinking along the lines of... it will work well enough so you can read pages without being confused. They may sound a little skewed, but not so much as to render the content meaningless – which current machine translations found online often do.

Philipp Lenssen [PersonRank 10]

12 years ago #

A second thread on this topic is here:
blogoscoped.com/forum/8268.htm ...

Elizabeth Stanley [PersonRank 0]

12 years ago #

Dear Philipp,

Your English is really, really, good – but as a native speaker, reading your article, I can immediately tell it's not your mother tongue. I think anything that fosters international communication and understanding is good. However, I'm not happy about the use of any national language as an international one because of all the cultural imperialism that goes with it and because it gives an unfair advantage to native speakers.
I'm not at all convinced English is popular with the people who have to learn it, any more than Russian was popular across the USSR.
Machine translation is attractive but dependent on the hardware – all I need to communicate in Esperanto is my own brain and voice. However, the Net is a great boon to Esperanto speakers who can now email each other or visit chatrooms.
You really must visit www.esperanto.net and read all about it. The human brain is still ahead of the computer. Have fun!

Philipp Lenssen [PersonRank 10]

12 years ago #

Elizabeth, it would be nice if you can point out my English errors so I can work on it.

As for any national language becoming the international one, well, English took this place naturally it seems without anyone planning it. As for the Google Translator, it would be able to make *your* language – whatever it may be – the "international" one.

mj [PersonRank 1]

12 years ago #

Elizabeth – Huh. I am surprised. I read this blog for quite a while before I noticed Philipp was not in an English-speaking country.
I am curious what made you realize he was not a native speaker.

Patrickas [PersonRank 1]

12 years ago #

I was thinking this is such an advanced technology it seems indistinguishable from magic (or a rigged demo!)

Then I noticed that the Arabic original text actually says Bin La instead of Bin Laden...

So now I am really wondering if it was just a rigged demo! :-)

Or maybe the original quote was just truncated by mistake.

In all cases what's really amazing about this method for translation is that you just need to keep feeding it new translated data in order to keep enhancing the results!
I think that the problems Brodie Field raises can be solved on their own when there is enough data in the system.

Philipp Lenssen [PersonRank 10]

12 years ago #

Patrickas, what would "Bin La" mean?

Patrickas [PersonRank 1]

12 years ago #

Bin La does not mean anything in that context. It can only be Bin Laden missing the last two letters. But that can only be derived from the context.

Without context the same words can mean:
"Coffee No"

Any human doing the translation would have guessed what the word is without problem, but a machine doing the translation would probably never make such a guess unless:
- it was trained with truncated Arabic words (highly unlikely)
- The human translator -wizard behind the curtain- was translating fast without paying too much attention. (nice conspiracy theory but I doubt it)

- It is just a printing/display problem (most likely imho)

Also, the last word must have been Laden because the translation in the middle also translated it as Laden.

My comment was not to be taken too seriously :-)

Craig Campbell [PersonRank 0]

12 years ago #

Text-to-Speech, coupled with the Google translator, is a powerful tool.

Cepstral, a company that specializes in Text-to-Speech, has created a simple form to demonstrate this combination at:

cepstral.com/ttt

Tony Ruscoe [PersonRank 10]

12 years ago #

James Kilfiger said:
"Translation isn’t easy and Google aren’t the first to try their hand at it. I’ll wait to see if Google can even comprehensibly translate a variety of languages and a variety of registers, and not just formal English ’cause if it aint good at doing slang and that like. . . ."

Very true! From what I've read, Google are training their machine translation system on UN documents – so the content is very likely to be well formed. I doubt UN documents have much slang in them, so when you run slang text through Google's translation system, I doubt it will be much better than current MT systems. Some current machine translation systems handle short, well-written unambiguous texts quite well anyway.

Until I've seen an online working demo where I can enter my own text, I'm going to remain very sceptical.

Matt Landau [PersonRank 0]

12 years ago #

There is a better solution: it is called Blissymbolics. It is not spoken, only written (which is good because it would not contribute to language death). It has so far been translated into 17 languages. It is orders of magnitude easier to learn than a full-blown language, yet it is fully expressive. See my website, activebliss.com , for more on this admittedly idealistic solution.

fujifilm9 [PersonRank 1]

12 years ago #

++English Usage+++
well, you asked. Small stuff like
"more thinking along the lines of" vs.
"thinking more along the lines of"

Small stuff, but she's right.

fujifilm9

mrG [PersonRank 0]

12 years ago #

While even a broken or highly domain-specific edition of a statistical-based system would be useful for *unimportant* information exchange, I expect it may be some time before we can use this in the domain-zoo of the web; statistical language analysis can work very well, but the operative word is 'can' ... where they fail, or when the training goes amok and the system 'learns' some mistake in a totally devoted and ingrained way, the results can be downright humorous. With a blog, that's not so critical, but with a medical text or an engineering manual?

Jon Gales [PersonRank 1]

12 years ago #

I think even more likely than IM is that Google will add this tech to Gmail... Email in your own language always.

Alberto Rondina [PersonRank 0]

12 years ago #

As long as we talk about providing access to websites to people not understanding the language they are written in, I believe the corpus-based solution is the best viable option, and one I am looking forward to.
Nevertheless, I think it will never be able to supersede human translation, because it is based on it.
And think what would happen if we fed the system a wrong translation: how quickly would it spread and how hard would it be to correct it?

Moses G. [PersonRank 0]

12 years ago #

Interesting project. If anything, this should promote the understanding of various cultures. Today, our view of these cultures is limited through the eyes of the so-called scholars. In some cases, what comes through these intellects is twisted and off the mark. Having a direct knowledge of the local culture is a good thing.

Multinational organizations can also benefit from this experiment; they will simply hook up their browsers (as you suggested, a browser plugin for language translation).

On the other hand, if Google becomes the de facto "Translator" and googleplex of languages, can you imagine the fall-out? Disaffected parties harassing Google for not getting it exactly right – lawsuits, boycotts, etc.

Super Machine Translations [PersonRank 0]

12 years ago #


"www.cepstral.com/ttt"

"Would you like some toast?"
>> Spanish :
"¿usted tienen gusto de una cierta tostada?"
(Do you have the taste of some toast?) :-P
>>English:
"you have toasted taste of a certain one?"
>> Spanish:
"¿usted ha tostado gusto de cierto?"
>>English :
"you have toasted taste of certain?"
>> Spanish:
"¿usted ha tostado el gusto de seguro?"
>>English :
"you have toasted the insurance taste?"
>>Spanish:
"¿usted ha tostado el gusto del seguro?"
>>English:
"you have toasted the taste of the insurance?"
>>Spanish:
"¿usted ha tostado el gusto del seguro?"

:-P
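
For anyone who wants to repeat this kind of round-trip experiment, here is a minimal Python sketch. It is only an illustration: the `round_trip` helper and the two lookup tables are hypothetical stand-ins built from the outputs quoted above, not the real translation service, which would have to be plugged in as the `forward` and `backward` callables.

```python
from typing import Callable, List

def round_trip(text: str,
               forward: Callable[[str], str],
               backward: Callable[[str], str],
               max_rounds: int = 10) -> List[str]:
    """Bounce text between two languages until the forward output repeats.

    Returns the list of distinct forward translations, in order.
    """
    history: List[str] = []
    current = text
    for _ in range(max_rounds):
        translated = forward(current)
        if history and translated == history[-1]:
            break  # fixed point: translating again changes nothing
        history.append(translated)
        current = backward(translated)
    return history

# Hypothetical lookup tables replaying the outputs quoted in the post above.
EN_TO_ES = {
    "Would you like some toast?": "¿usted tienen gusto de una cierta tostada?",
    "you have toasted taste of a certain one?": "¿usted ha tostado gusto de cierto?",
    "you have toasted taste of certain?": "¿usted ha tostado el gusto de seguro?",
    "you have toasted the insurance taste?": "¿usted ha tostado el gusto del seguro?",
    "you have toasted the taste of the insurance?": "¿usted ha tostado el gusto del seguro?",
}
ES_TO_EN = {
    "¿usted tienen gusto de una cierta tostada?": "you have toasted taste of a certain one?",
    "¿usted ha tostado gusto de cierto?": "you have toasted taste of certain?",
    "¿usted ha tostado el gusto de seguro?": "you have toasted the insurance taste?",
    "¿usted ha tostado el gusto del seguro?": "you have toasted the taste of the insurance?",
}

history = round_trip("Would you like some toast?",
                     lambda s: EN_TO_ES[s],
                     lambda s: ES_TO_EN[s])
for step, spanish in enumerate(history, start=1):
    print(step, spanish)
```

With the replayed data the loop stops once the Spanish output repeats, which matches where the chain above settles.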

Vincent Celier [PersonRank 0]

12 years ago #

Do you really think that Google Babelfish would be a good idea?

Remember that according to the HHG2G:

Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.

;-)

Virginia Anderson [PersonRank 1]

12 years ago #

Thoughts from a HUMAN translator:

I) A fact *conveniently* omitted from the beginning of this article: English is not the official language of the United States.

II) While the author was not trying to make it seem that this Google Translator was another push by the English-only movement, the facts remain that 1) Google is an American company; 2) Google engineers most likely communicate predominantly in English; 3) the U.S. is viewed by itself and the outside world as monolingual English; 4) many key entities that form the basis for the Internet promote communication in English (as the language of business and of science). Therefore, the Google Translator would have a de facto bias toward translating into English.

III) In this age of globalization, shouldn't we be encouraging students and adults to learn other languages and cultures? This learning process gives you valuable insight into your own language and culture. Creating a monolingual bubble around you is very limiting and creates a false understanding of where you fit into the world. If you can look at your own culture/nation/language from the outside, you will have better success when doing business with those from other cultures – even if we're just talking US – UK relations (nominal language barrier).

Virginia Anderson [PersonRank 1]

12 years ago #

One after-thought on machine translation in general:

As other bloggers here have alluded to, it has taken humankind some 50,000 years to develop language to the point where it is today. And languages are constantly evolving as new technology, technical jargon, and slang are invented. (Gee, I wonder if the Google Translator knows the new word "blogger"?) I sincerely doubt that computers – in their scant 50 years of existence – can "learn" a single language with all its nuance, allusion, hidden meanings, and accompanying gestures. Not to mention learn to translate (written) or interpret (oral) between two different languages.

Why do you think humans hired by the government are involved in translating and interpreting (both meanings) "bin Laden tapes"? Because we can read between the lines and infer from our life experiences what the author/speaker truly means.

I do accept that machine translators are gradually improving and have successful practical applications today: gisting and specific, low-context reports (e.g. weather reports in Canada). But I do not fear that machine translators will put me or my colleagues out of business any time soon. There's so much more to language than apple = pomme = manzana = ...

Philipp Lenssen [PersonRank 10]

12 years ago #

Agreed human translation will be better for the foreseeable future, but sometimes, human translation is simply not an option – like when I read Japanese web sites (or rather, look at the Japanese characters) I don't understand a word nor can I translate it, nor can I pay someone to translate it for me. In other words, a machine translation here would be my only fallback. Whether or not it understands "blogs" depends on how large the corpus of modern sources is. I'm afraid not even a human translator might get VERY new words right.

On a side-note, I just saw German Spiegel translate "podcasting" with "Graswurzelradio"... which is a verbatim translation of "grass roots radio" (which either didn't exist in German language before, or I just wasn't aware of it).

Philipp Lenssen [PersonRank 10]

12 years ago #

Virginia writes...

> A fact *conveniently* omitted from the
> beginning of this article: English is
> not the official language
> of the United States.

I didn't mention the United States in my article. Or what do you mean?

> Therefore, the Google Translator would
> have a de facto bias toward
> translating into English.

There might be bias, but do you have proof? All I know is that the guy at the Factory Tour said they'd be able to translate into languages which they don't even speak. Of course, they must have some proofreaders...

> In this age of globalization, shouldn’t
> we be encouraging students and
> adults to learn other languages
> and cultures.

Agreed! I'm happy to have learned English. It even gives me greater understanding of my native language (because I can compare). Others should do the same – it might add perspective to live, and of course, makes you yourself a human translator for a particular language. Which is still the best there is at the moment, and for the time to come.

William Rice [PersonRank 0]

12 years ago #

"You are now enabled to get a better understanding of cultures outside your own country. Would there be any negative side-effects? Well, one for sure: people would have less incentive than before to learn foreign languages."

I disagree. Every person I knew in college who was learning ancient Greek or Latin was inspired to read classical works in their original languages because that person had first read the works in English. Oftentimes, the first step to wanting to learn a new language is becoming fascinated with the culture. Reading news, blogs, and personal sites from a different culture will make that more prevalent, not less.

williamrice.com

Virginia Anderson [PersonRank 1]

12 years ago #

Clarification:
I just wanted to remind readers that the United States has no official language because all too often they erroneously assume English is our official language. And when discussing an American company developing translation software to cater to monolinguals – well, let's leave it with this old joke:
What do you call someone who speaks several languages? Answer: multilingual
What do you call someone who speaks two languages? Answer: bilingual
What do you call someone who speaks one language? Answer: American.

Then again, perhaps I just jumped to a conclusion that wasn't to be found in the article.

As for the English bias, no, I don't have any proof. Yet, Google is a company that operates in the de facto language of business (English), and particularly an American company (see joke above). Therefore, Google will probably spend much time developing translation software to translate English to/from other languages (FR, DE, ES, JP, etc.), but I have trouble foreseeing them developing translation software to go between various other languages, e.g. FR<>DE or RU<>JP.

Now, I wonder what approach Google might take toward languages of limited diffusion?

Claude Plante [PersonRank 0]

12 years ago #

Your translator makes the same kind of error as all the other translators. Examples:

From : English to French

Attempt 1:
Question: I am a fan of Elvis
Translation: Je suis un ventilateur d'Elvis

Attempt 2:
Question: I am an Elvis fan
Translation: Je suis un ventilateur d'Elvis

In French, the only possible translation for 'fan' here is 'admirateur' ('admirer' in English).
Also, in French the translation of 'ventilateur' could be either 'ventilator' or 'fan' in English.

This is a long, long, long-standing problem that every translator has been facing... since the beginning.

Achraf Chalabi [PersonRank 0]

12 years ago #

The "Existing Translation" example that was mentioned is really misleading, as it does not reflect the state-of-the-art Arabic>English MT available now. If you try to translate this same sentence on the Sakhr MT system (tarjim.sakhr.com), which is an Arabic>English rule-based MT system, you get the following result: "the White House confirms the presence of a new recorded tape to bin Laden".

I would be thankful if you could send me a url where I could try the google translator for Arabic> English, and send you my comments and feedback.

NB: If anybody needs a trial account. Just send me an email at (ac[put at-character here]sakhr.com) and I will send you a free one.

Best,

Marion Kee [PersonRank 0]

12 years ago #

Philipp, I really appreciate seeing the information from you about the new Google MT technology. The statistical approach to MT really needs to be tried out on a huge testbed of documents, and using the UN corpora sounds like a great way to play with this methodology. It should advance the state of the art, because nobody (to the best of my knowledge) really knows what happens when you scale up the statistical approach so massively.

The continuing operation of Moore's Law means that computational power simply keeps raising the ante for statistical MT applications. You can find out this year what you can do with two-years-old technology (if you're quick enough to apply it that promptly) but (so far) you have to be impossibly fast to find out this year what you can do with this year's hardware. Perhaps Google's effort will start to reduce this time lag. Kind of a scary thought. :-)

The U.N. translates much of its documentation into a wide range of languages. I believe it renders all of it into several "core languages." The information you posted doesn't say how broad a spread of languages was included in the corpora that the new Google system has been trained on. It also doesn't say what percentage of those documents originated in English or in some other language (and if so, which other language), etc. etc. A conclusion that the Google system would be inherently biased toward English could be a reasonable suspicion, but a guess only. There simply isn't enough data in your post to support any conclusion about that. And the MT system's actual behavior is still an unknown, and could be surprising. That's what makes it research, not just a product.

There are many, many political considerations that come into play around translation, and all of them are going to be on display to some extent around any publicly-known MT effort (in my experience.) Your post here has flushed out many of the "usual suspects" in terms of the ideas, critiques and reactions offered in the comments. This kind of reaction's been going on for decades and has nothing to do with Google, or with what you said in your post. It would attend on any such discussion.

Larry Timmins [PersonRank 0]

12 years ago #

"Middle Instruction Translation Language" or MITL took language concepts and applied it to moving either human text or object code into either as the result of the translation. Its core was a concept of stored 'mitl's that could be represented from any source to any target regardless of whether the target was an microprocessor or a human language.

ISMM Proceedings, 1986 L. Timmins for some of the object code translation uses.

In 1984, it was used to track progressive diseases without knowing personal information, and the 'mitl' representations were used to equally represent the findings of any medical journal article in an environment where matches could occur and then doctors would be alerted to the findings. It tracked over 100,000 medical articles for a particular disease whose outcomes were only represented by 96 bits and therefore quickly matchable and scalable in days where giga and computing storage was rarely used outside a data center. The 84 bits used for the non-personal patient information were then attached – like genes – as it evolved its way out of the mitl environment back to the space where the MD was identified and alerted (you have 'n' patients that may benefit prior to disease progression based on information in this article). Unlike the less strict disclosure rules today, the system could only provide the bibliographical reference information to avoid giving doctors copyrighted material from medical journals and drug research. Interestingly, it never failed to be 100 percent accurate.

Go for it Google....

Larry T, ltimmins_[put at-character here]-optonline.net

Ricardo A. Huergo [PersonRank 0]

12 years ago #

Esperanto has been spoken by groups around the world for many years only by enthusiasts. An artificial language in artificial communities has no chances of survival.

Peter Davies [PersonRank 0]

12 years ago #

The worry about English bias doesn't matter as the reader will be seeing it in their own language.
If the United Nations has translated identical texts into several languages, the database should be able to cope with non-English translations, e.g. German to Chinese.
Rather than a language imperialism coming out of this, I think it will be a boon to minority languages threatened with being overwhelmed by one of the big ones.
The main problem with a minority language is you can't use it beyond a small group. But if you set your computer to translate everything, you can use and practice the language of your ancestors every day.
All we need is for Google or whoever runs it to provide a means for hundreds of academics or enthusiasts of minor languages to be able to download a UN document, then translate it from, say, English to Welsh or from French or Spanish to Catalan, and then upload it back to the database.
The slang argument is a bit of a red herring, as even within language groups it can be meaningless outside of a particular subculture. So being slightly more formal when we want others to understand us is not unreasonable. In fact, in some cultures being too informal when you are not personally close is almost offensive, which is the last thing we want in international communications.
The thought of being able to understand web pages and emails from other languages is exciting and let's hope it takes off.
With more modern browsers, such as Firefox, that have tabs, maybe we could get the original and the translation under separate tabs. Great for people who are studying a language.
I am currently sending emails to people in France and cutting and pasting to and from the Google translator. It would be even better if this could be done automatically.
I love the peculiarities of how people say things, such as the "taste of toast" saga. It actually helps to understand and learn the language itself.
Sorry if this contribution goes on a bit, but I discovered your site while using Google to translate an email I am sending and got distracted by a fascinating topic.

John Dillon [PersonRank 0]

12 years ago #

Saying, as Philipp Lenssen does, that "Google will still allow you to translate any document from their search results by the click of a link." presumes falsely that Google allows one to do so now.

Here is a result from a Google search for the phrase "nuda potestas":
Thebais: Liber I – [ Translate this page]
atque aurum violare cibis: sed nuda potestas armavit fratres, pugna est de paupere
regno. Dumque uter angustae squalentia iugera Dirces ...
forumromanum.org/literature/th ... – 36k – Cached – Similar pages

And here is a bit of text from the document indicated by this result:
Haec inter fratres pietas erat, haec mora pugnae
sola nec in regem perduratura secundum.

Google's translation tool currently lacks a pick enabling machine translation of this text or indeed of most texts from the source language in question. As that language is Latin, it is hard to be optimistic that the learning experience from UN documents, however impressive this may be in other ways, will be of much help here.

There are documents available in more languages, living and dead, than those in use at the UN. And there are also documents available in older versions of languages used at the UN that are so different from their contemporary counterparts that these versions, too, will for pragmatic purposes have to be treated as separate languages. Corpus-based machine translation is doubtless a great advance. But more corpora than those obtained from the UN will have to be identified and put to Google's use before it can fulfil the hope of effective machine translation from "any document" (Philipp's words) brought up by its search engine.

suppot@keenage.com [PersonRank 0]

12 years ago #

The sentence of "Bin Laden tape" has now been the classical example of Google's translation. I have never seen a real translation done by Google's MT system. I wonder if it really exists, or is just the emperor's new clothes! It is reported its Chinese-English MT is "impressive". Google Labs, will you please translate the following paragraph of Chinese into English. I have tested various existing MT systems. I'd like to make a comparison. Thank you.

Jeff Allen [PersonRank 1]

12 years ago #

1. As for what Google and other online portals are doing with regard to MT, I've explained this previously at:

MT online portals
translatorscafe.com/cafe/MegaB ...

MT portals
proz.com/post/265985#265985

list of all online translation systems
translatorscafe.com/cafe/MegaB ...

2. single push-button MT (content gisting inbound/inward translation approach) is not in itself sufficient for producing high-quality translation jobs (outbound/outward translation), but this does not mean that MT software packages are not applicable and appropriate for this type of translation, if the users are well-trained on them. See the following 2 posts which provide links to my several case studies and other proof in this area:

links to previous posts on MT productivity and usefulness
proz.com/post/275373#275373

does MT software take away work from translators?
translatorscafe.com/cafe/MegaB ...

Jeff Allen [PersonRank 1]

12 years ago #

Philipp Lenssen wrote on 05/31/05:
>Agreed human translation will be better for the foreseeable future, but sometimes, human translation is simply not an option – like when I read Japanese web sites (or rather, look at the Japanese characters) I don’t understand a word nor can I translate it, nor can I pay someone to translate it for me. In other words, a machine translation here would be my only fallback.

Exactly the issue. I've provided examples of this issue of translation that no one wants to pay for, but needs to have, at:

RE: using MT to translating TC posts?
translatorscafe.com/cafe/MegaB ...

Inbound vs Outbound Translation
translatorscafe.com/cafe/artic ...
(scroll down to this entry. This is a 15-page, bullet-list type, powerpoint presentation converted to PDF format)

Jeff Allen [PersonRank 1]

12 years ago #

suppot[put at-character here]keenage wrote on 08/02/05:
>I have never seen a real translation done by
>Google’s MT system. I wonder if it really exists,

As is the case for any multi-national corporation
that has moved into MT technology development
over the past 20 years, either the work is done
just for research purposes at their corporate
R&D center, and/or it is done for internal company
use.
This is why I only use commercially available
MT software in my MT evaluation case studies and
reports.

localudal [PersonRank 1]

12 years ago #

Here: goolocalizations.blogspot.com you may find how good Google is in translating its own UI, let alone texts of third parties. For God's sake, guys don't even know standard, good English to start with, check for example a copyediting of AdSense/AdWords introductory texts.
Now in the MT area, if we talk about pairs of live languages, Babelfish is easily the worst among many Russian contenders (like Prompt) for Eng-Rus, Rus-Eng translations. The same applies to French, German, Polish, Ukrainian, Continental Chinese, Japanese, which have their own translation software far superior to what Babelfish can do.

Of course, if your (Google's UI, that is) language is punk English from the start, why ever question the problem of babelfishy pseudotranslations into junk foreign languages?

Staci [PersonRank 0]

12 years ago #

Aurum est potestas.. What in the world does that mean?!?!?

David Latapie [PersonRank 0]

11 years ago #

Applying linguistics and etymology (I never learned Latin), I would say

"gold is...." I don't know what potestas is

aurum = gold
est = is

Ionut Alex. Chitu [PersonRank 10]

11 years ago #

potestas=power

Ionut Alex. Chitu [PersonRank 10]

11 years ago #

"aurum potestas est" is the correct form

Pierre S [PersonRank 10]

11 years ago #

The fact that Google Research is going public means (according to me) that the Google translator launch is nearing. That can be a huge mistake, but that would make sense.

Nina Widder [PersonRank 0]

11 years ago #

What about languages that are exclusively spoken languages because the people using them are illiterate? I am not referring to languages that are used by little tribes with no interest in communicating with more "civilized" cultures. What I am referring to are for example the indigenous languages in Latin America some of wich are spoken by almost half of the population (e.g. Quechua in Peru). Presently people are trying to offer these languages as subjects at primary schools but you can imagine that the number of documents existing in those languages is close to zero in comparison to the corpus being used for English etc.

Wouldn't this technique widen the gap and cut these cultures off while others can make even faster progress by gaining more knowledge on any topic of interest? Furthermore, I assume that there are many people interested in these cultures who would like to learn about them firsthand and not from the subjective perspective of a foreigner who posted his experiences with those people on his website.

Having studied Japanese I am also quite sceptical about finding ways to translate a language that has so many important as well as linguistically complex levels of politeness into languages like English without losing the essential nuances, e.g. there are 8 verbs for giving and receiving depending on the giver and receiver being socially inferior or superior and belonging to the same or different groups and speaking about this action to an insider or outsider. How could a computer ever possibly evaluate the social relations between people?

Nevertheless I am truly impressed by this project. Having a passion for foreign languages (I studied 6, German is my native language), I believe that there will always be enough people motivated to keep studying languages no matter how advanced your techniques become, because in the end a language reflects many cultural values and concepts and gives you a slight insight into other people's view of the world – a benefit that no computer-based system will ever be able to notice or even appreciate.

(Sorry for my poor English which prevents me from expressing my ideas more accurately and precisely. I hope you could see my point ANYWAY. <- a word which can be translated with five different words in German depending on the context.) :-)

Philipp Lenssen [PersonRank 10]

11 years ago #

> Having studied Japanese I am also quite sceptical about finding
> ways to translate a language that has so many important as well
> as linguistically complex levels of politeness into languages like
> English without losing the essential nuances, e.g. there are 8 verbs
> for giving and receiving depending on the giver and receiver being
> socially inferior or superior and belonging to the same or different
> groups and speaking about this action to an insider or outsider.
> How could a computer ever possibly evaluate the social relations
> between people?

Interesting stuff. I'm not sure a computer could ever translate this, but how does a human translator translate this then?

Joachim [PersonRank 0]

11 years ago #

Is there a way for a webmaster to support language translation, like tagging certain words as non-translatable (e.g. people's names) and identifying the main idea of a sentence or suchlike? This might get slightly cumbersome – all those structural grammatical charts of Chomsky's etc. look quite over the top – but maybe there could be some HTML tags for things like the author's name, locations etc.
Does anyone have any ideas on that?

Philipp Lenssen [PersonRank 10]

11 years ago #

I think with Google's approach of pattern matching (if I understand it right) that won't be necessary, Joachim. I could well imagine a specific company's AI overtaking the semantic web, or manual tagging approaches, simply because most people are too lazy to do this stuff [1], but in order for it to be feasible, you need everyone to participate. As an example, most people don't even use the <address> tag, which has been around for many years. However, Google just comes along and releases web apps that can automatically identify an address, and link it to their mapping app...

[1] well.com/~doctorow/metacrap.ht ...

Sharon Queano [PersonRank 0]

11 years ago #

Philipp,
Too bad I didn't see this forum much earlier. Please tell me one thing: what is the translation software that Google uses to translate its web pages? I'm very curious. I know machine translation is far from perfect and therefore needs human intervention, but there are creative ways to deal with this.
Is it, by any chance, Systran?

Tony Ruscoe [PersonRank 10]

11 years ago #

Sharon: AFAIK, yes, Google is using Systran for its non-BETA language pairs. The BETA languages are using Google's statistical machine translation system:
googleresearch.blogspot.com/20 ...

Tim Noir [PersonRank 0]

11 years ago #

This is only attractive to Americans who are too arrogant to learn to speak a foreign language.

Yes, a translator helps to understand basic communication but you can never grasp the beauty and complexity of another language without being able to speak it yourself.

Why do you think we listen to Mozart's opera DIE ZAUBERFLOETE in German instead of THE MAGIC FLUTE in English?

Because there is no human capable of translating the rhyme and rhythm of the German lyrics while preserving the music at the same time.

Tim

NateDawg [PersonRank 10]

11 years ago #

@Tim, does it have to be German that I'm required to learn; why not French or Latin?

<rant>
Listen, learning a foreign language definitely will give you a better appreciation for that language. For Americans, learning classical languages such as German, French, Latin and Greek, etc. will also help enhance their own language skills, as American English is a melting pot of foreign languages. That said, not everyone has the time or the proper environment to correctly learn a foreign language. I can stop and learn German from a tape or a book, but I'll never fully appreciate its complexity unless I immerse myself in it.

I do take offense at being called arrogant. I use English for 99% of my interactions, and the other 1% is throwing in a few of the Spanish phrases I know. American high schools (at least in my state) do require at least two years of a foreign language, and if after that you wish to pursue more, you are free to do so (I chose to take extra programming classes instead :) And who's to say what foreign language I must learn?

</rant>
