Google Blogoscoped

Thursday, May 24, 2007

Google Releases Cross-Language Search

As promised at the recent Searchology event (and mentioned here earlier), Google’s behind-the-scenes translation search engine is now live. For example, you can search for how are you doing, set your “search” language to English, and set the preferred “find” language to German. Google will then search for the German translation of “how are you doing” and present you with German-origin search results translated back to English (displayed side by side with their German counterparts, which you can optionally hide).
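The three steps described above (translate the query, search in the target language, translate the hits back) can be sketched roughly as follows. This is only an illustration, not Google's implementation: the phrase tables, the tiny page list, and the names `translate`, `Result`, and `cross_language_search` are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical toy phrase tables standing in for a real machine translation system.
EN_TO_DE = {"how are you doing": "wie geht es dir"}
DE_TO_EN = {"hallo, wie geht es dir heute?": "hello, how are you doing today?"}

# Hypothetical German "index": just a short list of page snippets.
GERMAN_PAGES = ["Hallo, wie geht es dir heute?", "Das Wetter ist schön."]


def translate(text, table):
    """Look up a phrase in a toy table; return the input unchanged if unknown."""
    return table.get(text.lower(), text)


@dataclass
class Result:
    original: str    # the German snippet as found
    translated: str  # the same snippet translated back to English


def cross_language_search(query):
    # Step 1: translate the query from the "search" language to the "find" language.
    german_query = translate(query, EN_TO_DE)
    # Step 2: search the German pages for the translated query.
    hits = [page for page in GERMAN_PAGES if german_query in page.lower()]
    # Step 3: translate each hit back, keeping the original for side-by-side display.
    return [Result(page, translate(page, DE_TO_EN)) for page in hits]


for result in cross_language_search("how are you doing"):
    print(result.original, "|", result.translated)
```

Keeping both the original and the translated text in each result mirrors the side-by-side display mentioned above.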

OK, that’s the theory. In practice, for the query “how are you doing,” a random first sample I tried, Google returned a page that had apparently been auto-translated to German with Google Translator in the first place and stored online as an indexable copy. Because Google’s translation system incorrectly translated the English phrase into “wie du tuend bist” (instead of the correct “wie geht es dir”) in both cases, this was a match – even though the result page is largely grammatical nonsense instead of real German!
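To make that failure mode concrete, here is a small sketch of why the same translation error on both sides produces a match. Everything in it is assumed for illustration: `flawed_translate` is a hypothetical one-entry translator that reproduces the mistranslation quoted above, and the two page strings are invented examples.

```python
def flawed_translate(english):
    """Hypothetical translator that repeats the mistranslation quoted in the post."""
    table = {"how are you doing": "wie du tuend bist"}
    return table.get(english.lower(), english)


def matches(query, page):
    """Naive substring match, standing in for a real search index."""
    return query in page.lower()


# An invented page written by a German speaker.
human_page = "Hallo, wie geht es dir?"
# An invented page produced by running English text through the same flawed translator.
machine_page = "Hallo, " + flawed_translate("how are you doing") + "?"

# The query goes through the same flawed translator before searching.
german_query = flawed_translate("how are you doing")

print(matches(german_query, machine_page))  # the machine-made page matches
print(matches(german_query, human_page))    # the real German page does not
```

Because the query and the machine-translated page share the same error, the nonsense page is found while the genuine German page is missed.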

(The second query I tried, “chairman mao,” faced yet another crucial obstacle: the top result page’s content showed a large image of text containing Chinese characters, which Google didn’t translate because its system only handles text. Handling images would require OCR and dynamically created images... not impossible, but harder to implement.)

It’s no real surprise that this feature, just like Google’s other translation-based services, is only as good as the underlying translation quality... which is still pretty bad for many real-life cases, and years away from a Google Translator-based “universal language” (note that Google currently relies on in-house machine translation for English to Arabic/Chinese/Russian, and on Systran technology for other language pairs). However, as the translation quality improves, so will the usefulness of this new search translation feature... and who knows, it may one day be good enough to be rolled out into certain “universal” web searches. Already, the quality may be stable enough for certain types of research.

On a side note: as always when Google basically acts as a content proxy (when you click on a search result, you are referred to a Google-hosted dynamic translation of the page), this new service raises some interesting copyright issues. After all, Google is fully republishing a translated version of the origin page, with the searcher skipping the owner’s host and staying on Google throughout their research. It makes some sense, too, because the user might not have visited the origin page anyway, as it may be in a foreign language, but it clashes with traditional copyright (and is possibly another sign that we need to get rid of “traditional copyright” sooner or later).

[Thanks Katie W.!]



