Why does Google think the Google Labs page – i.e. http://labs.google.com/ – is written in Japanese?
Need me to explain what I mean? Here goes...
Check this search out in Google:
http://www.google.com/search?q=labs.google.com
The [Translate this page] link next to the result leads to Google's machine translation of the page, whose header says:
<< This page has been automatically translated from Japanese. BETA >>
The cached version obviously isn't Japanese either:
http://www.google.com/search?q=cache:labs.google.com
How come their language detection is so wrong for this page? |
Easy. It contains this:
Googleサジェスト日本語版 (Google Suggest in Japanese) 検索窓に入力中に、検索用語の候補が表示され、矢印キーで選択することができます。 3/8/05 – ご意見 – ディスカッション
|
Good point – but that's just one line. So why would they go with the minority language?
Or are they clever enough to be able to identify when a small section of a page is in another language in order to translate just that section? |
Yeah, I noticed that. But I wonder if that works with other languages too.
(It's pretty easy to ignore Latin characters when translating from Japanese into English because you're looking for a different character set.) |
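To make the character-set point concrete, here is a minimal sketch of how Japanese text could be picked out of a mostly English page by Unicode range alone. This is only an illustration, not a description of what Google actually does; the names `JAPANESE_RUN`, `japanese_runs` and `japanese_share` are made up for the example, and the ranges (Hiragana, Katakana, CJK Unified Ideographs) are a rough approximation of "Japanese script".

```python
import re

# Hiragana, Katakana and CJK Unified Ideographs: a rough stand-in for "Japanese script".
# Assumption for this sketch only; real detectors use far more than code-point ranges.
JAPANESE_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")


def japanese_runs(text):
    """Return the maximal runs of Japanese-script characters found in text."""
    return JAPANESE_RUN.findall(text)


def japanese_share(text):
    """Return the fraction of non-whitespace characters that are Japanese script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if JAPANESE_RUN.fullmatch(c))
    return hits / len(chars)


sample = "Google Suggest 日本語版 lets you pick 検索用語 with the arrow keys"
print(japanese_runs(sample))   # the two Japanese runs, Latin text ignored
print(japanese_share(sample))  # a small fraction: Japanese is the minority here
```

A translator that only looks at `japanese_runs` never has to touch the Latin text, while a detector that labels the whole page "Japanese" as soon as any such run exists would get the Labs page wrong even though `japanese_share` is tiny.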
Hmm. Quite clever. (It has translated some of the English words that also happen to be French, though, even when the words around them haven't been translated.)
However, Google doesn't recognise that page as being French, despite it containing more French than the Labs page does Japanese:
http://www.google.com/search?q=http%3A%2F%2Fforum.wordreference.com%2Fshowthread.php%3Ft%3D56002
I don't think this behaviour is intentional. I think it's just lucky... |
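The French case is harder because French and English share the Latin alphabet, so looking at the character set tells you nothing. One common workaround is to count very frequent function words per language. The sketch below assumes tiny hand-picked word lists (`STOPWORDS`) and hypothetical helper names (`stopword_counts`, `guess_language`); it is not how Google's detector works, just a way to see why mixed pages are easy to misjudge.

```python
import re

# Tiny illustrative stopword lists (an assumption of this sketch);
# a real detector would need far more data than this.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "that", "it", "in"},
    "fr": {"le", "la", "les", "et", "est", "que", "des", "un", "une"},
}


def stopword_counts(text):
    """Count how many words of text appear in each language's stopword list."""
    words = re.findall(r"[a-zàâçéèêëîïôûùüÿœ']+", text.lower())
    return {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in STOPWORDS.items()
    }


def guess_language(text):
    """Return the language with the most stopword hits, or None if there are none."""
    counts = stopword_counts(text)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(guess_language("le chat est sur la table et il dort"))
print(guess_language("the cat is on the table and it sleeps"))
```

Note that a word like "table" exists in both languages and is counted for neither here, which fits the observation above: words shared between English and French get translated (or not) more or less by chance.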
Another strange behavior: Google marked some of my pages as Chinese just because they were linked from a Chinese site. And I don't speak Chinese.
http://www.google.com/search?as_q=&num=10&hl=en&client=firefox&rls=org.mozilla%3Aen-US%3Aunofficial&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr=lang_zh-CN&as_ft=i&as_filetype=&as_qdr=all&as_nlo=&as_nhi=&as_occt=any&as_dt=i&as_sitesearch=googlesystem.blogspot.com&as_rights=&safe=images |