Monday, August 22, 2005

Machine Translation Bombs

I wonder if, once the Google translator works based on the web corpus – machine translating anything to anything – we’ll be seeing it spammed. What could trigger this is an automated site that puts up false translations from one language into another. It would translate well up to a point; say, it would correctly translate 99.9% of the text from English to French (“house” as “maison”), but it would always translate “White House” as “Visit”.
If enough sites could be created to fake this translation, and if the phrase “White House” were rather obscure (and thus had few “real” translations to compete with), the Google translator might be led to believe that “White House” is correctly translated into French as “Visit”... spamming everyone who reads the French translation. And as is always the case with spam, it only takes 1 reader out of 10,000 to make this a commercially successful effort for the spammer.
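To make the attack concrete, here is a toy model, purely an assumption and not how Google’s translator actually works, in which every page pairing a source phrase with a translation counts as one vote and the most frequent translation wins. The function name `pick_translation` and the page counts are hypothetical.

```python
from collections import Counter

def pick_translation(observed_pairs, phrase):
    """Return the most frequently observed translation of `phrase`.

    Toy model (an assumption): every page that pairs `phrase` with a
    translation contributes exactly one vote, regardless of who made it.
    """
    votes = Counter(target for source, target in observed_pairs if source == phrase)
    if not votes:
        return None  # phrase never seen in the corpus
    return votes.most_common(1)[0][0]

# A rare phrase with only a handful of genuine translations on the web...
genuine = [("White House", "Maison Blanche")] * 3
# ...is easy to outvote with automatically generated fake pages.
spam = [("White House", "Visit")] * 50

print(pick_translation(genuine, "White House"))         # Maison Blanche
print(pick_translation(genuine + spam, "White House"))  # Visit
```

The more obscure the phrase, the fewer genuine votes there are, and the fewer fake pages a spammer needs to tip the count.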

If this scenario is indeed realistic in the future – it may not be, simply because it may require too large an interconnected, automated system for creating “fake” web sites – Google may well try to assign “translation authority” rankings proportional to the URL’s Google PageRank. A translation would then carry much weight only if the page it appears on is also linked to a lot (seemingly by humans). Spammers would have a harder time because they would now also need to fake the backlinks.
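A rough sketch of that idea, extending the same toy model: instead of one vote per page, each page’s vote is weighted by its authority. The linear sum, the function name `pick_translation_weighted`, and the PageRank-like numbers are all hypothetical illustrations, not real PageRank values or a real Google mechanism.

```python
from collections import defaultdict

def pick_translation_weighted(observations, phrase):
    """Pick the translation whose supporting pages carry the most total authority.

    `observations` is a list of (source_phrase, target_phrase, authority) tuples.
    Assumption: a translation's score is the plain sum of its pages' authority,
    i.e. influence proportional to a PageRank-like score.
    """
    scores = defaultdict(float)
    for source, target, authority in observations:
        if source == phrase:
            scores[target] += authority
    if not scores:
        return None  # phrase never seen in the corpus
    return max(scores, key=scores.get)

# A few well-linked genuine pages (made-up authority values)...
genuine = [("White House", "Maison Blanche", 0.8)] * 3
# ...versus many spam pages that nobody links to.
spam = [("White House", "Visit", 0.01)] * 50
# Spam pages whose authority has been inflated with faked backlinks.
boosted_spam = [("White House", "Visit", 0.1)] * 50

print(pick_translation_weighted(genuine + spam, "White House"))          # Maison Blanche
print(pick_translation_weighted(genuine + boosted_spam, "White House"))  # Visit
```

Sheer page count no longer suffices; only once the spammer also fakes enough backlinks to raise the authority of the fake pages does the bogus translation win again.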


