Google Blogoscoped

Forum

Harvesting Concealed e-mail Addresses

George R [PersonRank 10]

Wednesday, June 24, 2009

Baxil reports that Google is decoding e-mail addresses that are concealed with JavaScript (so they only appear once the script runs) and is showing the decoded addresses in its search-result (SERP) snippets. Apparently some spammers are picking up the addresses from there.

http://baxil.livejournal.com/266909.html

James Xuan [PersonRank 10]

Crazy. What reason is there for this?

Ionut Alex. Chitu [PersonRank 10]

It's not that surprising. After all, Google's crawler already interprets some JavaScript, both to discover web pages and to reveal parts of a page that aren't visible when it first loads.

Some details:

<< Google has also been crawling some JavaScript for a while. Primarily, they’ve been extracting very simply coded links. As of today, they’re able to execute JavaScript onClick events. They still recommend using progressive enhancement techniques, however, rather than relying on Googlebot’s ability to extract from the JavaScript (not just for search engine purposes, but for accessibility reasons as well).

Googlebot is now able to construct much of the page and can access the onClick event contained in most tags. For now, if the onClick event calls a function that then constructs the URL, Googlebot can only interpret it if the function is part of the page (rather than in an external script).

Some examples of code that Googlebot can now execute include:

   * <div onclick="document.location.href='http://foo.com/'">

   * <tr onclick="myfunction('index.html')"><a href="#" onclick="myfunction()">new page</a>

   * <a href="javascript:void(0)" onclick="window.open('welcome.html')">open new window</a>

   >>

http://searchengineland.com/google-io-new-advances-in-the-searchability-of-javascript-and-flash-but-is-it-enough-19881
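The quote says Googlebot used to extract only "very simply coded links" from JavaScript. Here is a minimal sketch of what that kind of extraction could look like, assuming a crawler that merely pattern-matches quoted string literals inside `onclick` attributes instead of running the script. It is an illustration, not Googlebot's actual method, and `extractOnclickUrls` is a hypothetical name; it is applied to the three examples from the quote:

```javascript
// Hypothetical helper (not Googlebot's code): collects quoted string
// literals from onclick="..." attributes and keeps the ones that look like
// links. It pattern-matches the source; it does not execute JavaScript.
function extractOnclickUrls(html) {
  const urls = [];
  // onclick attributes with double-quoted values (an assumption for brevity).
  const attrRe = /onclick\s*=\s*"([^"]*)"/gi;
  let attr;
  while ((attr = attrRe.exec(html)) !== null) {
    const handler = attr[1];
    // Single-quoted string literals inside the handler code.
    const literalRe = /'([^']*)'/g;
    let lit;
    while ((lit = literalRe.exec(handler)) !== null) {
      const candidate = lit[1];
      // Keep absolute http(s) URLs and relative .htm/.html paths.
      if (/^https?:\/\//i.test(candidate) || /\.html?$/i.test(candidate)) {
        urls.push(candidate);
      }
    }
  }
  return urls;
}

// The three snippets from the quote.
const page = `
<div onclick="document.location.href='http://foo.com/'">
<tr onclick="myfunction('index.html')"><a href="#" onclick="myfunction()">new page</a>
<a href="javascript:void(0)" onclick="window.open('welcome.html')">open new window</a>
`;

console.log(extractOnclickUrls(page).join(", "));
// prints: http://foo.com/, index.html, welcome.html
```

The bare `myfunction()` call yields nothing, because any URL it builds would only exist after the function runs. Executing the JavaScript, as the quote describes, closes that gap, and it is also why a script that merely assembles an e-mail address at runtime no longer hides it from a crawler that executes scripts.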

Ionut Alex. Chitu [PersonRank 10]

John Mueller sez:

<< We're constantly working on ways to improve finding and indexing the information out there, which includes JavaScript and Flash content. At any rate, as far as I know, email scraping bots have been able to parse this kind of JavaScript for a while now (not to mention all the other sources like infected PCs). >>

http://friendfeed.com/jezc/d446bba3/thinking-about-google-email-spam-and

Ianf [PersonRank 10]

I hasten to report that my own obfuscated e-mail address, the only one I have on the web, was encoded with an earlier version of http://hivelogic.com/enkoder/form [= a URL-encoded address string stored as an array in reverse order, plus a simple routine to assemble it back] and has so far remained untouched (not cracked), even though Google has indexed the page. The current Enkoder uses a far more convoluted method; see for yourselves [note: no line breaks in the "\" strings; total length 1208 bytes]:

[Code removed as requested – Tony]
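For illustration, here is a minimal sketch of the scheme Ianf describes for the earlier Enkoder: a URL-encoded address stored as a reversed array and reassembled by a short routine. It is a reconstruction from that one-line description, not Hivelogic's code; percent-encoding every byte (rather than only reserved characters) and the names `encodeAddress`/`decodeAddress` are assumptions.

```javascript
// Hypothetical reconstruction (not Hivelogic's code) of the earlier scheme:
// URL-encode the address, store the pieces in reverse order, reassemble.

// Percent-encode every UTF-8 byte (an assumption: not only reserved
// characters), so the address never appears as plain text in the source.
function percentEncodeAll(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"));
}

// Encoder side: the reversed array of "%XX" tokens embedded in the page.
function encodeAddress(address) {
  return percentEncodeAll(address).reverse();
}

// Decoder side: the "simple routine to assemble it back" that a browser runs.
// slice() copies the array so the stored tokens are not mutated.
function decodeAddress(reversedTokens) {
  return decodeURIComponent(reversedTokens.slice().reverse().join(""));
}

const stored = encodeAddress("someone@example.com");
console.log(stored.slice(0, 3).join(" ")); // prints: %6D %6F %63 ("moc", i.e. "com" reversed)
console.log(decodeAddress(stored));        // prints: someone@example.com
```

The address never appears as plain text in the page source, so a scraper that only scans the raw HTML misses it. Anything that actually runs the decoding routine, whether a browser or a crawler that executes JavaScript, recovers it in one step, which is the concern raised at the top of this thread.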

This site unofficially covers Google™ and more with some rights reserved. Join our forum!