Google Blogoscoped

Tuesday, July 1, 2008

Better Flash Indexing for Google?

Google announced they improved their handling of Flash files in search results; they say they now have a better ability “to index textual content in SWF files of all kinds,” including buttons, gadgets and so on. Google were already able to find and read Flash files to some extent, but now, to quote them (my emphasis):

We’ve developed an algorithm that explores Flash files in the same way that a person would, by clicking buttons, entering input, and so on. Our algorithm remembers all of the text that it encounters along the way, and that content is then available to be indexed. We can’t tell you all of the proprietary details, but we can tell you that the algorithm’s effectiveness was improved by utilizing Adobe’s new Searchable SWF library.

In addition to finding text, Google says they’re also finding URLs to feed into their “crawling pipeline.” Apparently, all with a little help from Flash-maker Adobe.

Limitations

Google write that they may not be able to execute some types of JavaScript, though, so “if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file.” Also, external resources loaded into the Flash – say, an XML content file for a cross-language Flash application – while perhaps being separately indexed, will not be considered part of the main Flash content. Google additionally suggest there are some problems at the moment with bidirectional languages used in Flash.

Note that Google’s stated JavaScript limitation may not apply to all types of inclusions. For instance, a search for site:sketchswap.com filetype:swf (replace the domain with your own to check) reveals that Google indexed the Flash file my friend and I included via the popular, JavaScript-based SWFObject library (not to say that this was the route Google took to discover the SWF).
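If you want to run that check for several of your own domains, here is a minimal Python sketch that builds the query for you. The function names are made up for this example, and the https://www.google.com/search?q=... URL shape is my assumption about how to link to a results page, not something from Google's announcement.

```python
from urllib.parse import urlencode


def flash_index_query(domain: str) -> str:
    """Build the query that lists SWF files Google has indexed for a domain."""
    return f"site:{domain} filetype:swf"


def flash_index_url(domain: str) -> str:
    """Turn the query into a search URL (the URL shape is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": flash_index_query(domain)})


if __name__ == "__main__":
    # Replace the domain with your own to check.
    print(flash_index_url("sketchswap.com"))
```

Opening the printed URL in a browser shows the same results as typing the query by hand. As noted above, a hit only tells you the SWF is indexed, not how Google discovered it.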

[Thanks Beussery and Miss Universe!]

