Google Blogoscoped


Google spidering external CSS files now (?)

alek [PersonRank 10]

Wednesday, June 28, 2006

I've heard people say for a while that Googlebot is spidering external CSS files, but I have always replied "check your logs – does a *real* Googlebot grab any external .css files?" ... and at least in my case, the answer was no.

But a recent forum post motivated me to do a quick grep ... and I stand corrected. I looked at some logs on a website with light traffic going back to March 30th. Googlebot came by the site a total of 824 times and (SURPRISE!) there was ONE Googlebot visit that spidered a .css file, on June 22nd ... so this may be a fairly recent thing (?)
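For anyone who wants to repeat the check, here is a minimal sketch of that kind of log scan. It assumes the server writes Apache-style Combined Log Format; the sample lines, paths and timestamps are made up for illustration, and the function name `css_fetches_by_bot` is hypothetical.

```python
import re
from collections import Counter

# Combined Log Format (assumed): ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings looked for in the user-agent string (case-insensitive).
BOTS = ("googlebot", "slurp", "msnbot")

def css_fetches_by_bot(lines):
    """Count visits per crawler and collect every request for a .css file."""
    visits, css_hits = Counter(), []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        agent = m.group("agent").lower()
        for bot in BOTS:
            if bot in agent:
                visits[bot] += 1
                path = m.group("path").split("?", 1)[0]  # drop any query string
                if path.lower().endswith(".css"):
                    css_hits.append((bot, m.group("ip"), m.group("time"), path))
    return visits, css_hits

# Invented sample lines, only to show the expected input shape.
sample = [
    '66.249.65.10 - - [22/Jun/2006:10:00:00 -0700] "GET /style.css HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.65.10 - - [22/Jun/2006:10:00:01 -0700] "GET /index.html HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
visits, css_hits = css_fetches_by_bot(sample)
print(visits["googlebot"], css_hits)
```

Matching on the user-agent string alone can be fooled by anything that claims to be Googlebot, so treat the counts as a first pass only.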

The IP address was 66.249.65.10 (which appears to be a legit Google IP address). BTW, I did not see any visits from slurp or msnbot to grab the external .css file ... so maybe they aren't doing this ... or at least not yet.
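One way to check whether an IP really belongs to Google's crawler is a reverse DNS lookup followed by a forward lookup on the returned hostname. The sketch below assumes that genuine crawler hostnames end in `googlebot.com` or `google.com`; the function name and the injectable `reverse`/`forward` parameters are my own, added so the logic can be exercised without a network.

```python
import socket

def is_google_crawler_ip(ip, reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: ip -> hostname -> ips must include ip."""
    try:
        host = reverse(ip)[0]  # first tuple element is the primary hostname
    except OSError:
        return False  # no PTR record or lookup failure
    # Assumption: Google's crawler hosts live under these two domains.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in forward(host)[2]  # third tuple element is the address list
    except OSError:
        return False

# Live usage performs real DNS queries, so no result is shown here:
# is_google_crawler_ip("66.249.65.10")
```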

I don't use much css (yes, shame on me) ... but would be curious if anyone else has seen this behavior. Makes sense as css can drastically alter the look of a page, so the search engines should try to use all available information to see what it actually looks like.

Black hats probably won't like this ... and my guess is they will probably use robots.txt to block the spiders. Note that could raise a red flag ... so an interesting question is would the search engines decide to send a stealth spider that ignores robots.txt and grabs the .css file to see what is really going on?
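To make the robots.txt idea concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `/css/` directory and the `example.com` URLs are hypothetical; it only shows how a well-behaved spider would read such a rule, not what any search engine actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a /css/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching each URL.
css_allowed = parser.can_fetch("Googlebot", "http://example.com/css/style.css")
page_allowed = parser.can_fetch("Googlebot", "http://example.com/index.html")
print(css_allowed, page_allowed)
```

A spider that honors the file would skip the stylesheet but still fetch the page, which is exactly the mismatch that could look suspicious.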

TOMHTML [PersonRank 10]

18 years ago #

Someone discovered the same thing a few days ago:
http://www.webrankinfo.com/actualites/200606-google-et-css.htm (FR)

The site supposes that Google does this to fight spammers who hide text with "visibility:hidden"...

Philipp Lenssen [PersonRank 10]

18 years ago #

I recently discussed this with someone. I think it's *incredibly* hard to parse CSS for "blackhat" hiding unless you use an existing browser or write a rendering engine to create a picture of the website. Even when you do that there's the problem of JavaScript. Just think of...
- All the browser hacks complicating CSS
- A link of a certain color positioned over a background image of different colors – depending on its position the text can be hidden or not
- CSS override (cascade) rules, with differences in weight between an ID and other selectors, or in the number of wrapping elements and inherited properties
- Perfectly valid DHTML approaches to show/hide stuff with CSS, like hiding the navigation first then showing it on demand
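As a rough illustration of the last point, here is a deliberately naive scanner that flags any rule containing `visibility:hidden` or `display:none`. The selectors and stylesheet are invented, and this is not a claim about how any search engine works; it only shows that a text-level check cannot tell a legitimate dropdown from hidden keywords.

```python
import re

# Declarations a naive scanner might treat as "hidden text" signals.
HIDING_DECL = re.compile(r"visibility\s*:\s*hidden|display\s*:\s*none", re.IGNORECASE)
# Very rough "selector { body }" matcher; ignores nesting such as @media blocks.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def naive_hidden_selectors(css):
    """Return selectors whose rule body contains a hiding declaration."""
    css = COMMENT.sub("", css)  # strip comments so they do not pollute selectors
    return [sel.strip() for sel, body in RULE.findall(css)
            if HIDING_DECL.search(body)]

# Invented stylesheet: one legitimate dropdown menu, one suspicious block.
css = """
#nav ul ul { display: none; }          /* submenu, hidden until hover */
#nav li:hover ul { display: block; }
.keywords { visibility: hidden; }      /* possibly stuffed keywords */
"""
print(naive_hidden_selectors(css))
```

Both `#nav ul ul` and `.keywords` get flagged, even though only one of them is plausibly abusive; telling them apart needs rendering, or at least cascade and hover-state handling.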

alek [PersonRank 10]

18 years ago #

An interesting Google Search suggests (!) that maybe (!) CSS files are also showing up in the index:
   http://www.google.com/search?q=filetype%3Acss+style

Note the disclaimers above – it's possible that these .css files are HREF'ed and would therefore be treated as "regular" files that should be indexed. I was actually surprised the number (currently 596) was so low.

I don't have data from a week ago ... but going forward, if that number increases quite a bit, it would suggest that Google is also indexing the CSS files.

P.S. Ditto what Philipp said above – would be a challenge to "use" this in a meaningful way ... but interesting to try to observe what's going on.
