Google Blogoscoped

Saturday, April 11, 2009

DiggBar Pages and Google

Digg recently released a feature that frames all pages linked from Digg. Instead of linking straight to the URL of a story, they’re linking to something like http://digg.com/d1npNz, which then embeds an iframe containing the original page. This move is already much hated by many, with some people including framebuster scripts or even special greeting messages for Digg visitors. On the other hand, one could also argue: let Digg do whatever helps their community with the discussion. The Digg creators find that the DiggBar continues their “symbiotic relationship with content publishers” and that it “will only enhance publisher traffic as more people discover and share content on Digg”.

Now, how does Digg allow the indexing of such pages? After all, it’s hard to see any added value for someone stumbling upon DiggBar pages in Google. Digg says they “always represent the source URL as the preferred version of the URL to search engines”, using JavaScript to make clicks go to the framed page... but this ignores the fact that the DiggBar URLs are what users end up with in the address bar (the URLs’ shortness makes them look very linkable, too). Searching Google for digg.com/d1npNz, I can see the page indexed in Google; the cache also does not reveal anything that would forbid Google from indexing it. However... when checking the current source of digg.com/d1npNz, there are the following two lines, which paint a brighter picture, and Digg’s blog post on the subject confirms these changes:

<meta name="robots" content="noindex"/>
<link rel="canonical" href="http://www.techcrunch.com/2009/04/[snipped]"/>

What Digg is trying to tell Google now is: 1) don’t index this page (via the noindex directive); 2) this page is actually a duplicate of another page, which should be considered the source (via the canonical bit). The only caveat: the canonical part of Digg’s directives likely won’t work at all. For one thing, this is technically not a duplicate page, as the HTML used is completely different. Google say they “allow slight differences, e.g., in the sort order of a table of products” but they don’t indicate that a completely different source would work (Yahoo, who also support the canonical tag, make it explicit by saying “if the content on the source and target was substantially distinct and unique, the canonical link may be considered erroneous and deferred”). Furthermore, per Google’s definition you can’t point the canonical link to another domain in the first place: “Google currently will take canonicalization suggestions into account across subdomains (or within a domain), but not across domains. So site owners can suggest www.example.com vs. example.com vs. help.example.com, but not example.com vs. example-widgets.com”.
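
To make the two directives a bit more concrete, here is a minimal Python sketch of how one might check a page for them. It reads the robots meta tag and the canonical link out of the HTML, then flags whether the canonical URL points to a different domain than the page itself (the case Google says it won't honor). The names HeadDirectives, site and inspect_page are made up for this illustration, and the "last two host labels" comparison is only a crude assumption standing in for a real registered-domain check; it says nothing about how Google actually evaluates these tags.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HeadDirectives(HTMLParser):
    """Collects robots meta directives and the canonical link of a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> also end up here.
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            for part in a.get("content", "").split(","):
                if part.strip():
                    self.robots.add(part.strip().lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None


def site(host):
    # Assumption: treat the last two host labels as "the domain".
    # Wrong for hosts like example.co.uk; good enough for a sketch.
    return ".".join(host.lower().split(".")[-2:])


def inspect_page(page_url, html):
    """Report noindex, the canonical URL, and whether it crosses domains."""
    parser = HeadDirectives()
    parser.feed(html)
    page_host = urlparse(page_url).hostname or ""
    canon_host = urlparse(parser.canonical).hostname if parser.canonical else None
    return {
        "noindex": "noindex" in parser.robots,
        "canonical": parser.canonical,
        "cross_domain": canon_host is not None
        and site(canon_host) != site(page_host),
    }


if __name__ == "__main__":
    # The two lines quoted above, wrapped in a minimal page.
    sample = (
        '<html><head>'
        '<meta name="robots" content="noindex"/>'
        '<link rel="canonical" href="http://www.techcrunch.com/2009/04/[snipped]"/>'
        '</head><body></body></html>'
    )
    print(inspect_page("http://digg.com/d1npNz", sample))
```

For the DiggBar page, a check like this should report the noindex directive as present and the canonical link as crossing domains (digg.com to techcrunch.com), whereas a canonical link from example.com to www.example.com would count as staying within one domain.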

How many DiggBar pages have been indexed by Google so far? It’s hard to tell the exact number, but the following search, which looks for a text string available in DiggBar pages, ...

“close close options” site:digg.com

... returns a result count of around 2,370 pages right now. Keep in mind these were likely all indexed before there was a “noindex” in the pages, which would mean that no further pages get indexed from now on.

Site bars or toolbars like the one offered by Digg, in various forms, have been popular for a long time and, it seems, are getting more popular every day. Many sites try to bind users to them in a strategic game of web Go. Google has something named the Friend Connect bar; Facebook introduced one for their own site; StumbleUpon has one; Google Images always links out to pages with such a top bar. Some toolbars are installed in the browser, sometimes without the user realizing how and why: install Real Player, and the Google toolbar will be added by default; install Java, and you’ll find yourself ending up with the Yahoo bar.

One problem with kidnapping URLs the way Digg does is that the controlling DiggBar page may change at any time in the future, e.g. it could stop linking to the original page, or take up more space, or pull up interstitial ads... bad for all those who bookmarked the toolbar URL or linked to it (for similar reasons it’s generally not good to use URL shorteners to link anywhere). As a webmaster you have roughly two choices: a) somehow block or bust these schemes, risking that you won’t be upvoted on certain social sites, or b) simply ignore all that stuff, enjoying any and all traffic you get, or just enjoying being a useful part of the web-wide discussion. And here, it’s worth noting that even the sites framed may not always be the actual source for a story, but may just excerpt a quote and then link elsewhere, or embed a video and so on. Personally, on my sites I want to allow hotlinking of images (assuming the use is innocent until proven guilty) and also try not to use framebuster scripts. In the case of the DiggBar though, I get the feeling many out there feel like Digg officially went over to the dark side of the force: they can now make an AT-AT trip with a wave of their hand, but everybody’s hoping they’ll be defeated in part III.

