Google Blogoscoped

Forum

Wikipedia Nofollows Links

Mathias Schindler [PersonRank 10]

Monday, January 22, 2007
7 years ago | 8,897 views

The way I currently read this (from an "inside" perspective) is "en.wikipedia.org stops playing a special role when it comes to handling external links". Having said that, I totally agree that the current modus operandi is not something one should consider optimal.

In theory, all the external links from Wikipedia are the best the web has to offer on a given subject.

In practice, the nofollow-tag does not matter at all. Wikipedia content gets mirrored all over the internet, mostly without the nofollow tag. There is enough google juice flowing around to those who are linked from Wikipedia.

Niraj Sanghvi [PersonRank 10]

7 years ago #

Mathias, but won't content that gets mirrored now get the nofollow tag included? Granted, sites may not be updating their mirrors. But mirror sites will also mirror the nofollow tag, as they just duplicate the content as it appears on Wikipedia, so it's not like there are alternate sources providing links that don't have a nofollow.

Philipp, I definitely agree that this is not a great solution and that the "fading nofollow" is quite effective at combatting spam without destroying net etiquette and a "link and be linked" sort of model.

Domisto [PersonRank 1]

7 years ago #

Well said, Philipp. The self-correcting argument is a very strong one and, in my mind, a better alternative to the nofollow.

Matthew Claypotch [PersonRank 1]

7 years ago #

As someone who frequently uses Wikipedia as both a "what the heck does that mean?" research tool, and a starting point for random surfing, I fully appreciate the External Links section's ability to collect the most relevant and useful resources on a subject. Products are moving in place that are exploiting the "credibility" of external links. The one I know of off the top of my head is Wikiseek (newly launched). How these services will be affected is unclear.

While I appreciate the netiquette of link sharing, I think that principle should stop at Wikipedia. If links stand to gain in search engines from mention in Wikipedia articles, the incentive changes from being an excellent objective resource, to search engine position. Even with the "healing" of bias and falsehood, as long as a site stands to gain in something beyond reputation, then it hurts the democratic nature of Wikipedia. I stand by their decision to nofollow their links. It may be bad manners as a website, but as an objective resource, it is crucial.

Mathias Schindler [PersonRank 10]

7 years ago #

@Niraj:
No, not necessarily. Most mirrors take the Wikipedia database dumps. A link in this dump looks like "[http : //www.foo.tld description]" (I hope this survives this comment form). Depending on how these people make HTML again from the dump, they might be using the MediaWiki parser, which has a switch to make the nofollow tag appear or not.

Those mirrors that simply "cut and paste" the content ("live mirrors") will now include the nofollow tag. But this behaviour is strongly discouraged.

Mathias
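A minimal sketch of the dump-to-HTML step described above, assuming a mirror that converts bracketed external links itself. The function name, the regular expression, and the `nofollow` flag are hypothetical and far simpler than the real MediaWiki parser (where the corresponding setting is, to the best of my knowledge, `$wgNoFollowLinks`). The URL is written without the spaces that were added above only to get past the comment form.

```python
import html
import re

# Matches the bracketed external-link form "[URL description]" found in the dump.
# Hypothetical, simplified pattern; real wikitext has many more cases.
EXTERNAL_LINK = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]+))?\]")


def render_external_links(wikitext: str, nofollow: bool = True) -> str:
    """Replace bracketed external links with HTML anchors.

    The `nofollow` flag plays the role of the parser switch mentioned
    in the post: the same dump can yield links with or without the tag.
    """
    rel = ' rel="nofollow"' if nofollow else ""

    def to_anchor(match: re.Match) -> str:
        url = match.group(1)
        label = match.group(2) or url  # fall back to the bare URL as link text
        return f'<a href="{html.escape(url, quote=True)}"{rel}>{html.escape(label)}</a>'

    return EXTERNAL_LINK.sub(to_anchor, wikitext)
```

Calling `render_external_links(text, nofollow=False)` on the same dump text yields plain links, which is why a mirror built from dumps does not necessarily inherit the tag.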

Ryan [PersonRank 0]

7 years ago #

What Wikipedia needs to do is periodically examine and remove some of its editors. Then it'll fix many more problems than nofollowing links.

The nofollow doesn't bother me that much though – as long as they keep my links in. I'm getting several hundred visitors from Wikipedia daily and I'd much rather have those.

The problem is, many admins are so gung ho to think that everything is spam, that if you only add a link to an article... they remove it thinking you're spamming.

Many times I've added useful links to articles (not my own sites... but sites I thought were helpful) only to see an admin call me a spammer.

Mathias Schindler [PersonRank 10]

7 years ago #

@Ryan

removing some of its editors? This is called "Arbitration Committee" in the en.wp.

The result can be seen here: en.wikipedia.org/w/index.php?t ...

JohnMu [PersonRank 10]

7 years ago #

I noticed the nofollow does not seem to be added to all links – I still see plenty of pages with external links that are clean. I'd post an example but it would probably be abused right away ... and nofollowed soon after. :-(

Are they cleaning them up step by step, are older pages "immune", or can pages be marked as "trustable"? Or does it only affect certain types of pages?

JohnMu [PersonRank 10]

7 years ago #

Ok – figured it out. As soon as you edit a page it goes nofollow. :-(
... or maybe it's a caching layer they have on top of the stack?

Mathias Schindler [PersonRank 10]

7 years ago #

@JohnMu

yes, we are using caching. You can manually trigger purging the cache and the nofollow tag might appear.

Editing a page might also trigger the purge of the caches.

Richard Reglin [PersonRank 0]

7 years ago #

Couple thoughts:

1. The "nofollow" tag is inherently flawed because it invites a webpage owner to tell the search engine how to do its job. A search engine vendor should never promise to obey the instructions of a webpage owner regarding how its content is interpreted. It would have been better to have a tag like "usergenerated" or something like that that says, "hey search engine, this link may have been put here by some random unaccountable user. Do with that information what you will."

2. Regarding the claim that Wikipedia is self healing, it's very important to remember that that healing is not free. Wikipedia is edited entirely by volunteers. As an editor myself I can tell you that most editors spend a very large proportion of their time battling various kinds of abuse, including linkspam. This is boring and demoralizing and takes away from time that could be used for constructive editing. You should consider that when volunteers go around removing linkspam, they are in a certain sense providing unpaid free labor to the search engine vendors. The existence of search engines creates the incentive for people to add linkspam and the necessity for volunteers to go around fixing it up. The wikipedia authorities want wikipedia to be the best possible encyclopedia; making it a good source of data (and thus revenue) for search engine vendors is not and should not be a top concern.

Ramen Junkie [PersonRank 1]

7 years ago #

Bleh, the whole idea of "I linked to you so you have to link back" is insanely stupid. It may or may not be more polite, but if the opposing site linking to Wikipedia doesn't merit a link back, it shouldn't be there.

Tony Ruscoe [PersonRank 10]

7 years ago #

Here's a very good point from Amit Agarwal:

<< Let's illustrate with an example:

Say you discover a cool feature in the iPod (called Stylus) and blog about it. Tomorrow, the Wikipedia contributors append the details of iPod Stylus (your discovery) to the Wikipedia page on iPod. They do attribute your blog but search engines will never see that attribution (or read your blog via Wikipedia) because of the rel=nofollow tag.

...

Result, your site appears after Wikipedia in the "iPod Stylus" search results and you get less or no traffic while Wikipedia gets to enjoy all the fruits of your labor.. >>

labnol.blogspot.com/2007/01/wi ...

SirNuke [PersonRank 1]

7 years ago #

Links can be extremely subjective, since unlike facts there is no good way to absolutely determine whether a link is good or bad. The automatic bias towards editors' personal websites can make link selection rather controversial.

No one should freak out about losing PageRank over the nofollowing of Wikipedia links. Most Wikipedia pages have a fairly low PageRank, and most mirrors have no PageRank at all. My personal thought is that Wikipedia links are more useful for the traffic they bring.

@Tony Ruscoe
That's a rather poor example. Wikipedia is an encyclopedia, not a news source. In that case, submit the discovery to more traditional Internet news sources (Digg, Slashdot, etc), which excel at driving traffic and links.

Tony Ruscoe [PersonRank 10]

7 years ago #

@SirNuke

Of course, you're right; Wikipedia isn't a news source – but it does include up-to-date information about things like iPods, citing where that information may have been "lifted" from. You just have to look at the list of 87 references on the entry for iPod to see this: en.wikipedia.org/wiki/IPod

I don't see how that makes it a poor example.

Jason Schramm [PersonRank 5]

7 years ago #

I think the problem of nofollow is big and bad. Wordpress blogs all have nofollow in comment links without even knowing it. This is bad when you filter out spam comments and those links are legitimate and can add value. I wrote about it on my site. jasonblogs.com

Francis [PersonRank 0]

7 years ago #

What about a voting system?

Something along the lines of what you find on digg.com/, where users have an input on what is being posted.

Below a certain threshold the content would get a nofollow attribute.
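The voting idea can be sketched roughly as follows. Everything here is an assumption made for illustration: the `VotedLink` type, the net-score rule, and the cutoff of 5 are hypothetical and not part of any existing Wikipedia or Digg feature.

```python
from dataclasses import dataclass

VOTE_THRESHOLD = 5  # hypothetical cutoff; the post does not name a number


@dataclass
class VotedLink:
    """An external link together with the votes users have cast on it."""
    url: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        # Net score; a real system might weight or age votes differently.
        return self.upvotes - self.downvotes


def rel_attribute(link: VotedLink, threshold: int = VOTE_THRESHOLD) -> str:
    """Return "nofollow" for links scoring below the threshold, else ""."""
    return "nofollow" if link.score < threshold else ""
```

A link with 2 upvotes would still carry `nofollow`, while one with 9 upvotes and 1 downvote would not.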

SirNuke [PersonRank 1]

7 years ago #

@Tony Ruscoe
Assume the discovery is important enough that it will get a link in Wikipedia, and that the discovery warrants a new Wikipedia entry (it's rather unreasonable for a blog entry about an iPod Stylus to be ranked higher than Wikipedia's entry on the iPod). If so, it's also a safe assumption that the discovery is important enough to make the front page of Digg (or wherever). Being on the front page of any big Internet news site is much more valuable than a link from Wikipedia; I believe being on the front page of Digg brings an average of 500 new links.

At which point, whether the blog is before or after the Wikipedia article probably has little to do with whether Wikipedia's link has nofollow or not.

In short, if the iPod Stylus' blog entry isn't beating Wikipedia's entry (which has a nofollow link to the blog), the culprit probably isn't the nofollow in Wikipedia's link.

(In retrospect, I do apologize, I was in a hurry when I was writing my previous post and I didn't make this clear).

Rong Ou [PersonRank 1]

7 years ago #

Just curious, do spammers actually sit at their computers and edit wikipedia all day? If it's done through some sort of bot, wouldn't a captcha system be more effective at combating link spam than this "all links are nofollow" approach?

Mathias Schindler [PersonRank 10]

7 years ago #

Just a few numbers. The English-language Wikipedia currently has about 3 million external links, with a growth rate of about 0.2 million each month.

Any proposal for how to handle the nofollow tag should be able to fulfill two (overlapping) criteria:

1. it should scale
2. it should not draw any resources from our main priority: writing an encyclopedia

James [PersonRank 0]

7 years ago #

Mathias: You might find that nofollow by itself will draw resources from your main priority, as people who once contributed on the basis that their work helped the wider web as well as wikipedia itself, stop contributing.

Put more simply, a lot of people once hoped that Wikipedia would be the sum of all human knowledge. Now that they've had a few years to see how Wikipedia works (and fails), and that Wikipedia is only interested in summarising other sources, they'll go back to writing webpages and blog entries instead of wasting effort in edit wars, and let the information be added to Wikipedia by someone else, if it ever is.

Carsten Cumbrowski [PersonRank 1]

7 years ago #

You don't get it, it's not about the ARTICLE MAINSPACE. And I am sure that the NOFOLLOW will be ignored by some Search Engines like Google in certain instances. Call it a white-list or links that could be determined as relevant. Wait a moment, that is what the Search Engines are supposed to do in the first place.

The NOFOLLOW brought one pain after another and to be honest, the Spam Issue is less painful than this NOFOLLOW debacle.

Comments on Blogs often have nofollow on, so engaging in a discussion with another blogger is suddenly a one-way street. The commenter enriches the content of somebody else's blog and gets nothing in return (from a Search Engine Spider's point of view, not a human's).

Marketers who use Affiliate Links to monetize their sites (rather than AdSense) are encouraged to use NOFOLLOW on those links, so that Search Engines will not see the Affiliate Links as a Spam attempt or ding the site's "quality score" on the assumption that every affiliate puts up affiliate links only to make money, because all affiliates are thugs and greedy bastards, which they are not (I can write you a book about this crap).

People signal via NOFOLLOW links that they don't trust the other site, which makes the other Webmaster happy, and everybody with more than one brain cell wonders why people add such a link in the first place.

If all that will go away and I have to delete more stuff in my junk folders, so be it. I don't care if I have to delete 100 or 150 trackback spams or 500 or 700 Junk emails.

Pramit [PersonRank 1]

7 years ago #

Maybe Wikipedia has become too ambitious for its own sake. I have written about what happens next on my MediaVidea blog.
mediavidea.blogspot.com/2007/0 ...

Hoch auf einem Baum [PersonRank 0]

7 years ago #

Most wikipedians don't see themselves in the business of selecting the best links for each of millions of encyclopedia topics and improving search engine results, but rather aim at writing good encyclopedia articles about them.

It probably hasn't much to do with Jimbo Wales' change of mind regarding nofollow on Wikipedia, but people should be reminded that he and his company Wikia just announced a project which is exactly in the business of collectively selecting the best links for topics, to build a user-run search engine. If successful, this should give you what you (wrongly) expect from Wikipedia.

Regardless of what one considers to be the bad effects of en.wikipedia reintroducing nofollow, the people who are getting hurt most by it are link spammers (black hat SEO people), as can be seen from these comments:

"There goes hours upon hours of editing and link building down the drain."
seorefugee.com/forums/showthre ...

"i knew it would happen eventually... but... FUCK!!!!"
"this ruined my day :("
wickedfire.com/industry-news/7 ...

Yesterday I wrote a story about this in the "Wikipedia-Kurier", the internal news magazine of the German Wikipedia: de.wikipedia.org/wiki/Wikipedi ...

Also, on the English Wikipedia a new edition of the "Wikipedia Signpost" has just appeared which covers this story from an internal perspective and cites Philipp's blog post:
en.wikipedia.org/wiki/Wikipedi ...

t xensen [PersonRank 4]

7 years ago #

I've posted about this on my blog at rightreading. I'm with Philipp, I think it's a question of community. You have to give back if you're going to take.

John Honeck [PersonRank 10]

7 years ago #

If NOFOLLOW is an implication that you don't trust the site the link is going to but want to give the reader the option to view the site, what does it say about the wiki as an information source if they don't trust any of the sites they derive their information from?

Essentially they are building a site that is an island in the internet with only roads to it and no way off. It's un-natural and should be seen as a sign of a low quality site. They show 12 million pages indexed in google. There should be some sort of spam filter tripped that says that any site that has over 12 million pages and not any external links that can be trusted has to be reconsidered. Evaluating links between sites is what Larry and Sergey built their billions on, and a site that only accepts links to it and does not offer any out should be considered contrary to their original intents.

I've never wikied and have no intention of doing so, as any time I've stumbled on it through search it has seemed to be pretty low quality information, but if they cannot find a way to monitor the authors, editors, or each and every link to see if it should be a trusted source, then they are no better than the self-generated spam sites out there.

The original intent of NOFOLLOW to stop blog spammers and forum spammers was noble enough, but I consider this an abuse of it and misdirected application. If a page is good enough to quote then it deserves the link so that the user can view the original and the search engine can also evaluate the original. A site or even a page that contains mostly NOFOLLOW links should be evaluated as low quality and be ranked as such.

The whole issue is just counterintuitive and potentially opens up a can of worms that I bet they didn't intend to open. Shouldn't Blogger add nofollow to everything on its millions of subdomains (not just the comments), since they don't have control over what is linked to?

I always have the Firefox extension on that highlights NOFOLLOW links, and I know that when I stumble upon a site that has all pink links, I consider it junk that doesn't trust its own material and move on. Perhaps the search engines will do the same soon!

Philipp Lenssen [PersonRank 10]

7 years ago #

> If NOFOLLOW is an implication that you don't trust the site
> the link is going to but want to give the reader the option to
> view the site, what does it say about the wiki as an
> information source if they don't trust any of the sites they
> derive their information from?

Well put – this wraps up the core issue for me!

> Essentially they are building a site that is an island
> in the internet with only roads to it and no way off.
> It's un-natural and should be seen as a sign of a
> low quality site. They show 12 million pages indexed
> in google. There should be some sort of spam filter
> tripped that says that any site that has over 12 million
> pages and not any external links that can be trusted has
> to be reconsidered.

Exactly Johnweb, I was starting to ponder the same. In the eyes of Google link analyzers trying to find spam, Wikipedia may now well look "perverted"... an abnormal, thus artificial (thus spammy) link structure. Maybe it will now resemble some kind of blog comment spammer who leaves a huge amount of backlinks but never links back to any of the websites.

Then again, PageRank 7 German Wikipedia already had this abnormal nofollow structure without punishment, and besides, it may well be that Google would rather manually change their algorithms than to automatically punish Wikipedia, as that would cause a "good" seed site to become a "bad" seed site... pushing millions of good websites into a bad neighborhood 'cause they link to Wikipedia. We need to remember Google rankings are *not* fully automated – they run on human-created & human-monitored automatisms.

Reto Meier [PersonRank 10]

7 years ago #

It's interesting to note that Matt Cutts thinks "...it’s the right call".

mattcutts.com/blog/what-did-i- ...

Seems he's taking the view that discouraging the spammers in the short term with the expectation of allowing 'clean' outbound links in the future is worth the short term problems. Not sure I agree with him – I'm with you Philipp.

Justin Mason [PersonRank 1]

7 years ago #

For those worrying about the effects on Google's index – there's no reason Google has to *respect* "nofollow" on Wikipedia pages. "Nofollow" is just a hint for indexers, not a law in itself.
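To make the "hint, not a law" point concrete, here is a rough sketch of an indexer that reads `rel="nofollow"` but chooses to disregard it for pages on selected hosts. This is purely hypothetical: the class, the function, and the `ignore_nofollow_on` parameter are invented for illustration and say nothing about how Google or any real search engine actually treats the attribute.

```python
from html.parser import HTMLParser
from typing import List, Set, Tuple
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, has_nofollow) pairs from the anchor tags of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel is a space-separated token list; the value may be missing.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def links_to_count(page_url: str, page_html: str,
                   ignore_nofollow_on: Set[str]) -> List[str]:
    """Return the links a (hypothetical) indexer would credit.

    For pages hosted on a name in `ignore_nofollow_on`, the nofollow
    hint is disregarded; everywhere else it is respected.
    """
    collector = LinkCollector()
    collector.feed(page_html)
    override = urlparse(page_url).hostname in ignore_nofollow_on
    return [href for href, nofollow in collector.links
            if override or not nofollow]
```

With `{"en.wikipedia.org"}` passed as the override set, every outbound link on a Wikipedia page would be credited; with an empty set, only the links without the hint would be.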

Stan Shebs [PersonRank 0]

7 years ago #

Oh boo hoo. The links are still in the articles, so interested people can follow them just like always. I didn't put thousands of hours of my own time into Wikipedia so that some lazy bum could advertise Cabo rentals.

In practice, most of the good external links are to institutional sites that don't care about their page rank anyway.

JavOs [PersonRank 0]

7 years ago #

It seems easy to criticize from outside Wikipedia, but as someone commented already, it's discouraging to see thousands of man-hours wasted on spam removal. There's a similar problem with vandalism, but at least this spam problem has an expeditious solution, and it has been, in my opinion, solved for good, once and for all. This sends the clear message that Wikipedia is an encyclopedia, not an internet portal or directory like some people seem to think. You wouldn't complain if Britannica removed all "spammy" info. In fact, the printed version I have does not even recognize authorship, except for two or three initials from the contributor at the end of an article (and a list of contributors in a separate volume). To start with, no external link should ever have gotten any boost from Wikipedia, since external links are for informational purposes, not to give popularity to sources' or authors' websites, even if these websites have contributed substantially with their info.

Ken Y-N [PersonRank 0]

7 years ago #

I'm not shedding any tears over the loss of SEO, nor are many of my top search keywords in competition with Wikipedia (in fact I might even gain from the nofollow change in a couple of places if a couple of higher-placed but less well-linked other than from Wikipedia pages drop out), but I think there has to be a better way. I like the suggestion in the main article – after a few days or so if a link is not edited it loses its nofollow.
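The "fading nofollow" suggestion could look roughly like this. It is a sketch under stated assumptions: the seven-day grace period and the idea that the wiki records when each link was added or last changed are both hypothetical, since the suggestion only says "a few days or so".

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=7)  # assumed value; "a few days or so"


def needs_nofollow(link_added_at: datetime, now: datetime,
                   grace: timedelta = GRACE_PERIOD) -> bool:
    """Return True while a link is younger than the grace period.

    `link_added_at` is when the link was added or last changed. A link
    that survives that long without being removed or edited is treated
    as vetted by the community and loses its nofollow.
    """
    return now - link_added_at < grace
```

A link added two days ago would still be served with `rel="nofollow"`; one that has stood untouched for a month would not.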

Ah! I've just noticed that " -site:wikipedia.org" works as might be expected – anyone know how to set that as a default?

For WordPress users, here's something I hacked up earlier.

whatjapanthinks.com/wikipedia- ...

Steve Magruder [PersonRank 1]

7 years ago #

I totally agree with the ideas expressed in the post. There's no reason why MediaWiki software could not be updated to determine what links can reasonably be trusted.

Beyond this, I also agree with the earlier expressed idea that Google could easily (and I mean **easily**) ignore "nofollow" on links in the Wikipedia sites. It would certainly be a very minor tweak to their spidering/ranking software. I would even encourage Google and other search engines to work against what Wikipedia is doing. It's anti-community and anti-web.

I say all this as a three-year editor on Wikipedia. The spamming isn't *that* bad.

Philipp Lenssen [PersonRank 10]

7 years ago #

Edit: grammar corrections

"with is millions of pages" -> "with its millions of pages"

"thus huge amount of outgoing links" -> "thus the huge amount of outgoing links"

[Hat tip to Nick Carr roughtype.com/archives/2007/01 ... .]

ephinos [PersonRank 0]

7 years ago #

I, too, am disappointed in Wikipedia's use of "nofollow". As a matter of fact, I am liking "nofollow" less and less on the internet as a whole. Imagine if every webmaster used "nofollow" on all links. Then what would we be left with? Search engines not following any links? That's one of the most basic ways search engines find sites--by following links. The "web" is then broken!

If a user on a forum posts a link somewhere, why is the assumption always that it is a spam link? That is what the use of "nofollow" assumes. If people are posting links somewhere, there is a possibility that it is a link to a resource of value and if so, search engines should pick up on it.

Even companies that have Wikipedia articles about them have "nofollow" on external links to their websites. A bunch of BS! A link to the Merrill Lynch corporate website won't even be followed.

I would personally like to see everybody linking to Wikipedia put "nofollow" tags on the links. Give Wikipedia and Jimbo Wales a taste of his own medicine.

Or, perhaps people who run websites that Wikipedia links to or uses as a source could remove whatever content Wikipedia derived from them.

en.wikipedia.org/wiki/User:Jim ...

I was going to post my thoughts on Jimbo's talk page (it was he who mandated that "nofollow" be implemented). Maybe I will...

I myself have spent hours over the past few months removing link spam from Wikipedia. I don't like spam any more than you guys. However, I don't think screwing with the fundamental workings of the "world wide web" by cutting the very threads that comprise the web is the right way to deal with spammers.

Perhaps some of you will give Jimbo Wales a piece of your mind.

Sebastian [PersonRank 0]

7 years ago #

It's easy to say goodbye to the link juice of a couple of inbound links from Wikipedia. However, it's unacceptable that the nofollow fiasco is getting more and more out of control. It has never worked as expected, it sows confusion, it encourages misuse, and its ongoing semantic morphing will lead to web-wide insanity in the end. Here is a kind of open letter to the engines asking them to rethink the nofollow debacle: sebastianx.blogspot.com/2007/0 ... (longish post, can't repeat every argument here). Thanks for listening.
