Google Blogoscoped

Forum

Digital Inspiration Gets Labeled as a ‘Content Farm’

WebSonic.nl [PersonRank 10]

Sunday, February 27, 2011
13 years ago, 12,963 views

http://www.labnol.org/internet/blog-as-content-farm/18750/

TOMHTML [PersonRank 10]

13 years ago #

What a nice fail! Well done Google...

Roger Browne [PersonRank 10]

13 years ago #

@TOMHTML: What criteria would you use to separate Digital Inspiration's "How-to" articles from, say, eHow's "How-to" articles?

WebSonic.nl [PersonRank 10]

13 years ago #

Besides, I don't see Digital Inspiration as a content farm, or as a website that would be affected by the update, right? Glancing at the list on Search Engine Land, I noticed that eHow.com is a winner with this update.

Roger Browne [PersonRank 10]

13 years ago #

Oh well, no algorithm will evaluate all sites perfectly. I was surprised to see eHow being a winner, and also surprised to see Yahoo Answers being a winner.

TOMHTML [PersonRank 10]

13 years ago #

@Roger: who said you need to separate them? I don't think all how-to articles are bad...

Roger Browne [PersonRank 10]

13 years ago #

There's plenty of shallow content at eHow. For example, "How to save money on your electricity bill: When you leave a room, turn off the light..."
http://www.ehow.com/how_2064255_save-money-electric-bills.html

I got the impression that Google wants to de-emphasize shallow content, and wondered if there was an algorithm or test that would somehow distinguish eHow's content from that of Digital Inspiration, given your implied claim that Digital Inspiration content should not be targeted by Google's "Farmer" update.

It makes me wonder what Google is really targeting here, given that WiseGeek dropped greatly, and I would rate WiseGeek's articles as being (on the whole) less shallow than those of eHow.

TOMHTML [PersonRank 10]

13 years ago #

Even with your example, I still don't think it's bad content. Otherwise Google should ban all WWF advice, or books for kids.

Pages generated with content spinning ARE bad content. And sadly many of them are still up in the SERPs.

Roger Browne [PersonRank 10]

13 years ago #

Content spinning (where words in someone else's content are replaced by similar words extracted from a thesaurus) is completely worthless, unlike shallow content which is only 95% worthless.
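To make the definition above concrete, here is a minimal sketch of thesaurus-style spinning. The `SYNONYMS` table and the `spin` function are hypothetical, invented for illustration; real spinning tools are assumed to use much larger synonym lists.

```python
import re

# Hypothetical toy thesaurus; a real spinner would use a far larger synonym list.
SYNONYMS = {
    "turn": "switch",
    "leave": "exit",
    "room": "chamber",
    "light": "lamp",
}

def spin(text: str) -> str:
    """Replace every word found in SYNONYMS and leave all other text untouched."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        key = word.lower()
        if key not in SYNONYMS:
            return word
        replacement = SYNONYMS[key]
        # Keep a leading capital so the output still looks like a sentence.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)

original = "When you leave a room, turn off the light."
print(spin(original))  # When you exit a chamber, switch off the lamp.
```

The spun sentence keeps the structure of the original and adds nothing new, which is why spun pages are worthless to a reader.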

But Google has been able to detect content spinning for some years now, even though it's not always fully penalised. I guess that's because no single ranking factor can be allowed to have a 100% influence on the results, otherwise SEOs will exploit it somehow (for example by posting content-spun comments to their competitor's blog).

Content farms, which are apparently the target of this latest update, don't do content spinning. They aim to churn out many pages at very low cost, by paying people to write the content. But the articles are often shallow paraphrases of the blandest content already available elsewhere on the web. The content farms don't care, provided the articles have the trending and high-paying keywords – or at least the content farms didn't care until now because the business was highly profitable.

My interest is in how Google can distinguish between low-quality content farms and higher-quality article sites. We see that WiseGeek has apparently done very badly out of this update. They are a content farm for sure, but in my opinion one of the better ones.

TOMHTML [PersonRank 10]

13 years ago #

<< But Google has been able to detect content spinning for some years now, >>
I don't think so. Or only in the early years, when content wasn't grammatically correct.

Roger, who cares that eHow-like websites generate thousands of pages, SEOs excepted? Imagine I'm John Doe and I search for "how to clean my bathroom?". If eHow's page answers my question correctly, all is OK, so where is the problem? That this site spams Google and your website is down in the SERPs? As long as it answers my question correctly, I don't give a fuck!!!

Roger Browne [PersonRank 10]

13 years ago #

Google has been able to reliably detect content spinning since at least 2006 when they released their data set that analyzed the frequency of 5-word n-grams (word sequences) on the web.

I'm sure Google kept a data set of 6-word sequences for their own internal use :)
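A rough sketch of how an n-gram frequency table could flag spun text: synonym swaps tend to produce word sequences that rarely or never occur in natural writing. This is only an illustration under stated assumptions, not Google's actual method. The function names, the two-sentence reference corpus, and the choice of 3-word sequences (to suit such a small corpus) are all made up for the example.

```python
from collections import Counter

def ngrams(text: str, n: int):
    """Yield every n-word sequence in text, lowercased."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])

def build_counts(corpus, n: int) -> Counter:
    """Count n-grams across a reference corpus (a stand-in for web-scale data)."""
    counts = Counter()
    for doc in corpus:
        counts.update(ngrams(doc, n))
    return counts

def unseen_fraction(text: str, counts: Counter, n: int) -> float:
    """Share of the text's n-grams that never occur in the reference counts."""
    grams = list(ngrams(text, n))
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in counts)
    return unseen / len(grams)

# Hypothetical reference corpus; a real table would cover billions of pages.
reference = [
    "when you leave a room turn off the light",
    "turn off the light to save money on your electricity bill",
]
counts = build_counts(reference, n=3)

natural = "when you leave a room turn off the light"
spun = "when you exit a chamber switch off the lamp"
print(unseen_fraction(natural, counts, n=3))  # 0.0
print(unseen_fraction(spun, counts, n=3))     # 1.0
```

With a corpus this small, any unfamiliar sentence would score high, so the signal only becomes meaningful against a very large frequency table, and even then it is presumably one ranking factor among many.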

When John Doe searches for "how to clean my bathroom", he is unlikely to be searching for the bland "correct but useless" information provided by a shallow content farm. But that's how the content farm wants it, because they want John Doe to find his answer in the ads, and click on them instead.

Google, the search engine, prefers to take John Doe to content that is actually useful to him (even though Google, the AdSense purveyor, might prefer to send John Doe to the content farm).

Roger Browne [PersonRank 10]

13 years ago #

References for the n-grams:

Google Research Blog: All Our N-gram are Belong to You
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

Some discussion of N-grams and splogs here:
http://blogoscoped.com/forum/147276.html

The word "splogs" (spam blogs) seems to have fallen out of use, as Google has effectively got rid of most of them, but the typical 2006 splog was pure content-spinning.

TOMHTML [PersonRank 10]

13 years ago #

<<When John Doe searches for "how to clean my bathroom", he is unlikely to be searching for the bland "correct but useless" information provided by a shallow content farm.>>
How can you say it's useless? That might be useless for you but not for me or John Doe, otherwise none of us would search for that kind of query.
How can you tell it's a "content farm"? If you have never seen the site before, you can't.

For me, this could also be considered low-quality and useless content:
http://www.labnol.org/internet/google-gandhi-logo/10171/
http://www.labnol.org/internet/google-frozen-slush/5011/
http://www.labnol.org/internet/youtube-playback-speed/18711/
http://www.labnol.org/software/program-accessing-internet/18476/
http://www.labnol.org/home/reuse-old-cds/13660/
http://www.labnol.org/software/protect-home-wifi/13675/
etc.

Roger Browne [PersonRank 10]

13 years ago #

@TOMHTML, it's pretty obvious to me which is the useless bland shallow content. Here are some examples:

How to To Keep Your Bathroom Looks Clean. My TIPS
http://www.ehow.com/how_4751304_keep-bathroom-looks-clean-tips.html

"Here are some tips to maintain a clean bathroom ... Clean your bathroom at least once a week or every two weeks ... keep the bathtub clean by cleaning it every time you use it ..."

How can I clean my bathroom top to bottom...
http://www.blurtit.com/q534337.html

"First you have to get out a pail some rags, rubber gloves and a scrub brush ... i like to use comet or Ajax ... it this stuff doesn't work my only advice is to rip out all the bad and totally replace every thing, toilet tub/shower, tile..."

The following article is obviously made-for-AdSense rather than made-for-John-Doe:

What Products Do I Need To Clean My Bathroom?
http://www.wisegeek.com/what-products-do-i-need-to-clean-my-bathroom.htm

"... Glass cleaner is perfect if you have a glass shower door..."

Wow, I never would have thought of using glass cleaner if I had a glass shower door!

TOMHTML [PersonRank 10]

13 years ago #

Hum, good pick for that example.
But articles like this one http://www.ehow.com/how_7849608_stop-outdoor-cat-peeing-sandbox.html or this one http://www.ehow.com/how_5528756_make-easy-homemade-french-fries.html don't look like MFA (made-for-AdSense), IMHO.

Once again, you can't conclude that content is low-quality just because some words and expressions are repeated. Otherwise even the Simple English version of Wikipedia should be banned.

Ionut Alex. Chitu [PersonRank 10]

13 years ago #

Google can't detect low-quality content because quality is subjective. Google can't even distinguish an original article from a post that quotes the article. I've searched for the title of an article from AllThingsD and the top result was a page from a news aggregator.

Here's Google's claim from last week (http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html):

<<This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.>>

This quote shows that Google uses metrics that evaluate the quality of a site. If most of a site's articles are short, copy content from other sites, or contain very few complete sentences, Google could artificially lower the site's rankings. If there are many pages at intermediate/advanced reading levels, a lot of unique content, many references to authoritative resources, links to/from scientific papers, and long paragraphs quoted by other authoritative sites, then Google could artificially boost the rankings. This sounds good in theory, but not many people want to read scientific papers and original research. For some people, a short post that summarizes a scientific paper in plain English is more valuable.
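As a sketch of what such metrics might look like, the snippet below computes three of the signals mentioned above for a single page: length, share of complete sentences, and share of sentences copied verbatim from elsewhere. Everything here is an assumption for illustration (the `PageSignals` fields, the crude sentence test, the `known_sentences` set); nothing is known about which signals Google actually computes or how it weights them.

```python
import re
from dataclasses import dataclass

@dataclass
class PageSignals:
    word_count: int
    complete_sentence_ratio: float
    copied_sentence_ratio: float

def split_sentences(text: str) -> list:
    """Split on whitespace that follows a sentence terminator."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def is_complete(sentence: str) -> bool:
    """Crude test: starts with a capital, ends with . ! or ?, has 3+ words."""
    return (
        sentence[0].isupper()
        and sentence[-1] in ".!?"
        and len(sentence.split()) >= 3
    )

def page_signals(text: str, known_sentences: set) -> PageSignals:
    """Compute the three illustrative signals for one page."""
    sentences = split_sentences(text)
    if not sentences:
        return PageSignals(0, 0.0, 0.0)
    total = len(sentences)
    complete = sum(is_complete(s) for s in sentences)
    copied = sum(s in known_sentences for s in sentences)
    return PageSignals(len(text.split()), complete / total, copied / total)

# Hypothetical page text and a hypothetical set of sentences seen on other sites.
page = "Clean your bathroom once a week. cheap cleaner buy now"
seen_elsewhere = {"Clean your bathroom once a week."}
print(page_signals(page, seen_elsewhere))
```

The sketch deliberately stops at raw signals and does not combine them into a single score, since choosing the weights is the hard and subjective part.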

Philipp Lenssen [PersonRank 10]

13 years ago #

> it's pretty obvious to me which is the
> useless bland shallow content. Here are some examples:

Roger, that bathroom tips page didn't look that bad to me... maybe not terrific, but also not truly bad or scammy or anything (the intentions are probably to make money from ads, yeah, but that's another issue). But isn't that what Google's plain old backlinks algo is good for, to basically let the public decide – through "links as votes" – whether or not the quality is OK? If people link to that site, then they apparently like it, and if they don't link, it won't get high ranks. If that is the case and it doesn't work, then isn't the real issue to decide what valuable backlinks are and what are "big partner networks linking their own stuff" kind of backlinks (which don't equal quality)?
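The "links as votes" idea can be sketched with a simplified PageRank-style power iteration. This is a textbook-style illustration, not Google's production ranking. The page names, the link graph, and the parameter values are hypothetical.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank: each page splits its vote among the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical graph: two independent blogs link to a how-to page,
# while two partner "farm" pages only link to each other.
links = {
    "blog_a": ["howto"],
    "blog_b": ["howto"],
    "howto": [],
    "farm_1": ["farm_2"],
    "farm_2": ["farm_1"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the how-to page outranks the blogs that vote for it, but the two farm pages, which only vote for each other, end up ranked above all of them. That is the "partner networks linking their own stuff" problem in miniature: raw link counting cannot tell an earned vote from a self-dealt one.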

And re: content spinning... I also came across sites which translate copied content into another language, then translate it back, both of course automatically, and the result being half-way meaningful looking garbage (a human would immediately notice, some indexing bots may not, though I suppose Google may well be smart enough to determine it's not real spoken and correct language).

Roger Browne [PersonRank 10]

13 years ago #

Google has all the technology they need to detect things like content which has been translated into another language and back again.

The difficult thing is to decide what weighting to give to each ranking factor. I think Google is now at the point where they are using most of the ranking factors that are reasonably available.

Of course future technology will help out here. When Google implements "house view", they can simply check the bathrooms of people who visited various "How Do I Clean My Bathroom" pages. The cleanest bathrooms will indicate the pages that should be highest-ranked.

(I'm not adding a smiley, because I think this kind of thing WILL happen.)

TOMHTML [PersonRank 10]

13 years ago #

Well, this debate shouldn't be about "quality" of the results, but relevance. Google is still very good at relevance.

Some people wonder if those attacks from the NYT (which caused Google to tweak its algorithm many times in recent months) are directed by Microsoft or even Blekko, which have less relevant results (in my opinion) but of higher quality (still in my opinion).

