Google Blogoscoped

Google's Matt Cutts on Duplicate Content and More

Seth Finkelstein [PersonRank 10]

Wednesday, August 2, 2006
17 years ago · 6,669 views

"to the best of my knowledge there is no tag that could just say, “I am porn, please exclude me from your SafeSearch.”

Huh? Am I missing something? Matt, if you're reading this thread, are you aware of the ICRA labelling stuff? There's the old "SafeSurf" standard too.

[Great transcription, Philipp]
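
(For reference: ICRA and SafeSurf labels are machine-readable PICS meta tags a page embeds to rate itself. Here is a minimal sketch of how a crawler could detect such a self-label; the sample label string is shaped like a PICS label but is illustrative, not an exact ICRA rating.)

```python
from html.parser import HTMLParser

class PicsLabelFinder(HTMLParser):
    """Collects <meta http-equiv="PICS-Label"> self-ratings from a page."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("http-equiv", "").lower() == "pics-label":
                self.labels.append(a.get("content", ""))

# Illustrative page; the rating vocabulary here is made up for the example.
page = """<html><head>
<meta http-equiv="PICS-Label"
      content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (nz 1))'>
</head><body>...</body></html>"""

finder = PicsLabelFinder()
finder.feed(page)
# A SafeSearch-style filter could exclude any page carrying such a label.
print(finder.labels)
```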

Niraj Sanghvi [PersonRank 10]

17 years ago #

I haven't seen the videos and maybe this question is answered there:

How does Google deal with inadvertent duplicate content? I remember there was a whole fiasco about canonical domain names and how being able to access your site at both http://site.com and http://www.site.com was a bad thing. But it seems like plenty of people don't know that. Likewise, you can reach the same page at site.com, site.com/, and site.com/index.php.

Do these URLs trigger duplicate content issues? I remember all of them have been discussed at various times (often by Matt on his blog), but the way they're handled always seemed to be changing, since not many people are *both* aware of *and* actively correcting the issues.
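
A minimal sketch of the normalization that collapses those variants onto one URL. The policy choices here (force www, drop default index pages, keep a trailing slash) are assumptions; the opposite choices work just as well, and in practice a server-side 301 redirect to the canonical form is the usual fix.

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Collapse the usual duplicate forms of a URL onto one canonical form."""
    scheme, host, path, query, _ = urlsplit(url)
    host = host.lower()
    if not host.startswith("www."):          # assumed policy: always www
        host = "www." + host
    for index in ("/index.php", "/index.html"):
        if path.endswith(index):             # drop default index pages
            path = path[: -len(index)] + "/"
    if not path:                             # bare host: add the slash
        path = "/"
    return urlunsplit((scheme, host, path, query, ""))

# site.com, site.com/ and site.com/index.php all collapse to one URL:
for u in ("http://site.com", "http://site.com/", "http://site.com/index.php"):
    print(canonicalize(u))  # http://www.site.com/ every time
```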

Niraj Sanghvi [PersonRank 10]

17 years ago #

Note: In the first example I meant site.com and www.site.com, one with a www and one without. The linked text in the post looks the same, but the links themselves are not the same.

/pd [PersonRank 10]

17 years ago #

"There’s always a bank of machines refining the PageRank based on incoming data. And PageRank goes out all the time – anytime there’s a new update to our index, which happens pretty much every day"

Who told me on this forum that PR was dead? See, PR is always being tweaked... it's the heart of their algos!
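
For anyone curious what that continual refinement looks like, here is a toy power-iteration version of PageRank; the graph, damping factor, and iteration count are illustrative assumptions, not Google's actual setup.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Each pass refines the scores
    a little more, which is the 'continual' part Matt describes."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            if not outs:                      # dangling page: spread evenly
                for p in pages:
                    new[p] += damping * rank[page] / len(pages)
            else:                             # share rank across outlinks
                for out in outs:
                    new[out] += damping * rank[page] / len(outs)
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))  # "c" accumulates the most rank in this toy graph
```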

TOMHTML [PersonRank 10]

17 years ago #

Thanks for this part of the transcript, Philipp.
Pretty cool.

Now I'm searching for the rest :) I've tried to transcribe it myself, but really I can't understand 30% of the words :-(

Ionut Alex. Chitu [PersonRank 10]

17 years ago #

I still don't get what the added value of the videos is. OK, they're cool, but the info is not crawlable and can't become part of the world's information accessible through Google and other search engines. Unless, of course, speech-to-text software were to create transcripts.
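
A rough sketch of that speech-to-text idea, assuming the third-party SpeechRecognition package and a hypothetical audio file; whatever transcription pipeline Google itself might use is unknown.

```python
# pip install SpeechRecognition
import speech_recognition as sr

recognizer = sr.Recognizer()
# "session.wav" is a hypothetical recording of one of the video answers.
with sr.AudioFile("session.wav") as source:
    audio = recognizer.record(source)

# Hand the audio to a recognition backend; the returned text is plain,
# crawlable content that a search engine could index.
transcript = recognizer.recognize_google(audio)
print(transcript)
```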

Matt Cutts [PersonRank 10]

17 years ago #

"Continuously" and "continually", Philipp. They're different, although most people get them mixed up.

Ionut, have you read that transcript? It's huge! All I have to do is talk for 5 minutes, which is much, much easier. :) Plus then Philipp gets the people visiting to read. It's a win-win-win...

Seth, I answered this over on my blog. Back when I wrote it, not many porn sites used the tags. You'd also get some false positives (people would copy a porn site for the template, believe it or not, and keep the ratings). So supporting ICRA- or RSACi-type stuff really didn't add anything.

Philipp Lenssen [PersonRank 10]

17 years ago #

> The linked text in the post looks the same, but the
> links themselves are not the same.

Oops. I once tweaked the forum software to turn every URL into the shortest possible version. That wasn't helpful this time.
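
For what it's worth, a safer version of that tweak shortens only the visible anchor text and leaves the href untouched, so www and non-www links stay distinguishable; a sketch, with the truncation length chosen arbitrarily:

```python
from html import escape

def render_link(url: str, max_len: int = 30) -> str:
    """Shorten only the display text; the href keeps the exact URL, so
    http://site.com and http://www.site.com remain distinguishable."""
    text = url if len(url) <= max_len else url[: max_len - 1] + "…"
    return f'<a href="{escape(url)}">{escape(text)}</a>'

print(render_link("http://site.com"))
print(render_link("http://www.site.com"))
```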

Ionut Alex. Chitu [PersonRank 10]

17 years ago #

I always prefer the without-www version.

Michael Rima [PersonRank 0]

17 years ago #

Duplicate content filters!!!???
Below is proof that it doesn't hurt too much; even better, it looks like it's rewarding.

Quoting Matt:
So if one page looks exactly the same as another page, that can be quite helpful.
Here's proof: a group of sites using all the same content and the same design, and they even say on their site that they are the same, so there can't be any misunderstanding about it.

I found a site that makes a few hundred thousand dollars a day only because of their duplicate-content approach.
It creates millions of backlinks and pages for them, and that gives them great rankings.

Take a look at the following keywords in Google:
Barcelona Hotels
Paris hotels
London hotels
Prague hotels
Milan hotels
Rome hotels
etc

What you will find there is the same listing a few times, all from the same sites (bookings), just under different extensions but all under the same brand and the same IP; sometimes also under different domains, but still with 100% the same content and the same IP.

Here is a list of a few sites they own:
www.bookings.it
www.bookings.nl
www.booking.com
www.booking.org
www.booking.co.jp
www.bookings.fr
www.bookings.be
www.bookings.es
and all the other extensions under booking.

This has been going on for quite a while now, about a year and a half.

The conclusion is that either there is no such thing as duplicate content filters, or Google likes them too much because of their huge AdWords spending.
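
For context, duplicate content filters are usually described as comparing overlapping word n-grams ("shingles") between pages; a minimal sketch of that idea, with the shingle size and example texts as illustrative assumptions:

```python
def shingles(text: str, size: int = 4) -> set:
    """Overlapping word n-grams ('shingles') of a page's visible text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the two pages' shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

page1 = "book cheap hotels in barcelona with instant confirmation today"
page2 = "book cheap hotels in barcelona with instant confirmation now"
# Near-identical listings on different domains score far above
# unrelated pages, which is what a duplicate filter would flag.
print(similarity(page1, page2))  # ~0.71 here; identical pages give 1.0
```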

TOMHTML [PersonRank 10]

17 years ago #

Matt Cutts is a real hostage!!!
http://video.google.fr/videosearch?q=label%3Amattcutts

stefan2904 [PersonRank 10]

17 years ago #

lol, funny pic -(:

TOMHTML [PersonRank 10]

17 years ago #

Just a screenshot, for the record, in case the results change:

http://img145.imageshack.us/img145/4418/mattcuttshostageec9.jpg

stefan2904 [PersonRank 10]

17 years ago #

I've watched the video now. Very funny :~p

Philipp Lenssen [PersonRank 10]

17 years ago #

You can also search for "hostage":
http://video.google.fr/videosearch?q=label%3Ahostage

After all, everyone can add labels...

GoogleCensors SmallBiz [PersonRank 0]

17 years ago #

[Personal attack removed]

"If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first. "

with

http://www.google.com/search?q=cheap+p-removethis-hentermine&btnG=Search

Your scraper site is made for adwords and is full of duplicate content of bad spam neighborhoods. Remove google from the google index for failing to follow quality guidelines.

Bob Hoskins [PersonRank 0]

17 years ago #

"It’s better for your users and it’s better for search engines to probably just take those links out, put them somewhere in [*] the sitemap."

I think they're saying you should have a sitemap page (it can be a hidden link that only the robot sees, if need be) that explicitly links to any content you'd otherwise have to hack around to get Google to index. Google indexes it, but you don't need the weird code. Everyone is happy :)
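
As a sketch of that suggestion: the sitemap can also be a machine-readable sitemap.xml in the Sitemaps format Google accepted at the time; the URLs and file name below are made up for the example.

```python
from xml.sax.saxutils import escape

# Hypothetical pages that are otherwise hard for the crawler to reach
# (e.g. only linked from JavaScript navigation).
hard_to_reach = [
    "http://www.example.com/archive/2006/08/",
    "http://www.example.com/results?page=2",
]

entries = "\n".join(
    f"  <url><loc>{escape(url)}</loc></url>" for url in hard_to_reach
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("sitemap.xml", "w") as f:  # then submit it via Google Sitemaps
    f.write(sitemap)
```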
