Google Blogoscoped

Forum

Google Allows Misrepresented Result Snippets

Bob Gladstein [PersonRank 1]

Tuesday, February 20, 2007
4,658 views

Same deal with the NY Times. Some pages require a paid subscription, some require a login, and some require a login after a certain amount of time, but because googlebot is always given access as if it had a paid subscription, it doesn't see the registration/login page.

Personally, I consider this a form of cloaking, but there was a discussion about it on the SEW forum last year, and my view wasn't shared by many people there.

Michael Martinez [PersonRank 5]

12 years ago #

They've also been doing this with academic resource archives. It's a practice Google should abandon as it clearly does not serve the best interests of their users.

MarkZZ [PersonRank 1]

12 years ago #

You can always change your useragent to googlebot, to avoid the WMW registration pages :)
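As a sketch of what that trick looks like in practice: the request simply carries Googlebot's published User-Agent string instead of a browser's. The URL below is a placeholder, and note (as reported later in this thread) that sites which verify crawler IPs via reverse DNS won't be fooled by the header alone.

```python
import urllib.request

# Googlebot's published desktop user-agent string
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a request that presents itself to the server as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

if __name__ == "__main__":
    # Placeholder URL -- substitute the registration-walled page you want to test.
    req = build_request("http://example.com/")
    with urllib.request.urlopen(req) as resp:
        print(resp.read()[:200])
```

Browser extensions that switch the user agent do the same thing; the header is the only signal being changed.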

John Honeck [PersonRank 10]

12 years ago #

We'll get to the bottom of this sooner or later! Thanks for the mention.

To reiterate, my biggest objection to the practice is the sales pitch on the signup page. In essence they are using Google's free snippet as a preview of the content and then give the user a sales pitch to upsell them to a paid subscription.

Even if it weren't a paid subscription, it's a great way to harvest names, which can also be used for profit.

I get the feeling, based on Matt's post and other discussions I've seen, that some people consider it okay because there is a way to access the content for free. What I don't understand is how this is judged. Is there a quality score on the signup page? Is it fine for me to have a signup page with 12 feet of sales pitch and a small, well-designed sign-up link at the bottom?

To me the rules should be applied equally across the board: the snippet and the content ranking should be made from the landing page. If the landing page is a sign-up sheet, then let them add the content there.

On the other hand, having a [subscription required] tag like in Google News would give the user fair warning, and we could all benefit from its usage to generate our targeted mass email lists and sell free subscriptions for $10/year to people not savvy enough to scroll to the bottom.

Martin Terre Blanche [PersonRank 1]

12 years ago #

Further to Michael Martinez's comment, I also find this extremely frustrating. Searches in scholar.google.com are based on full documents (e.g., PDFs of journal articles), but nine times out of ten only the abstract is displayed, and access to the full document requires purchasing it from one of the large academic publishing houses. I am not advocating that Google Scholar should base their index only on those parts of academic documents (usually the title and the abstract) that are freely displayed, but that they should clearly show in search results which links point to documents that are not freely available. As far as I know, currently the only way of making sure that your search won't be based on full but non-free documents is to use the "intitle:" operator. There are no "inabstract:" or "free-only:" or similar operators.

Tony Ruscoe [PersonRank 10]

12 years ago #

<< You can always change your useragent to googlebot, to avoid the WMW registration pages :) >>

When I tried this a long time ago it didn't work.

David Yin [PersonRank 0]

12 years ago #

Google does not hate registration pages. It just hates webmasters who show real users and Googlebot different pages.
That's cheating.

George R [PersonRank 10]

12 years ago #

Often the search result also has a link to the google cache.
Sometimes the cache matches the snippet and sometimes it doesn't.

If there is a cache entry that matches the snippet, then a copy of a portion of the source is available via that route.

If there is a public cache entry that doesn't match the snippet, then in some sense google knows about the mismatch.

On a related issue, I would prefer having a stale cache with useful data rather than an updated cache that has the useful data removed or replaced with a registration page.

Ionut Alex. Chitu [PersonRank 10]

12 years ago #

<< Often the search result also has a link to the google cache. >>

That's not the case with WebmasterWorld.

George R [PersonRank 10]

12 years ago #

I just looked at the robots.txt file at WebmasterWorld.
   webmasterworld.com/robots.txt

User-agent: *
Disallow: /

It then has an entire blog, "THE BOT BLOG", embedded in the comment fields of the robots.txt file.

JohnMu [PersonRank 10]

12 years ago #

You only get that robots.txt when you're not recognized as being a real bot.

They're even cloaking the robots.txt – sneaky :-).
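A quick way to check for this kind of robots.txt cloaking yourself is to fetch the file twice with different User-Agent headers and compare the two bodies. A minimal sketch, with a deliberately crude comparison and a placeholder URL (and with the caveat that a site verifying crawler IPs will show the bot version only to real crawler addresses):

```python
import urllib.request

def fetch_as(url, user_agent):
    """Fetch a URL while presenting the given User-Agent string."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

def looks_cloaked(normal_body, bot_body):
    """Crude check: did the two user agents receive different content?"""
    return normal_body.strip() != bot_body.strip()

if __name__ == "__main__":
    # Placeholder URL -- substitute the robots.txt you want to test.
    url = "http://example.com/robots.txt"
    normal = fetch_as(url, "Mozilla/5.0 (Windows NT 10.0) Firefox/2.0")
    bot = fetch_as(url, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
    print("cloaked?" , looks_cloaked(normal, bot))
```

A byte-for-byte comparison is crude; real pages often vary between any two fetches (session IDs, timestamps), so a robots.txt file is one of the few resources where this naive check is usually meaningful.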

NickA [PersonRank 0]

12 years ago #

It's *really* unfortunate that anyone (especially search engine representatives) continues to post to WebmasterWorld instead of an open forum.

Webmasterworld.com is easily the most deficient forum site I've encountered.

Androw [PersonRank 0]

12 years ago #

Here's a question: I have a site which has content that publishers do not allow on the free web. However, a simple registration is all that's required to read it. I'm thinking about allowing Googlebot to spider the entire page (right now it only sees what users see), because the content is there, and only requires simple reg. Otherwise the content would not be on the web at all.

What do you guys think? Bad practice?

John Honeck [PersonRank 10]

12 years ago #

That depends...do you sponsor conferences and have google employees speak at them? If so you can do anything you want...if not, then you'll have to play by the rules.

Philipp Lenssen [PersonRank 10]

12 years ago #

Androw, make sure you follow the precise technical implementation of WebmasterWorld. This should be safe, because WMW isn't banned.

Josue R. [PersonRank 10]

12 years ago #

I'm thinking (as others have) that WMW only shows the real page to a crawler if both the bot's user agent and its IP match their "search engine list"...

So how about Matt try using Tor inside the Google network and see if they get different results from WMW. Maybe something may stand out ;)

Keniki [PersonRank 1]

12 years ago #

Great article. All sites should be equal, and some sites should definitely not be more equal than others.

Keniki [PersonRank 1]

12 years ago #

I think WebmasterWorld has more than explained their position now. I think it's an authority site, and perhaps the focus should now go on people who scrape and copy content.

This thread is locked as it's old... but you can create a new thread in the forum. 
