Google Blogoscoped

Forum

Yahoo Asks You to Do Their Work (Again)  (View post)

noname [PersonRank 4]

Tuesday, August 21, 2007
16 years ago · 4,547 views

Some engines support parameters in robots.txt, by the way (which is a better way, imho). Finding duplicates is very easy for a robot, so I wonder why they implemented this. No one will use it anyway (0.0001% of pages at most).
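For illustration, crawlers that understand wildcards in robots.txt Disallow rules (the major engines document support for the "*" extension) can already be told to skip URL variants that only carry content-neutral parameters. A minimal sketch; the parameter names are just examples:

   User-agent: *
   # Skip URL variants that differ only by a session or tracking parameter
   Disallow: /*?sessionid=
   Disallow: /*&sessionid=
   Disallow: /*?ref=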

John Honeck [PersonRank 10]

16 years ago #

Giving you the ability to have some control over how your own site is treated is not asking you to do their work. If you don't care how they treat your site you don't have to do anything.

It's at least a step in the right direction to give the webmaster some control or choice in the matter. Google allows you to choose the preferred domain, but that's about it; the rest of their tools are so rarely updated that they are next to worthless.

Niraj Sanghvi [PersonRank 10]

16 years ago #

John,

I'm all for more flexibility, and in that regard I would agree with you. But that's not the angle they've put on this feature. In fact, they've all but implied that without this information from you they will end up with lots of duplicate content, you will risk some content not getting indexed at all because of capacity issues, their crawler will get stuck in your site, and so on.

To me this is a lot of smoke and mirrors to hide the fact that they either cannot or choose not to determine where duplicate content is appearing (it should be a simple thing to check whether a URL returns the same content even when its parameters differ). They are also saying that the quality and quantity of what gets indexed on your site depends on your identifying the dynamic parameters.
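To illustrate the kind of check meant here, a crawler could hash each fetched body and collapse URLs that return identical content even though their parameters differ. A minimal Python sketch with made-up URLs:

   import hashlib

   # Hypothetical crawl results: two URLs that differ only in a session parameter.
   pages = {
       "http://example.com/item?id=7": b"<html>widget</html>",
       "http://example.com/item?id=7&sessionid=abc": b"<html>widget</html>",
   }

   def find_duplicates(fetched):
       """Group crawled URLs whose response bodies are byte-identical."""
       groups = {}
       for url, body in fetched.items():
           groups.setdefault(hashlib.sha1(body).hexdigest(), []).append(url)
       return [urls for urls in groups.values() if len(urls) > 1]

   print(find_duplicates(pages))  # both URLs end up in one duplicate group

(The obvious objection, raised below, is that the engine still has to fetch every variant before it can make that comparison.)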

While flexibility may be a side-benefit of all this, it's clearly not their reason for pushing this. See their list of benefits below, and try picturing how many of these things they could achieve even if they didn't get the user input they're looking for. I think at least 4 of their "benefits" could be obtained by making their crawler smarter.

>>So you might wonder what the feature really gives you. Utilizing the 'Dynamic URL Rewriting' feature enables:

   * A more efficient crawl of your site, with fewer duplicate URLs being crawled.
   * Better and deeper site coverage, as we'll be able to use our crawler capacity to find and index more new content on your site.
   * More unique content discovered, as we'll handle more dynamic parameters in your URLs (if you remove the content-neutral dynamic parameters).
   * Fewer chances of crawler traps.
   * Cleaner and easier-to-read URLs displayed in the search results.
   * Better aggregation of link juice to your sites, which can help your pages rank better.

John Honeck [PersonRank 10]

16 years ago #

I think their point with the benefits they list is that you are actively reducing the amount of crawler time needed for your site and, in the aggregate, for all sites.

Covering the bullet points:

1) Even if they do make their system smart enough to determine that a bunch of different URLs actually show the same content, they still have to crawl all of those URLs to make that determination.

2 & 3) Stopping that crawling activity before it starts, on a large scale, allows the crawler to waste less time on duplicates and spend more time finding unique stuff.

4) Crawler traps are the responsibility of the webmaster, not the search engine; if your CMS builds an infinite number of links, that should be stopped at the site level.

5) Who reads URLs?

6) Consolidation of link juice is a good thing. Most people do it manually on their own site (see the sketch below), but if you can't, Yahoo gives you the opportunity to do it in their interface.
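For illustration, the manual consolidation mentioned in point 6 is usually a permanent (301) redirect so that all duplicate variants collapse onto one URL. A minimal Apache .htaccess sketch, assuming mod_rewrite is enabled and using a placeholder domain:

   RewriteEngine On
   # Redirect the bare domain to the www host permanently,
   # so inbound links to either variant count toward one URL.
   RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
   RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]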

Now if they could only send some actual visitors, all of this stuff might matter some day!

Niraj Sanghvi [PersonRank 10]

16 years ago #

Haha, I'm definitely with you on that last point!

I really wonder how many people would understand and take advantage of something like this anyway. And relatedly, I wonder how many people participate in things like sitemaps. It would seem to me that it's likely a very small percentage of users, but it must be worthwhile for them to be building such tools, I suppose.

unix [PersonRank 0]

16 years ago #

In each command, what was changed: STDIN, STDOUT, or STDERR?
   cat memo
   cat > memo
   ls > listing
   mailx user11 < memo
   cat memo > memo.bk
   cat memo > memo.bk 2> error
   cat memo > memo.bk 2>&1
  

Matt Cutts [PersonRank 10]

16 years ago #

Interesting perspective. I think Google does a pretty good job of canonicalizing urls without asking site owners for help, but I think it's nice of Yahoo to give site owners this extra option.

Philipp Lenssen [PersonRank 10]

16 years ago #

> but I think it's nice of Yahoo to give site owners this extra option.

Joel Spolsky describes this so wonderfully. I'm not saying this particular Yahoo offer is that bad. Just something to think about, if we think about webmasters = users:

<<*Every time you provide an option, you're asking the user to make a decision.*

Asking the user to make a decision isn't in *itself* a bad thing. Freedom of choice can be wonderful. People love to order espresso-based beverages at Starbucks because they get to make so many *choices*. Grande-half-caf-skim-mocha-Valencia-with-whip. Extra hot!

The problem comes when you ask them to make a choice that *they don't care about.* (...)

They don't care about a lot of things, and it is the designers' responsibility to make these choices for them so that they don't have to. It is the height of arrogance for a software designer to inflict a choice like this [not referring to the current Yahoo case but something else, though interestingly enough it's also about search indexing] on the user simply because the designer couldn't think hard enough to decide which option is really better.>>
- User Interface Design for Programmers, Chapter 3: Choices
http://www.joelonsoftware.com/uibook/chapters/fog0000000059.html

Also see: Choices = headaches
http://www.joelonsoftware.com/items/2006/11/21.html

John Honeck [PersonRank 10]

16 years ago #

Niraj Sanghvi, I'd heard that over a million people were using sitemaps, but I can't back that up with any proof or link; it's just a number that sticks in my head.

RE: choices, simplicity in Google's search interface was one of the factors that made it popular, and that's a valid point when designing for an average user. But what we are dealing with here is on the backend, for "supposedly" higher-end users, and they will enjoy the ability to make choices.

I wouldn't give my mom an Ubuntu machine; she works just fine with Windows out of the box.

Philipp Lenssen [PersonRank 10]

16 years ago #

> but what we are dealing with here is on the backend
> for "supposedly" higher end users, they will enjoy the ability
> to make choices.

I enjoy tuning my websites, but I don't enjoy tuning Yahoo (or Google, for that matter). I think that would also be the wrong place to tune something, because you'd have to do it for each search engine individually. If you think some parts of your site have too many parameters, why not use .htaccess and fix it for everyone (including human visitors, because you've made the URL nicer to look at and remember)?
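For illustration, a minimal sketch of that .htaccess fix, assuming Apache with mod_rewrite and a made-up article script; every search engine and every human visitor then sees the same clean URL:

   RewriteEngine On
   # Map the clean URL /article/123 onto the dynamic script internally,
   # so the parameter never shows up in the address bar or the index.
   RewriteRule ^article/([0-9]+)$ /index.php?article_id=$1 [L,QSA]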

John Honeck [PersonRank 10]

16 years ago #

Philipp, I agree completely, it SHOULD be done at the site level, for every search engine, visitor, etc. After all, most links start out as a copy of the browser bar.

Unfortunately, a lot of people buy hosting, bad shopping carts, pre-made sites, etc. that don't allow customization, or they simply don't know how.

Lars [PersonRank 1]

16 years ago #

I think this is a step in the right direction. Search engines cannot understand proprietary parameters unless you tell them which ones matter.

Paul Fisher [PersonRank 1]

16 years ago #

I think the URL issue is something that should be addressed at the CMS level. Even if there's a way to alert a crawler about it, duplicate URLs tend to mess up visited vs. unvisited links for a person browsing, an important indication of where you've been.

Philipp Lenssen [PersonRank 10]

16 years ago #

And now Google posts their solutions:
http://googlewebmastercentral.blogspot.com/2007/09/google-duplicate-content-caused-by-url.html
