Google Blogoscoped

Forum

NYT's "Spider Bites"  (View post)

Colin Colehour [PersonRank 10]

Tuesday, April 17, 2007
13 years ago, 7,040 views

This appears to be a site index for their content. I don't see a problem in having something like this to help the search engine crawl a site. This would be just like a sitemap in that sense.

Colin Colehour [PersonRank 10]

13 years ago #

Most of this content is probably hidden behind a form on the main site.

Philipp Lenssen [PersonRank 10]

13 years ago #

Maybe Google needs to revise this bit: "Make pages for users, not for search engines".
It also doesn't work in the "nofollow" discussion, at least from their point of view (because nofollow is meant only for SEs).

Ludwik Trammer [PersonRank 10]

13 years ago #

This should probably be moved into sitemap format...

Googlaxy [PersonRank 1]

13 years ago #

It's the same here: spiderbites.about.com/ and spiderbites.boston.com/ ;) It's the same team!

Colin Colehour [PersonRank 10]

13 years ago #

These sites should use the sitemaps protocol rather than creating all of these index pages. That way the search engines can just download the feed and users will never find index pages like these on the web.
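
For anyone who hasn't seen the format, here is a minimal sketch of what such a feed could look like when generated with Python's standard library. The `build_sitemap` helper and the example.com URLs are made up for illustration; only the `urlset`/`url`/`loc` elements and the namespace come from the sitemaps protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing each URL in a <url><loc> entry.

    Hypothetical helper for illustration; optional elements such as
    <lastmod> are left out to keep the sketch short.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters like "&" in the URL for us.
        ET.SubElement(url, "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs, not real article addresses.
    print(build_sitemap([
        "https://example.com/articles/1999/11/a.html",
        "https://example.com/articles/1999/11/b.html",
    ]))
```

A feed like this is fetched by crawlers directly, so it never has to be linked from pages that users browse.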

Veky [PersonRank 10]

13 years ago #

You forget that sitemaps are a pretty new idea. I get the impression these pages were made long before that.

agerhart [PersonRank 1]

13 years ago #

I don't think so. The person who implemented them joined the NY Times in 2005

Kinhop [PersonRank 1]

13 years ago #

Just a remark. Let's remember that the Web is not Google and Google is not the Web. Google guidelines are not Web guidelines.

Links of this kind were possibly created for their own internal indexing. Maybe the IR tool that they are using has a spider that starts on that specific domain. Who knows...

Marcin Sochacki (Wanted) [PersonRank 10]

13 years ago #

It's strange – the screenshot in the article Philipp posted and the current page are much different.

The page indeed looks like an innocent sitemap, so the question is if they changed it just now, in response to the spam rumours?

Colin Colehour [PersonRank 10]

13 years ago #

@marcin – The site hasn't changed, Philipp took the screenshot of the Real Estate communities index. Link below:

spiderbites.nytimes.com/sectio ...

Marcin Sochacki (Wanted) [PersonRank 10]

13 years ago #

Thanks Colin, so it seems that it's a regular sitemap, though the XML sitemap standard should make those obsolete.

ZZ [PersonRank 3]

13 years ago #

That's downright spam..

Matt Cutts [PersonRank 10]

13 years ago #

Personally, I'd prefer that it be called sitemap.nytimes.com, but if you go to www.nytimes.com, the link that points to this url has the anchor text "Site map". So it's accessible to users and search engines alike. There are categories that are useful to bots and people because the categories have names like "Free Articles: 1980s". The next level of links could be more descriptive, but they appear to be presenting articles chronologically.

I'd recommend going with sitemap.nytimes.com instead of spiderbites.nytimes.com, and also to make the chronological breakdowns a little more clear, but I don't think that the site is trying to be deceptive with this. And an HTML sitemap can be useful for users and search engines; submitting a sitemap via autodiscovery would be fine, but with this html sitemap, users could surf around through free articles easily, for example.

Just my $.02

Matt Cutts [PersonRank 10]

13 years ago #

Hey, I just noticed that I've got a full PageRank bar! Cool! Philipp, is it based on # of posts?

Tadeusz Szewczyk [PersonRank 10]

13 years ago #

The postrank (not PageRank which is of course a Google trademark ;-) is based on the number and frequency of postings AFAIK. If you don't post for a while it will be reduced.

In Polish we have a saying that translated says something like: "What the boss is permitted is not for you, you shithead."

Although this example is not the most obvious one, I often get the impression that webmasters of renowned sites can get away with murder while John Doe webmasters are punished for petty offences.

This "sitemap" or should I say spidertrap is really wacky and basically worthless from a usability point of view.

Philipp Lenssen [PersonRank 10]

13 years ago #

PersonRank is based on a variety of secret algorithms, but yeah, it's mostly based on the number of your posts..... :)

By the way, I have a hard time believing the "human readable" angle on pages like these (quote spiderbites.nytimes.com/articles/...):

<<Pay Articles 1990's

199912
199912_2
199912_3
199912_4
199912_5
199912_6
199912_7
199911
199911_2
199911_3
199911_4
199911_5
199911_6
199911_7
199910
199910_2
199910_3
199910_4
199910_5
199910_6
199910_7
199909
etc.>>

"199911_2" for example turns out to mean "Pay Articles 11/1999 Links". This is not in any way human readable, so it's likely indeed just what the domain name says: fodder for search engine spiders. Considering that the domain spiderbites.nytimes.com is a PageRank 8 page, it might also be much more effective than the Sitemaps XML format in transferring additional juice to the sub-pages.
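
To show how little it would take to make these labels descriptive, here is a rough sketch that expands them. It assumes the labels follow a `YYYYMM` pattern with an optional `_N` suffix, and that the suffix is a page number within the month; that reading of the suffix is a guess, and `describe_label` is a made-up helper, not anything the NYT uses.

```python
import re

# Assumed label shape: four-digit year, two-digit month, optional "_N" suffix.
LABEL_PATTERN = re.compile(r"(\d{4})(\d{2})(?:_(\d+))?")


def describe_label(label, section="Pay Articles"):
    """Turn a label such as "199911_2" into a human-readable description.

    Hypothetical helper; treating the suffix as a page number is an assumption.
    """
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    year, month, page = match.groups()
    if not 1 <= int(month) <= 12:
        raise ValueError(f"invalid month in label: {label!r}")
    text = f"{section} {month}/{year}"
    if page is not None:
        text += f", page {page}"
    return text


if __name__ == "__main__":
    for label in ["199912", "199911_2"]:
        print(label, "->", describe_label(label))
```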

Yvo [PersonRank 1]

13 years ago #

The sitemap sub-domain has 9,612,435 inlinks according to Yahoo, and thus ranks #3 for the keyword "site map" in Google.

Danny Sullivan [PersonRank 2]

13 years ago #

NYT has had these for some time, to my understanding. I think SEW Forums had a discussion about them ages ago, or they might have been similar ones that About.com does. They do, as they did then, seem to be a way for search engines to spider individual pages. We've only had all the search engines support XML sitemaps for less than a week now. Until last week, only Google and Yahoo were actively able to accept them (Microsoft supported them in concept, but without autodiscovery, there was no way to provide them). Even with XML sitemaps, I'm sure plenty of big sites will still feel it is worthwhile to have actual HTML sitemaps.
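
For context, autodiscovery works by listing the sitemap's address in robots.txt on a `Sitemap:` line, so a crawler can find it without a separate submission. Below is a minimal sketch of how a crawler might pick those lines out. The `find_sitemaps` helper and the sample robots.txt are invented for illustration, and real crawlers are likely more forgiving than this.

```python
def find_sitemaps(robots_txt):
    """Return the URLs listed on "Sitemap:" lines of a robots.txt body.

    Hypothetical helper. Text after "#" is treated as a comment, so a
    sitemap URL containing "#" would be cut short by this simple version.
    """
    sitemaps = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


if __name__ == "__main__":
    # Made-up robots.txt content with a placeholder domain.
    sample = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: https://www.example.com/sitemap.xml\n"
    )
    print(find_sitemaps(sample))
```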

Matt Cutts [PersonRank 10]

13 years ago #

That's correct, Danny; this sitemap has been around for years. I've known about this particular sitemap since 2005, and dug into it back then. Before that, I dug into the sitemap on about.com because of the same name as well. But despite the poor name, an on-site HTML sitemap allows many of these articles to be crawled with PageRank.

Philipp, I agree that the links could be more descriptive, but if a reader wanted to look up an article from a particular date in 1985, the current sitemap would let you get it down to 6-7 pages to check out (because the articles are listed by year and month). That's pretty useful.

That said, I agree that the pages could be made more descriptive, and the subdomain would be more accurate as sitemap.nytimes.com. But I don't believe that there was intent to spam in this case.

Colin Colehour [PersonRank 10]

13 years ago #

Either way, the user would benefit from this. Older content that is normally only found by using a search form on the site can now be found because the search engines can crawl through these indexes. So if they allow the search engines access to years' worth of content, then I'm all for it.

Tadeusz Szewczyk [PersonRank 10]

13 years ago #

Of course PersonRank! I forgot!

Philipp Lenssen [PersonRank 10]

13 years ago #

> Either way, the user would benefit from this. Older
> content that is normally only found by using a search
> form on the site, can now be found because the search
> engines can crawl through these indexes.

Not to say this isn't true, but that's the argument blackhat optimizers like those who did doorway pages for BMW.de also often use. "We actually help the user by building these pages for search engines..." I also saw the argument pop up in the recent paid links debate: "paid links campaigns help users find the relevant content because they push it up in Google..." Nevertheless, Google's webmaster guidelines are what they are, and right or wrong at this time they say don't make pages just for search engines...

Colin Colehour [PersonRank 10]

13 years ago #

I think the spider bites index is good for users in the sense that content that normally would not have been added to the search engine can now be searched and indexed. But I can totally see how someone scamming Google would use an excuse like "it's for the users' benefit".
