NYT's "Spider Bites" - Google Blogoscoped Forum

Colin Colehour	Tuesday, April 17, 2007 17 years ago • 8,014 views
This appears to be a site index for their content. I don't see a problem in having something like this to help the search engine crawl a site. This would be just like a sitemap in that sense.
Colin Colehour	17 years ago #
Most of this content is probably hidden behind a form on the main site.
Philipp Lenssen	17 years ago #
Maybe Google needs to revise this bit: "Make pages for users, not for search engines". It also doesn't work in the "nofollow" discussion, at least from their point of view (because nofollow is meant only for SEs).
Ludwik Trammer	17 years ago #
This should probably be moved into sitemap format...
Googlaxy	17 years ago #
It's the same here: http://spiderbites.about.com/ and http://spiderbites.boston.com/ ;) It's the same team!
Colin Colehour	17 years ago #
These sites should use the sitemaps protocol rather than creating all of these index pages. That way the search engines can just download the feed and users will never find index pages like these on the web.
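For illustration, a minimal sketch of what such a feed could look like when generated with Python's standard library. The `build_sitemap` helper and the example.com URLs are assumptions made up for this example; only the `urlset`/`url`/`loc` elements and the namespace come from the Sitemaps protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemaps-protocol XML string listing the given URLs.

    Hypothetical helper: it emits only the required <loc> element per URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


# Placeholder archive URLs, not real NYT paths.
xml_doc = build_sitemap([
    "https://www.example.com/archive/1985/01/",
    "https://www.example.com/archive/1985/02/",
])
print(xml_doc)
```

A crawler would fetch this one file instead of walking a tree of HTML index pages.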
Veky	17 years ago #
You forget that sitemaps are a pretty new idea. I get the impression these pages were made long before that.
agerhart	17 years ago #
I don't think so. The person who implemented them joined the NY Times in 2005.
Kinhop	17 years ago #
Just a remark. Let's remember that the Web is not Google and Google is not the Web. Google guidelines are not Web guidelines. These kinds of links were possibly created for their own internal indexing. Maybe the IR tool that they are using has a spider that starts on that specific domain. Who knows...
Marcin Sochacki (Wanted)	17 years ago #
It's strange – the screenshot in the article Philipp posted and the current page are much different. The page indeed looks like an innocent sitemap, so the question is if they changed it just now, in response to the spam rumours?
Colin Colehour	17 years ago #
marcin – The site hasn't changed; Philipp took the screenshot of the Real Estate communities index. Link below: http://spiderbites.nytimes.com/sections/realestate/re_communities_5.html
Marcin Sochacki (Wanted)	17 years ago #
Thanks Colin, so it seems that it's a regular sitemap, though the XML sitemap standard should make those obsolete.
ZZ	17 years ago #
That's downright spam..
Matt Cutts	17 years ago #
Personally, I'd prefer that it be called sitemap.nytimes.com, but if you go to www.nytimes.com, the link that points to this url has the anchor text "Site map". So it's accessible to users and search engines alike. There are categories that are useful to bots and people because the categories have names like "Free Articles: 1980s". The next level of links could be more descriptive, but they appear to be presenting articles chronologically. I'd recommend going with sitemap.nytimes.com instead of spiderbites.nytimes.com, and also to make the chronological breakdowns a little more clear, but I don't think that the site is trying to be deceptive with this. And an HTML sitemap can be useful for users and search engines; submitting a sitemap via autodiscovery would be fine, but with this html sitemap, users could surf around through free articles easily, for example. Just my $.02
Matt Cutts	17 years ago #
Hey, I just noticed that I've got a full PageRank bar! Cool! Philipp, is it based on # of posts?
Tadeusz Szewczyk	17 years ago #
The postrank (not PageRank which is of course a Google trademark ;-) is based on the number and frequency of postings AFAIK. If you don't post for a while it will be reduced. In Polish we have a saying that translated says something like: "What the boss is permitted is not for you, you shithead." Although this example is not the most obvious one, I often get the impression that webmasters of renowned sites can get away with murder and John Doe webmasters are punished for petty offences. This "sitemap" or should I say spidertrap is really wacky and basically worthless from a usability point of view.
Philipp Lenssen	17 years ago #
PersonRank is based on a variety of secret algorithms, but yeah, it's mostly based on the number of your posts..... :) By the way, I have a hard time believing the "human readable" angle on pages like these (quote spiderbites.nytimes.com/articles/...): <<Pay Articles 1990's 199912 199912_2 199912_3 199912_4 199912_5 199912_6 199912_7 199911 199911_2 199911_3 199911_4 199911_5 199911_6 199911_7 199910 199910_2 199910_3 199910_4 199910_5 199910_6 199910_7 199909 etc.>> "199911_2" for example turns out to mean "Pay Articles 11/1999 Links". This is not in any way human readable, so it's likely indeed just what the domain name says: fodder for search engine spiders. Considering that the domain spiderbites.nytimes.com is a PageRank 8 page, it might also be much more effective than the Sitemaps XML format in transferring additional juice to the sub-pages.
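To show how little it would take to make those labels readable, here is a rough sketch that expands a label such as "199911_2" into something a person can scan. The `describe_label` function is hypothetical, and reading the `_2` suffix as a part number within the month is an assumption; the pages themselves only say "Pay Articles 11/1999 Links".

```python
import re

# YYYYMM, optionally followed by _N (assumed here to be a part number).
LABEL_RE = re.compile(r"^(\d{4})(\d{2})(?:_(\d+))?$")


def describe_label(label, section="Pay Articles"):
    """Turn a label like '199911_2' into 'Pay Articles 11/1999, part 2'."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    year, month, part = match.groups()
    text = f"{section} {int(month)}/{year}"
    if part is not None:
        # Assumption: the suffix numbers successive pages for the same month.
        text += f", part {part}"
    return text


print(describe_label("199911_2"))  # Pay Articles 11/1999, part 2
print(describe_label("199912"))    # Pay Articles 12/1999
```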
Yvo	17 years ago #
The sitemap sub-domain has: 9,612,435 inlinks according to yahoo, and thus SERP #3 on keyword "site map" in google.
Danny Sullivan	17 years ago #
NYT has had these for some time, to my understanding. I think SEW Forums had a discussion about them ages ago, or they might have been similar ones that About.com does. They do, as they did then, seem to be a way for search engines to spider individual pages. We've only had all the search engines support XML sitemaps for less than a week now. Until last week, only Google and Yahoo actively were able to accept them (Microsoft supported them in concept, but without autodiscovery, there was no way to provide them). Even with XML sitemaps, I'm sure plenty of big sites will still feel it is worthwhile to have actual HTML sitemaps.
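For reference, autodiscovery works through a `Sitemap:` line in robots.txt that points at the XML file. A small sketch of how a crawler might pick that line up; the `find_sitemaps` helper and the example.com robots.txt content are assumptions for illustration, not any search engine's actual code.

```python
def find_sitemaps(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt for a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

print(find_sitemaps(ROBOTS_TXT))  # ['https://www.example.com/sitemap.xml']
```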
Matt Cutts	17 years ago #
That's correct, Danny; this sitemap has been around for years. I've known about this particular sitemap since 2005, and dug into it back then. Before that, I dug into the sitemap on about.com because of the same name as well. But despite the poor name, an on-site HTML sitemap allows many of these articles to be crawled with PageRank. Philipp, I agree that the links could be more descriptive, but if a reader wanted to look up an article from a particular date in 1985, the current sitemap would let you get it down to 6-7 pages to check out (because the articles are listed by year and month). That's pretty useful. That said, I agree that the pages could be made more descriptive, and the subdomain would be more accurate as sitemap.nytimes.com. But I don't believe that there was intent to spam in this case.
Colin Colehour	17 years ago #
Either way, the user would benefit from this. Older content that is normally only found by using a search form on the site can now be found because the search engines can crawl through these indexes. So if they allow the search engines access to years' worth of content then I'm all for it.
Tadeusz Szewczyk	17 years ago #
Of course PersonRank! I forgot!
Philipp Lenssen	17 years ago #
> Either way, the user would benefit from this. Older content that is normally only found by using a search form on the site, can now be found because the search engines can crawl through these indexes.
Not to say this isn't true, but that's the argument blackhat optimizers like those who did doorway pages for BMW.de also often use. "We actually help the user by building these pages for search engines..." I also saw the argument pop up in the recent paid links debate: "paid links campaigns help users find the relevant content because they push it up in Google..." Nevertheless, Google's webmaster guidelines are what they are, and right or wrong at this time they say don't make pages just for search engines...
Colin Colehour	17 years ago #
I think the spider bites index is good for users in the sense that content that normally would not have been added to the search engine can now be searched and indexed. But I can totally see how someone scamming Google would use an excuse like it's for the user's benefit.
