Friday, September 23, 2005

Creating a Good Blog Archive

I would like to talk about some blogging approaches I have here on Google Blogoscoped to help make sure, years from now, this blog makes a good archive – maybe it inspires a thing or two on your own blog!

1. Making screenshots

I often illustrate my posts with screenshots of sites I’m linking to. I’m doing this not only so the post is easy to recognize when you scroll a longer page, but also so that my blog will make a good archive in years to come – when quite possibly, a good portion of the sites I’ve linked to will have ceased to exist (or changed so much that it’s interesting to see what the site looked like years ago). When I make a screenshot, I don’t even think much about how interesting this thing looks to me now – for example, it turns out even old banners can be interesting.

2. Not hot-linking to images

I try to avoid including images from other sites straight within my posts because the other site may move (and, of course, may object to this type of inclusion). So even when I’m not changing a picture (which I do most of the time) I’m putting a copy on my own server.

3. Not hosting elsewhere

I’ve moved so often from free sites before I arrived at this, my own (non-free) server, that I can only suggest you put your data – images, articles, and so on – nowhere else but on a space you own: a space that can’t suddenly change the ToS, add pop-ups or other ads, or shut down. For this reason, even when I was blogging on Blogger, I had them FTP-transmit their data to my server so I had a domain that’s mine. For the same reason, I’m somewhat reluctant to use services like Flickr.

4. Clean code

I’m using as little JavaScript as possible and as much standards-compliant (XHTML + CSS) web code as possible. I want to ensure that in the years to come, there’s a good chance browsers will understand the HTML here, or if not, that I’m able to easily convert it.

5. Choosing a top level domain, and sticking with it

Actually, I did this one wrong: I chose a sub-domain on my own server instead of going for a domain name that’s easy to recognize. On the other hand, I don’t want to move as I want the archive permalinks to be indeed permanent. If you’ve still got the choice I suggest you name your domain like you name your blog, and then, never move – your PageRank will increase over time, and your archive will be stable.

6. Explain more than necessary

I’m sometimes explaining obvious words (like “SEO”) simply so that people, years from now, coming from a search engine that doesn’t even exist yet, and stumbling upon one of the articles here, have a chance to understand what the article is all about. (And it also helps those today who may come here from sources such as Google News – in other words, readers who potentially are new to this blog and the search engine world in general.)

7. Prominently writing the date on the page

This one is quite obvious for bloggers: you must put the full date on the page whenever you publish something. But have you noticed that in some mainstream sources, there’s no year given in a date? This creates a terrible archive because in some years, you won’t be able to tell if you stumbled on a “December 8” article from 1999, 2000, 2001, ... and so on.
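To make the point concrete, here is a minimal Python sketch of a date formatter that can never drop the year. The function name `format_post_date` is made up for this illustration and is not taken from this blog's software; it also assumes the default English locale for the weekday and month names.

```python
from datetime import date


def format_post_date(published: date) -> str:
    """Return a full, unambiguous date that always includes the year."""
    # %A is the weekday name and %B the month name (English in the default locale).
    # Day and year are added by hand so the day has no leading zero.
    return f"{published.strftime('%A, %B')} {published.day}, {published.year}"


print(format_post_date(date(2005, 9, 23)))  # prints: Friday, September 23, 2005
```

With the year always present, a “December 8” article from 1999 can no longer be confused with one from 2000 or 2001.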

8. Separating content from layout

As far as it’s possible (and it’s not completely possible), I try to separate content from layout on this blog so that when I later have a redesign, my archive won’t pose a problem. For example, I have an element with a class “more” which is at this moment displayed with a red arrow. When I decide this doesn’t fit the blog layout anymore during a redesign, I can simply think of another visual to represent “more information to be found here.” (If I had included a “red-arrow.gif”, redesigning the site would be a headache.) And also, if you think cross-media by allowing for different layouts for different needs, you may even be able to turn your blog archive into a book...

9. Putting the right amount on a page

I don’t want to create yet another page for every little link I post. That’s why I use a multi-post-per-page style for smaller posts. For longer posts, I’m using a separate page. I feel this keeps my archive from “exploding.” (The down-side is that when you arrive on a multi-post-per-page style page from a search engine, you must scroll to find the post you were looking for.)

10. Using your own blog software

As I’m using my own blog software, I have complete control over all past posts. I can create my own archive view, and search and replace in older posts when technical changes need to be made. I can also export & back up to different formats so my archive is safe. My suggestion would be: whenever possible, install blogging software you have complete control over – i.e., software you wrote yourself, or an open source blogging package running on your own server.
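As a rough idea of what “search and replace in older posts” could look like when you control the software, here is a small Python sketch. It is not the code behind this blog. The post structure (a list of dictionaries with `title` and `body` keys) and the function name `replace_in_posts` are assumptions made only for this example.

```python
def replace_in_posts(posts, old, new):
    """Return updated copies of all posts and the number of posts that changed."""
    updated = []
    changed = 0
    for post in posts:
        body = post["body"].replace(old, new)
        if body != post["body"]:
            changed += 1
        # Build a copy so the original post data stays untouched as a backup.
        updated.append({**post, "body": body})
    return updated, changed


# Hypothetical sample data: swap an old tag for a newer one across the archive.
sample_posts = [
    {"title": "First post", "body": "This is <b>bold</b>."},
    {"title": "Second post", "body": "Nothing to change here."},
]
new_posts, count = replace_in_posts(sample_posts, "b>", "strong>")
print(count)  # prints: 1
```

Working on copies rather than changing posts in place means a bad replacement can be thrown away without touching the archive.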

11. Referencing the surrounding

If you want your old posts to make sense you need to be careful about talking about the surroundings of the post. For example, a sentence like “click on the ’Archive’ link below to get to the archive” would be rendered meaningless if you change your template and move the “Archive” link from the bottom to the left, and rename it to “Old posts.”

12. Avoiding too many CSS classes

In the beginning, I used many different CSS classes for many different purposes. After a while, I redesigned the site – and the CSS file felt cluttered and hard to work with. I’m now keeping to a minimum of complexity within the CSS. More creativity can be put into illustrations accompanying the post. But be careful; unless your images are all square, you are stuck with whatever background color you chose for your template (unless you want to redesign all images from your archive whenever you redesign the site). That’s one reason I think white makes a good background color: it’s very neutral, and you’re not that likely to find it annoying. As soon as real PNG anti-aliasing (smooth transparency effects) works cross-browser, images might be prepared so they work on any background.

13. Avoid too-clever headlines

Easily understood headlines are very important for a successful archive overview page, or search engine result pages, where the visitor has little more than the headline to decide if that’s the wanted content. Puns take some seconds longer to be understood, or can’t be understood at all without further context of the post, and should thus be avoided within headlines for blog posts. On a side-note, the fact that headlines are often read without context (as microcontent) is also why I repeat the word “Google” in headlines whenever I post about Google, because reading it out of context, you may not know my blog is about Google.

14. Vary keywords

If your blog has a lot of posts, you’ll find that searching it is important for referencing older posts, putting new posts in context, or finding out if you already blogged about something years ago. An easily searchable archive of course helps your visitors too. But how do you make sure your blog is easily searchable? When I write posts, I often try to vary the keywords I’m using. For example, when I link to a tool called “My Google Portal” at, I might use both “My Google Portal” in the headline, and “” in the main copy. This way, when I later search for either variant, I’m able to find the post again. (Naturally, I think the variation is also good style because it doesn’t make readers feel like the introductory sentence to a post is a mere repetition of the headline they already read.)

15. Nice URLs

URLs are visible to the visitor, so having “nice” URLs is not a mere technical issue. Especially with a large archive, a clean URL structure – including the date of the post – improves a blog. For example, “” would be a nice URL (note keywords in URLs don’t matter, really).
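One possible way to build such a date-based URL path is sketched below in Python. The `/archive/YYYY/MM/DD/slug` scheme and the function name `nice_url_path` are purely hypothetical choices for this example and do not describe the URL structure this blog uses. The title slug is there for human readers only.

```python
import re
from datetime import date


def nice_url_path(published: date, title: str) -> str:
    """Build a readable, date-based path for a post (hypothetical scheme)."""
    # Lowercase the title and collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"/archive/{published:%Y/%m/%d}/{slug}"


print(nice_url_path(date(2005, 9, 23), "Creating a Good Blog Archive"))
# prints: /archive/2005/09/23/creating-a-good-blog-archive
```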

16. Discussing links

Similar to point number 1 – making screenshots of sites I link to – I want to give a nutshell summary of the link as well (“the page contains a video about ...”). This is because the site linked to may stop existing, and then it would be hard to understand what it was about, and a worthwhile idea may be lost. Why not make your blog archive into a digital museum for future readers?

17. Providing context

On the web, there’s no easier way to provide context for those who need it than using links – because it’s unobtrusive for those who don’t need this context (they may simply not click on it and ignore the link). That’s why it’s good style to always link to relevant previous posts if it’s an ongoing discussion (e.g. when you write “as mentioned yesterday” or “last month’s contest”). This way, visitors can read your archived posts months or years later and still know what was going on.

18. Link to the archive

To show newer readers what you’ve posted about in the past, it’s valuable to provide summarized link collections (“all my humor links from the past 2 years...”), to link to the archive in one of the most prominent places on your blog, to make the archive easily searchable, to link to related items from the past from within newer posts, and possibly, to provide “items this time last year” or “best of” style link collections. I often find that for new readers of this blog, an old link or Google tidbit can be just as interesting as a fresh one.
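An “items this time last year” collection could be picked out roughly as in the Python sketch below. The function name `same_day_last_year` and the post structure (dictionaries with a `date` key) are assumptions for this example, not a description of how this blog does it.

```python
from datetime import date


def same_day_last_year(posts, today):
    """Return the posts published exactly one year before `today`."""
    try:
        target = today.replace(year=today.year - 1)
    except ValueError:
        # February 29 has no counterpart in a non-leap year; fall back to February 28.
        target = today.replace(year=today.year - 1, day=28)
    return [post for post in posts if post["date"] == target]


# Hypothetical sample archive.
archive = [
    {"title": "Older tidbit", "date": date(2004, 9, 23)},
    {"title": "Something else", "date": date(2004, 9, 24)},
]
print([post["title"] for post in same_day_last_year(archive, date(2005, 9, 23))])
# prints: ['Older tidbit']
```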

19. Categories

I’m not a big fan of categories myself – I prefer the flat “infinite tags” approach of full-text search a la Google – but it should be pointed out here that many bloggers attach specific keywords to every post for later organization and finding. These categories may then be linked to from the archive or the navigation bar next to every post.

20. Help others do something with your archive

While for all of the above approaches you were the one mainly responsible for improving the archive and rendering it meaningful, there are also strategies to let others help out. For example, by providing an easy download of your whole archive in a structured data format, you are giving creative developers a chance to analyze and visualize your data. And by publishing your content under a Creative Commons license (as Boing Boing and Micro Persuasion are doing), others get the chance to do something interesting with your data by remixing it into something new.
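As one example of a “structured data format,” the Python sketch below dumps an archive to JSON. The choice of JSON, the field names, and the function name `export_archive` are assumptions made for this illustration; any well-documented format would serve the same purpose.

```python
import json
from datetime import date


def export_archive(posts):
    """Serialize all posts to a JSON string that others can download and parse."""
    records = [
        {
            # ISO 8601 dates (YYYY-MM-DD) always carry the year and sort correctly.
            "date": post["date"].isoformat(),
            "title": post["title"],
            "body": post["body"],
        }
        for post in posts
    ]
    return json.dumps(records, indent=2, ensure_ascii=False)


# Hypothetical sample data.
example_posts = [
    {"date": date(2005, 9, 23), "title": "Creating a Good Blog Archive", "body": "..."},
]
print(export_archive(example_posts))
```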


