Wednesday, June 2, 2004


MoreGoogle is a new Google browser add-on. After installing it your results are “hi-jacked” by this independent application, displaying thumbnails next to the snippet. This is an interesting concept which somewhat reminds me of my FindForward, though the hi-jacking part leaves a bad after-taste – I kind of get the feeling Google will sue. [Thanks to ResearchBuzz.]

Corporate Search Appliance

Google announces availability of next-generation corporate search appliance (June 2, 2004).

The Great Meta-Lie

Sad but true, meta-data (data about data) still doesn’t work, and we have reason to doubt it ever will. The perfect Web might evolve out of a highly imperfect one.

Search engines already weigh factors outside the page being ranked much, much higher than anything on the page itself. This is to prevent easy result spamming. Search engines never believe what you have to say about your page; they only trust the mass of other pages. However, people still create different forms of meta-data, some of them helpful, some of them not.

Titles and Alt-Text

Search engines will one day know how to use optical character recognition for text-as-images. This won’t make image alt-text obsolete, but one major reason people use alt-text (aside from us hardcore HTML evangelists) is that they heard it helps with Search Engine Optimization.

Titles are also part of meta-data, and they are very important in search results, for bookmarking, and other uses. But titles are also visible in a browser, usually in the browser title bar. They do not clearly belong in the realm of invisible background meta-data.


Then there are the meta-keywords and meta-description declarations, which are more or less of no use at all. Spider Inktomi might give them some limited weight, whereas Google ignores them completely for the ranking, and only sometimes displays the meta-description in the result pages. Meta-keywords have also been extended to feature a whole lot of pre-defined elements, through the Dublin Core Metadata Initiative.
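For readers who have never looked at these declarations, here is a minimal sketch of how a tool might read them out of a page. It uses only Python's standard `html.parser` module. The sample page, the `MetaCollector` class and the `read_meta` helper are made up for illustration, and this is not how any particular search engine does it.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            # Names are lowercased so "DC.creator" and "dc.creator" match.
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """<html><head>
<title>All About Beer</title>
<meta name="keywords" content="beer, bubbling alcohol, bier">
<meta name="description" content="A page about beer.">
<meta name="DC.creator" content="Example Author">
</head><body><p>Beer is brewed from malt.</p></body></html>"""


def read_meta(html):
    """Return (title, meta_dict) for an HTML string."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.title, parser.meta


title, meta = read_meta(SAMPLE_PAGE)
print(title)
print(meta)
```

Note that nothing stops the page author from putting anything at all into those `content` attributes, which is exactly why a ranking algorithm has little reason to trust them.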

Meta-keywords were also thought to be helpful to attach misspellings or related concepts to a web site. This would mean I can write about beer, but mention “bubbling alcohol” in the meta section. Search engines like Google are already implementing synonyms and word stemming. They also look at the link text pointing to a page, so that when you link to the beer page with the link “bubbling alcohol” there is no need for the page’s author to add this phrase anywhere. And he might not even have thought of it.
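The link-text idea can be sketched in a few lines. This is a toy model under loud assumptions: the URLs are invented, the `build_anchor_index` and `search` helpers are hypothetical, and real engines combine link text with many other signals. It only shows that a page can be found for a phrase its author never wrote.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each lowercased link-text word to the set of target URLs it points at.

    `links` is an iterable of (source_url, anchor_text, target_url) tuples.
    """
    index = defaultdict(set)
    for _source_url, anchor_text, target_url in links:
        for word in anchor_text.lower().split():
            index[word].add(target_url)
    return index


def search(index, query):
    """Return targets whose combined incoming link text covers every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented example links; the beer page itself never says "bubbling alcohol".
LINKS = [
    ("http://example.org/a", "bubbling alcohol", "http://example.com/beer"),
    ("http://example.org/b", "great beer page", "http://example.com/beer"),
    ("http://example.org/c", "alcohol free drinks", "http://example.com/juice"),
]

index = build_anchor_index(LINKS)
print(search(index, "bubbling alcohol"))
```

The description of the beer page comes entirely from other people's pages, so the page's author has nothing to maintain and nothing to lie about.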

Phrase Markup

HTML has a great set of phrase markup (inline meta-data, if you will) which is also completely unused and irrelevant if you take the Web as a whole. (The set of phrase markup unfortunately is also restricted and heavily tech-based, enabling a Web author to say “this is code”, “this is keyboard input”, “this is a variable”, whereas you can’t say “this is a footnote”.)


And then, there is the meta-data of the blogosphere; currently, various RSS formats as well as the new Atom format. The formats are intended to be simple but become complex when taken together. XML features the concept of name-spaces to mix different XML applications together within a single document. This is nice enough but not very easy. HTML became popular not only because it was simple, but because the source could be read by humans. Tim Berners-Lee never anticipated this to happen. (HTML also became popular because you can directly see the results; meta-data is mostly invisible, certainly with today’s tools.)
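To show what mixing XML applications through name-spaces looks like in practice, here is a small sketch using Python's standard `xml.etree.ElementTree`. The feed is an invented RSS 2.0 fragment that borrows a `creator` element from the Dublin Core name-space, and the `item_titles_and_creators` helper is hypothetical. Treat it as an illustration, not a complete feed reader.

```python
import xml.etree.ElementTree as ET

# Invented feed: plain RSS elements plus one element from the Dublin Core name-space.
FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>The Great Meta-Lie</title>
      <dc:creator>Example Author</dc:creator>
    </item>
  </channel>
</rss>"""

# The prefix used here is local to this script; only the URI has to match the feed.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def item_titles_and_creators(feed_xml):
    """Return a list of (title, creator) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.findall("./channel/item"):
        title = item.findtext("title")  # un-prefixed RSS element
        creator = item.findtext("dc:creator", namespaces=NS)  # name-spaced element
        pairs.append((title, creator))
    return pairs


print(item_titles_and_creators(FEED))
```

Even in this tiny case the reader has to know the name-space URI in advance and pass a prefix mapping around, which hints at why the approach is nice enough but not very easy.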

Time will tell if RSS still exists ten years from now. It is a great tool today, but as soon as we want every web site to be a blog, and as soon as we want every news site included in our blogreader, we may not see RSS anymore. It might be another technical detail in the engine running in the background. People don’t really care how information reaches them.


We can look further away from your average web site and we find XML files officially connecting friends (the friend-of-a-friend files), and a multitude of other formats, like geo-positioning. (I try those out on my own server but don’t expect much to happen.) We can also see RDF, the Resource Description Framework pushed by the World Wide Web Consortium (W3C) and Tim Berners-Lee. But in the end we are talking about text files communicating ideas about ideas – no matter how smart or geek-pleasing the format (yes, I love XML), somebody has to write those files.

Meta-data like RDF means building a second world to mirror the first. That sounds like a whole lot of work.

Lego Bricks for Lazy People

If we want people to build a great Web, we must enable content creators to connect small and simple pieces, the equivalent of Lego bricks. You might remember when you were playing as a kid, sometimes a great-looking piece just didn’t connect to the other pieces; so you built your house out of those plain-looking individual pieces. And consequently what takes off on the Web are the Lego bricks like RSS; highly simple information pieces carrying no concept on their own.

Cory Doctorow in 2001 identified several problems with meta-data on the Web. Most importantly, people lie, people are lazy, and people are stupid.

Are people that lazy? Not quite; people are too lazy to care for meta-data, because it is not the most pragmatic form of communication. Yet people invest a lot of energy to chat, blog, discuss and email. This sort of chatter is just as important as a well-organized knowledge database, only infinitely harder to parse. It is the gold mine for search engine developers. A gold mine for developers of any kind of tool attempting to make sense of the future world. Search engines will head the way of intelligent answer machines – one day we might just think of them as an extremely knowledgeable, never-sleeping friend. A friend who treats “meta” as a lie.

Top-Ranked Nigritude Ultramarine

[Nigritude Ultramarine]

I have been exchanging mails with the creator of the current number one entry for the SEO competition Nigritude Ultramarine. He is relying on the same non-SEO tactic I am: letting people help out with links containing the phrase Nigritude Ultramarine – and he set up a forum where he, just like me, will share the prizes.

Interestingly enough his entry is also about the only one which contains some real content, next to the nice Nigritude Ultramarine FAQ (which reveals “Unofficially, at least one Google employee is participating. Wesley Chan, a Google product manager and staff photographer”).
The current number one has also started fighting back by following the Wiki Sandbox strategy I posted. It’s a gentlemen’s fight, so good luck to our competitors...


