Google Blogoscoped

Wednesday, May 21, 2003

If you want to know what people enter in search engines, view the last 10 queries performed on AllTheWeb.


I wrote a little Memomarker tool. You enter a URL, and it generates a Google query from the longest words on the page, thus trying to create a memomark. (The current version still picks up a page’s JavaScript and CSS along with its text, which ruins the result for some sites.)

Thinking about different algorithms to implement automated memomarking, I found three possible approaches, of which I chose the last one:

  1. Querying the rarest words (needs dictionary data + data on which words occur rarely).
  2. Querying with words that are not in a dictionary (needs dictionary data + doesn’t work well with spelling errors, since they are likely to be corrected in future revisions of a text).
  3. Using the longest words of a text (no additional data needed, easily implemented; based on the approximation that longer words are rarer).
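
The third approach can be sketched roughly as follows. This is only a minimal illustration, not the actual Memomarker code: the names `longest_words` and `memomark_query_url` are made up for this example, the number of words used (five by default) is an arbitrary choice, and the Google search URL form is an assumption. The sketch also skips the contents of script and style elements, which is the weak spot mentioned above.

```python
import re
from html.parser import HTMLParser
from urllib.parse import quote_plus


class _TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def longest_words(html, count=5):
    """Return the `count` longest distinct words of a page (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Only plain ASCII letters count as words here; a simplifying assumption.
    words = re.findall(r"[A-Za-z]+", text.lower())
    unique = list(dict.fromkeys(words))   # drop duplicates, keep page order
    unique.sort(key=len, reverse=True)    # stable sort: ties keep page order
    return unique[:count]


def memomark_query_url(html, count=5):
    """Build a search URL from the longest words (URL form is an assumption)."""
    query = " ".join(longest_words(html, count))
    return "https://www.google.com/search?q=" + quote_plus(query)
```

Fetching the page from a URL is left out on purpose; the functions take the HTML as a string, so any download method will do. Whether long words really are rare enough to single out one page is exactly the approximation the approach rests on.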

Memomarking Works

Ideaspot: instead of a regular bookmark, suffering from link rot, try memomarks or memowords. A memomark is a unique string taken from a webpage, author Microdoc’s idea being that you query Google for a phrase instead of linking directly to a brittle URL (similar to what I described yesterday).
Here I wanted to link to Tim Berners-Lee’s still-relevant “Cool URIs don’t change”, but the address returned by Google cannot be found at the moment. (Thanks to Google, I could still cachemark it.)
Unfortunately, URLs do change. All the time. Repeatedly so.

Just as buildings in the Real World must have the flexibility not to break during a storm, web architects constructing virtual bridges tackle a similar challenge. While you can’t fight a natural phenomenon, you can acknowledge it, and adapt the space you control. And just as the idea of meta data is too idealistic, web writers, readers & developers are discovering more pragmatic ways to slash a path through the Web jungle. (Note: this last phrase can be memomarked).

Cool URIs don’t change? In the future — even though it’d be the death of PageRank — we might say: cool people don’t rely on URIs.

Cyveillance Bot, Agent of the Defenders of Copyright

“I read my web server access logs along with the morning papers. The logs record visits to my website; lately there’s been a creepy one. (...)

The bot in question is like a stranger who comes to your door, and, by way of introduction, lies to you. Indeed, it’s a kind of hyperkinetic liar: it forges the names of different versions of Microsoft’s Internet Explorer (...)

If Googlebot resembles a well-known delivery person who comes in a uniform, shows ID and leaves a card, this bot is like an unshaven, twitchy guy with hat pulled down, lurking by your door. My visitor is known to work for record companies, film studios and big corporations. Its job is to find out what people think of them, and to check that no intellectual property - MP3 files, movies, trademarks - happens to be on my server’s hard drive.”
– Chris Gulker: ’There’s been a creepy visitor to my website. It’s a bot - short for robot - and it’s a serial liar’, 21 May 2003

Daypop’s Chan Interviewed

Daypop is a blog and news search engine. One thing is remarkable: it’s run and managed by a single individual, Dan Chan. Gary Price interviews Dan for SearchEngineWatch in “Behind the Scenes at the Daypop Search Engine” (May 21, 2003). Because of the one-man-show approach, the author argues, Dan can try new things and experiment, possibly tweaking the engine in a matter of minutes.

