Thursday, May 22, 2003

Google and the Evolution of the Democratic Web

Google’s PageRank algorithm, as you may know, counts links from other pages to a page and uses them to calculate the page’s importance. This is a major factor in determining the ranking within a Google search result. And the ranking, quite obviously, determines whether or not people follow through to the page, and so whether or not the page will be noticed.

“PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value.”
Google Technology: PageRank Explained

Is this sort of web democracy fair? This system might favor the common and accepted. It will also favor those opinionated people who are decisive, get active, and vote. As a matter of fact, it will even favor those who receive votes themselves:

“Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves ’important’ weigh more heavily and help to make other pages ’important.’”
Google Technology: PageRank Explained
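
To make the “votes that weigh more” idea concrete, here is a minimal sketch of the textbook PageRank iteration. It is not Google’s production code: the six-page toy web, the page names, the iteration count, and the handling of pages without outgoing links are all assumptions made for illustration, and the damping value of 0.85 is simply the commonly cited default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # Assumption: a page with no outgoing links spreads its vote evenly.
                targets = pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "endorsed" and "overlooked" each receive exactly one
# link, but "endorsed" gets its link from a page that is itself well linked.
toy_web = {
    "fan1": ["popular"],
    "fan2": ["popular"],
    "popular": ["endorsed"],
    "obscure": ["overlooked"],
    "endorsed": [],
    "overlooked": [],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

Both “endorsed” and “overlooked” collect a single vote, yet “endorsed” ends up ranked higher, because its one vote comes from a page that already has votes of its own.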

“And as for the “uniquely democratic” nature of PageRank, that’s an insult. Democracies were designed to give one vote per person, as a counterweight to the powerful. PageRank is designed to give extra votes to sites that are already powerful.”
– Kackle, WebMasterWorld Forum, March 6 2003

What happens to the ideas of outsiders? To ideas that people in general won’t consider worthwhile?

An important factor here is artificial boosting through a sort of self-validating Google feedback mechanism.

Once a page scores a good ranking, people might link to it even more, in turn giving it an even better PageRank, until the page solidly establishes itself as the first resource on a given subject. Well:

“Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search”
Google Technology: PageRank Explained

Sure, it does not rely on PageRank alone. But let’s say I’m researching an article on Albert Einstein, and want to include a link to his biography. I enter the two keywords “Einstein” and “Biography”, and get 125,000 pages. I might go through the first ten or twenty pages (0.016%) of the Google result list, and decide which article seems best. Any article in a less prominent ranking position is discarded, because I don’t have the time, patience and memory to compare the quality of all the results. I might just go and link to the first result if a quick glance reveals it’s OK.

An Einstein biography is not even a very critical opinion piece. I might want to research a political subject. A scientific theory. How can Google algorithmically determine quality? Nobody could.

What does all this mean for web democracy? It means that newcomers, no matter how good their content is, have a tough time shaking the foundations of older pages, because people already link to those, making it hard to move up into the higher ranks of the elite. Google, naturally, has no mechanism to notice when a genius appears on the scene.
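
The feedback loop can be played through in a small simulation. This is only a sketch under invented assumptions: the page names, starting link counts, and “quality” numbers are made up, and the model of a searcher (looks only at the top few results, then links to one of them in proportion to its quality) is a deliberate simplification.

```python
import random


def simulate_feedback(initial_links, qualities, rounds=1000, visible=2, seed=0):
    """Each round, one author searches, reads only the top `visible` results,
    and links to one of them, chosen in proportion to its quality."""
    rng = random.Random(seed)
    links = dict(initial_links)
    for _ in range(rounds):
        # Rank by current link count; ties are broken alphabetically.
        ranking = sorted(links, key=lambda page: (-links[page], page))
        shown = ranking[:visible]
        weights = [qualities[page] for page in shown]
        chosen = rng.choices(shown, weights=weights, k=1)[0]
        links[chosen] += 1
    return links


# Hypothetical numbers: the newcomer is the best page but starts with no links.
start = {"established_bio": 10, "second_bio": 8, "brilliant_newcomer": 0}
quality = {"established_bio": 0.5, "second_bio": 0.5, "brilliant_newcomer": 0.9}

result = simulate_feedback(start, quality)
print(result)
```

In this model the newcomer is never among the results anyone looks at, so it never gains a link, however high its quality number is set.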

What could be a way out of this feedback misery, if this old boys’ network indeed exists?

One approach would be to slightly favor new documents, to balance a possible “I was first” PageRank advantage.

Second, and equally important: add randomness. Just think of gene mutations. Only evolution shows which concept survives, but evolution is all about a certain degree of randomness.
Give a certain page a higher ranking for a period of time and see if people link to it more or less than to the old page, or in the Darwinian sense, whether the new species produces more or fewer offspring.
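
Both suggestions, a freshness bonus and a dose of randomness, can be sketched in a few lines. Everything here is hypothetical: the function names, the bonus size, the 30-day half-life, the 10% exploration probability, and the example scores are placeholders, not anything Google is known to use.

```python
import random


def adjusted_score(score, age_days, bonus=0.5, half_life_days=30.0):
    """Boost young pages; the boost halves every `half_life_days`."""
    return score * (1.0 + bonus * 0.5 ** (age_days / half_life_days))


def rank_pages(pages, explore_prob=0.1, rng=None):
    """`pages` maps name -> (score, age_days). Returns names, best first.

    With probability `explore_prob`, one randomly chosen lower-ranked page
    is promoted to the top slot for this query (the "mutation").
    """
    rng = rng or random.Random()
    ranking = sorted(pages, key=lambda name: (-adjusted_score(*pages[name]), name))
    if len(ranking) > 1 and rng.random() < explore_prob:
        lucky = rng.randrange(1, len(ranking))
        ranking.insert(0, ranking.pop(lucky))
    return ranking


# Hypothetical pages: (link-based score, age in days).
pages = {
    "old_bio": (1.0, 2000),
    "new_bio": (0.8, 0),
    "other": (0.3, 500),
}

print(rank_pages(pages, explore_prob=0.0))                        # freshness bonus only
print(rank_pages(pages, explore_prob=1.0, rng=random.Random(1)))  # forced mutation
```

Whether a promoted page deserves to stay on top would then be decided the Darwinian way: by counting the links it attracts during its time in the spotlight.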

Oh yeah, and here’s the Einstein Biography. Let’s trust Google that it’s good.

The Rise and Fall of the Google Empire

Be warned; this is a (strictly hypothetical) Google fan’s nightmare.

Google 2020

1996: Brin and Page create BackRub, starting the era of modern search.
1998: $1 million in funding is raised to start Google.
1999: Google has 39 employees watching the 3 million daily searches.
2001: Google now offers several country domains.
2002: Google introduces the Google Web API, flirting with developers.
2003: Google Answers lifts searching up to a human level.
2004: Google removes blogs from web search.
2004: Googlewhacking gets low coverage during the Athens 2004 Olympic Games.
2005: Google adds a costly “speed-submit URL” option.
2006: The Google cache is ruled illegal under US law because it violates the federal Digital Millennium Copyright Act. Google has to remove it.
2007: A US court decides that Google Images violates copyright. Google removes it.
2008: AllTheWeb is getting better and allows complex searches.
2009: The GoogleBot (ExtraFreshBot) is now fully indexing most webpages (and file downloads) every minute, causing heavy server traffic — people start to exclude Google via robots.txt.
2010: Google introduces Pop-Up advertisement.
2011: Google doesn’t catch up with new search technologies.
2012: Brin and Page officially resign this year.
2013: Google now features graphic ads on top of every search.
2014: AllTheWeb buys DayPop.
2014: Google, using its Geolocation feature, starts to heavily censor content for certain countries. Entering “Hitler” at returns zero results.
2015: Google buys the Yahoo! Directory and removes the DMOZ Open Directory Project.
2016: Google is successfully sued by Microsoft for spidering Windows Servers. Also, Internet Explorer 9 won’t allow accessing anything but MSN search.
2017: Fire in the Googleplex destroys most of the hardware and research papers. Luckily, no person is harmed.
2017: A Stanford linguistics professor implements Google semantic search, the results of which are seemingly random.
2017: Google’s main sponsor is now the NSA, who change their name to “National Search Agency”.
2018: Google goes public and enters the busy world of stock market speculations.
2020: AltaVista buys Google Inc.
2020: GoogleVista introduces a pay-per-search scheme.
2021: turns into a Flash-based shopping portal.
2022: GoogleVista sells user information collected over the years to third parties. New York Times reports.
2023: People protest: GoogleVista is shut down due to online privacy issues.
2024: Brin and Page create a new search engine, Dooogle, which does nothing but search for Dilbert cartoons online using their newly patented DilbertRank. Amusing as it may be, Dooogle flops.
2038: The Web is infested by spammers, wild speculation, fake news, conspiracy theories, urban legends, retouched photos, copyright-breaking material, nudity, and online pranks. People around the globe give up online search and go back to books, newspapers, magazines, and plain old face-to-face communication.
2051: Google is officially a footnote in Internet history books. The elders who knew it refer to the early 21st century as the “golden Google age, when information was valid”. The younger generation doesn’t understand.
3064: An alien race discovers the Googleplex ruins, turning it into their new church, worshipping lava lamps.
3088: A giant meteor is approaching earth at deadly speed. The last blog,, covers the story. The site stays unread, resulting in the termination of the human race and most animals. The cockroaches survive.

Book Google Hacks a Bestseller

“Google Hacks, by Tara Calishain and Rael Dornfest, cracked the ranks of the New York Times top ten business paperback sellers for May.”
– Chris Sherman, Why Google Hacks is a Bestseller (, May 22, 2003


