Google Blogoscoped


Open Source, Distributed Search Engine? Anyone?

Ludwik Trammer [PersonRank 10]

Saturday, May 12, 2007

"Google is like a young mammoth, already very strong but still growing. Healthy quarter results and rising expectations in the online advertising space are the biggest factors for Google to keep its pace in NASDAQ. But now let's think outside the square and try to figure out a Google killer scenario. You may know that I am obsessed with open source (e.g. my projects openhuman and simplekde), so my proposition will be open source based – and I'll call it Google@Home."

More: http://www.readwriteweb.com/archives/how_to_build_an_open_source_google.php

carl mackey cjmackey@stwing.org [PersonRank 0]

17 years ago #

found something (with an awesome name):
http://www.majestic12.co.uk/
the crawling is distributed; everything else seems not to be.
ideally i'd like a totally distributed one, where nodes are also storing index data and doing stuff like pagerank and handling search requests, but whatever. doing something like seti or folding or what this majestic12 is doing keeps stuff centralized, unfortunately.

cjderum [PersonRank 0]

17 years ago #

Interesting topic. Yes, I agree about Majestic-12; it's very much the opposite of where this needs to go. The last thing we need is yet another centralized search engine. Even if the crawling is distributed, the central engine still doesn't scale.

From – http://slashdot.org/comments.pl?sid=235033&cid=19158565

"Although everyone loves Google at the present time, it's still always puzzled me that people aren't working on a distributed search mechanism that could potentially be far more capable and powerful than Google.

After all, individual sites are far better placed to index their resources than a generic crawler can ever be, for a number of reasons. They have far more efficient access to their local data for starters, and are able to do the indexing instantaneously as things change. Individual sites are also able to apply semantic information since they know what their sites are actually about, whereas a generic engine cannot possibly know.

The sheer power available in a distributed search system would also be massively beyond anything that even the mighty Google could ever supply, for all the usual reasons associated with distribution and distributed computation.

Once you recurse more than a few levels down a parallel distributed search tree, the available processing power and bandwidth just go totally astronomic. What's more, simply limiting the degree of query recursion would allow you to tailor your desired results/time behaviour, and since the intelligent tagging at each site would contain hugely more semantic information than currently, you could direct your searches far more effectively too.

And it wouldn't be slower either, because the distributed indexes are easily gathered by caching aggregators, and competition would no doubt provide plenty of those.

I know that several distributed search efforts do exist, but the point here is that they have virtually zero takeup, largely because of the dominance of Google and the general state of happiness with centralized search technology. While centralization works more or less OK for now, distribution has the potential to provide a vastly superior search system in ALL respects.

We really should be looking at it more seriously."
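
To make the "limit the degree of query recursion" idea in the quote a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not anything from the Slashdot comment or from an existing project: the `Node` class, the `search` function, the `max_depth` parameter and the example.com-style hosts are all made up, and the peers are visited one after another in a single process, whereas a real system would query them in parallel over the network.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical site that indexes its own pages and knows a few peers."""
    name: str
    index: dict[str, set[str]] = field(default_factory=dict)  # term -> local URLs
    peers: list[Node] = field(default_factory=list)

    def add_page(self, url: str, text: str) -> None:
        # Naive local indexing: every whitespace-separated word becomes a term.
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(url)


def search(start: Node, term: str, max_depth: int) -> set[str]:
    """Collect hits level by level, going at most max_depth hops from start."""
    results: set[str] = set()
    seen = {start.name}          # avoid asking the same node twice
    frontier = [start]
    for _ in range(max_depth + 1):
        next_frontier = []
        for node in frontier:
            results |= node.index.get(term.lower(), set())
            for peer in node.peers:
                if peer.name not in seen:
                    seen.add(peer.name)
                    next_frontier.append(peer)
        frontier = next_frontier
    return results


if __name__ == "__main__":
    # Three made-up sites chained together: a -> b -> c.
    a, b, c = Node("a.example"), Node("b.example"), Node("c.example")
    a.peers.append(b)
    b.peers.append(c)
    a.add_page("http://a.example/1", "open source search")
    b.add_page("http://b.example/1", "distributed search engine")
    c.add_page("http://c.example/1", "peer search")

    for depth in range(3):
        print(depth, sorted(search(a, "search", depth)))
```

Raising `max_depth` reaches more nodes (and more results) at the cost of more hops, which is the results/time trade-off the quote describes. The sketch leaves out ranking, caching aggregators and the semantic tagging entirely.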

Gosia Garbe [PersonRank 0]

17 years ago #

Have a look at FAROO (http://www.faroo.com), a peer-to-peer web search engine. Crawling, indexing, ranking and search are distributed.
