BackRub is a "web crawler" designed to traverse the web.
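
To give a sense of what "traversing the web" means, here is a rough, illustrative sketch in Python of a breadth-first crawler: fetch a page, pull out its links, and queue any new URLs for later fetching. This is only a toy example under simplified assumptions, not BackRub's actual code, and every name in it is made up for illustration.

    # Illustrative toy crawler, not BackRub's implementation.
    import urllib.request
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkParser(HTMLParser):
        """Collect the href targets of <a> tags on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, limit=100):
        """Breadth-first traversal: fetch a page, queue its links, repeat."""
        queue, seen = deque([seed]), {seed}
        while queue and len(seen) < limit:
            url = queue.popleft()
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    html = resp.read().decode("utf-8", errors="replace")
            except OSError:
                continue  # treat as a socket/connection error and move on
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return seen

A real crawler also has to spread requests across many servers at once, avoid hammering any single site, and store what it downloads; the sketch above shows only the basic traversal idea.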

Currently we are developing techniques to improve web search engines. We will make various services available as soon as possible.

Sorry, many services are unavailable due to a local network failure beyond our control. We are working to fix the problem and hope to be back up soon. 12/4/97

We have a demo that searches the titles of over 16 million URLs: BackRub title search demo
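
As a rough idea of what a title search involves (an illustrative toy only, not the demo's actual implementation), one can build an inverted index mapping each word of a page's title to the URLs whose titles contain it, then intersect the sets for a multi-word query:

    # Illustrative toy title index, not the demo's implementation.
    from collections import defaultdict

    def build_title_index(titles):
        """Map each lowercased title word to the set of URLs containing it.
        `titles` is a dict of {url: title string}."""
        index = defaultdict(set)
        for url, title in titles.items():
            for word in title.lower().split():
                index[word].add(url)
        return index

    def search_titles(index, query):
        """Return URLs whose titles contain every word of the query."""
        words = query.lower().split()
        if not words:
            return set()
        results = index.get(words[0], set()).copy()
        for word in words[1:]:
            results &= index.get(word, set())
        return results

    # Example (made-up URL):
    # index = build_title_index({"http://example.edu/": "Digital Library Project"})
    # search_titles(index, "digital library")  -> {"http://example.edu/"}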

BackRub search with comparison (type in the top box; ignore the cgi-bin error). New systems will be coming soon.
Some documentation from a talk about the system is here.

BackRub is a research project of the Digital Library Project in the Computer Science Department at Stanford University.

Some Rough Statistics (from August 29th, 1996)
Total indexable HTML URLs: 75.2306 Million
Total content downloaded: 207.022 gigabytes
Total indexable HTML pages downloaded: 30.6255 Million
Total indexable HTML pages which have not been attempted yet: 30.6822 Million
Total robots.txt excluded: 0.224249 Million (see the robots.txt sketch after these statistics)
Total socket or connection errors: 1.31841 Million
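
The robots.txt figure above counts pages the crawler skips because the site asks robots to stay out. As a rough illustration of how such a check can be done (not BackRub's code), Python's standard robotparser module reads a site's robots.txt and answers whether a given URL may be fetched; the "BackRub" user-agent string below is just a placeholder:

    # Illustrative robots.txt check using Python's standard library.
    from urllib import robotparser
    from urllib.parse import urlparse

    def allowed_to_fetch(url, user_agent="BackRub"):
        """Return True if the site's robots.txt permits fetching `url`."""
        parts = urlparse(url)
        rp = robotparser.RobotFileParser()
        rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        try:
            rp.read()
        except OSError:
            return True  # simplification: unreadable robots.txt treated as unrestricted
        return rp.can_fetch(user_agent, url)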

BackRub is written in Java and Python and runs on several Sun Ultras and Intel Pentiums running Linux. The primary database is kept on a Sun Ultra II with 28GB of disk. Scott Hassan and Alan Steremberg have provided a great deal of very talented implementation help. Sergey Brin has also been very involved and deserves many thanks.

Before emailing, please read the FAQ. Thanks.

-Larry Page page@cs.stanford.edu