Google Blogoscoped Forum

How Google Finds Out About Some Deep Websites

Jordan Christensen [PersonRank 0]

Saturday, February 24, 2007
17 years ago · 5,629 views

Or you could just block the bot. Does everyone forget how simple a robots.txt is to set up?
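For the record, a site-wide block is just two lines in a robots.txt file at the web root (e.g. example.com/robots.txt):

User-agent: *
Disallow: /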

Veky [PersonRank 10]

17 years ago #

It's well known. It's even in the GWFAQ:
http://www.google.com/support/webmasters/bin/answer.py?answer=33574

Philipp Lenssen [PersonRank 10]

17 years ago #

> Or you could just block the bot. Does everyone forget
> how simple a robots.txt is to setup?

That's not enough. Google may still link to your site from their search results. They will simply not crawl your server, but as they have the URL, they link to it from the search results.

Brian M. [PersonRank 10]

17 years ago #

In my case, the information is not secret in the information security classification sense. I just don't want search engines to know about it, because once it's out there you have to deal with everyone unleashing their poorly designed spiders.

I am now adding a message asking everyone who uses the wiki to install Firefox and use the RefControl extension. https://addons.mozilla.org/firefox/953/

Authentication is overkill. The real problem is that we were in the habit of voluntarily releasing sensitive information about our surfing habits.

Matt Cutts [PersonRank 10]

17 years ago #

Veky, thanks for beating me to post that link; it specifically mentions referrer logs as one way that "secret" servers can be found.

Brian Mingus, in my experience if you ask 100 people to install a plugin, not all of them will. If you really want to prevent referrer leaks, you might want to rewrite your outbound links to go through an internal redirect that would strip the referrer.
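As a rough sketch of that idea (my own illustration, not a specific recipe from Matt): point every outbound link at something like /out?url=<target> and have a small handler forward the browser through an interstitial meta refresh, which historically sent no Referer header to the destination. The /out path, handler name, and port below are made up for the example.

# Hypothetical dereferer: outbound links on the wiki point at
# /out?url=<target>; this handler serves an interstitial page that
# forwards the browser via a meta refresh, so the target site does not
# receive the wiki's URL as a referrer (behavior can vary by browser).
import html
import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer

class DerefererHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urllib.parse.urlparse(self.path)
        if parsed.path != "/out":
            self.send_error(404)
            return
        params = urllib.parse.parse_qs(parsed.query)
        target = params.get("url", [""])[0]
        # Only forward to http/https URLs so this doesn't become an open
        # redirector for javascript: or other schemes.
        if not target.startswith(("http://", "https://")):
            self.send_error(400, "unsupported target")
            return
        body = ('<html><head><meta http-equiv="refresh" content="0;url=%s">'
                '</head><body>Redirecting...</body></html>'
                % html.escape(target, quote=True))
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), DerefererHandler).serve_forever()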

Sketchee [PersonRank 1]

17 years ago #

Also use the noindex meta tag so that it's not included in their index...
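The tag goes in each page's <head>:

<meta name="robots" content="noindex">

One caveat: the crawler has to be allowed to fetch the page to see that tag, so if the same pages are also blocked in robots.txt, the noindex will never be read.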

Nick [PersonRank 0]

17 years ago #

Philipp (or anyone else), how does that work when Google still provides links to a site even when it uses robots.txt? Can a Googlebot still search through my pages by keywords and serve them up? What does come up in Google's search results when Google finds my "hidden" pages this way?

Philipp Lenssen [PersonRank 10]

17 years ago #

> Philipp (or anyone else), how does that work when
> Google still provides links to a site even when it uses robots.txt?

Nick, it may be counter-intuitive, but robots.txt just tells the bot not to crawl the site. It doesn't stop the search engine from linking to your site (at least that's how many search engines have traditionally interpreted it). What Google does is not only link to your site (they can find the URL on other pages, of course), but also, lately, show a somewhat meaningful title based on link text found elsewhere. Since the pages themselves were never crawled, their content isn't in the index; the listing is just the URL plus that title, with no snippet.
