Google Blogoscoped

Forum

How Google Finds Out About Some Deep Websites

Jordan Christensen [PersonRank 0]

Saturday, February 24, 2007
12 years ago, 4,281 views

Or you could just block the bot. Does everyone forget how simple a robots.txt is to set up?

Veky [PersonRank 10]

12 years ago #

It's well known. It's even in the GWFAQ:
google.com/support/webmasters/ ...

Philipp Lenssen [PersonRank 10]

12 years ago #

> Or you could just block the bot. Does everyone forget
> how simple a robots.txt is to set up?

That's not enough. A robots.txt only stops Google from crawling your server. If they already have the URL from somewhere else, they may still link to it from their search results.
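
To make the distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `wiki.example.com` URL are made up for illustration, and `may_crawl` is just a helper name chosen here. It only models the crawl decision; nothing in it can say whether a search engine lists the bare URL, which is exactly the gap described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The URL is an assumption for illustration only.
    print(may_crawl(ROBOTS_TXT, "Googlebot", "http://wiki.example.com/page"))
```

A well-behaved bot that gets a "no" here will not fetch the page, but the URL can still reach it through links or referrer logs elsewhere.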

Brian Mingus [PersonRank 10]

12 years ago #

In my case, the information is not secret in the information security classification sense. I just don't want search engines to know about it, because once it's out there you have to deal with everyone unleashing their poorly designed spiders.

I am now adding a message asking everyone who uses the wiki to install Firefox and use the RefControl extension. addons.mozilla.org/firefox/953 ...

Authentication is overkill. The real problem is that we were in the habit of voluntarily releasing sensitive information about our surfing habits.

Matt Cutts [PersonRank 10]

12 years ago #

Veky, thanks for beating me to post that link; it specifically mentions referrer logs as one way that "secret" servers can be found.

Brian Mingus, in my experience if you ask 100 people to install a plugin, not all of them will. If you really want to prevent referrer leaks, you might want to rewrite your outbound links to go through an internal redirect that would strip the referrer.
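
The link-rewriting half of that idea could look roughly like the sketch below. The `/out` endpoint, the `wiki.example.com` host, and both function names are assumptions made up for illustration. The sketch only rewrites double-quoted `href` attributes; how the redirect endpoint itself drops the referrer (an interstitial page, a meta refresh, or something else) is browser-dependent and left out.

```python
import re
from urllib.parse import quote, urlsplit

REDIRECT_PATH = "/out"  # hypothetical internal redirect endpoint

# Matches double-quoted href attributes only; a real wiki would
# more likely hook into its own link renderer.
HREF_RE = re.compile(r'href="([^"]*)"')


def to_redirect_url(href: str, internal_host: str) -> str:
    """Route an outbound http(s) link through the internal redirect."""
    parts = urlsplit(href)
    if parts.scheme in ("http", "https") and parts.hostname != internal_host:
        return f"{REDIRECT_PATH}?url={quote(href, safe='')}"
    return href  # internal or relative links stay untouched


def rewrite_links(html: str, internal_host: str) -> str:
    """Rewrite every outbound href found in an HTML fragment."""
    return HREF_RE.sub(
        lambda m: f'href="{to_redirect_url(m.group(1), internal_host)}"',
        html,
    )


if __name__ == "__main__":
    page = '<a href="http://example.org/a?b=1">out</a> <a href="/Home">in</a>'
    print(rewrite_links(page, "wiki.example.com"))
```

Doing this on the server means it works for every visitor, whether or not they installed a plugin.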

Sketchee [PersonRank 1]

12 years ago #

Also use the noindex meta tag so that it's not included in their index...
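
For anyone who wants to check that their pages actually carry the tag, a small sketch with Python's standard `html.parser` follows. The class and function names are made up here, and the sample markup is only an illustration. Keep in mind that a crawler has to be allowed to fetch the page before it can see a meta tag at all.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page has a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<head><meta name="robots" content="noindex, nofollow"></head>'
    print(has_noindex(sample))
```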

Nick [PersonRank 0]

12 years ago #

Philipp (or anyone else), how does that work when Google still provides links to a site even when it uses robots.txt? Can a Googlebot still search through my pages by keywords and serve them up? What does come up in Google's search results when Google finds my "hidden" pages this way?

Philipp Lenssen [PersonRank 10]

12 years ago #

> Philipp (or anyone else), how does that work when
> Google still provides links to a site even when it uses robots.txt?

Nick, it may be counter-intuitive, but robots.txt just tells the bot not to crawl the site. It doesn't stop the search engine from linking to your site (at least that's how it's traditionally interpreted by many search engines). Google not only links to your site (they can find the URL on other pages, of course) but, as of late, also shows a somewhat meaningful title based on link text found elsewhere.
