Google Blogoscoped

Forum

How does the Google bot actually work in scanning directories?

David T [PersonRank 7]

Thursday, October 11, 2007

Hi there Blogoscoped readers.

I have a problem and thought that Philipp, or someone else who understands how the Google spiders work, might know the answer.

I have a hosting package that lets me host unlimited domains on one hosting account. This means I have the following setup on my hosting space:

www.site1.com
www.site1.com/site2.com ==> site2.com automatically loads this folder and hides the fact that it is on site1.com
www.site1.com/site3.com ==> site3.com automatically loads this folder and hides the fact that it is on site1.com

The problem is that I am worried Google may index my content for site2 and site3 twice, i.e.:

www.site1.com/site2.com/page1.html
www.site2.com/page1.html

Google dislikes duplicate content and I don't want it either!

My question is how I can deal with this. Do you think inserting the line Disallow: /site2.com/ in the robots.txt on site1.com would stop Google from indexing the pages under www.site1.com/site2.com, or would that line also stop Google from indexing www.site2.com? Of course, on www.site2.com I could use my robots.txt to tell Google to go ahead and index the whole site.

It seems this is more a question about how the Google bot works! I asked my host, which is one of the biggest in the US (HostMonster), and they said, "We do not support Google's functions or bots. We have this query all the time. The truth is we cannot support it, as Google may or may not allow it; you need to go off of their rules and regulations." So basically that's an "err, I dunno" from them!

Any ideas? Thanks so much,

David

Philipp Lenssen [PersonRank 10]

16 years ago

Do you have a permanent redirect HTTP header set up to point from domain 1 & 3 to domain 2 or...?

David T [PersonRank 7]

16 years ago

Hi Philipp, thanks for getting back to me about this mini-nightmare. Basically, the hosting package creates an .htaccess file to redirect the extra sites, which are folders in the main site. Does that make sense?
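
To make the idea of a permanent redirect concrete, here is a rough sketch in Python of the URL mapping such a redirect would perform. It is only an illustration: the rules the host actually writes into .htaccess are not shown in this thread, and the names ADDON_DOMAINS, MAIN_HOSTS and canonical_url are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these folder names on site1.com are the addon domains.
ADDON_DOMAINS = {"site2.com", "site3.com"}
# Assumption: hostnames under which the main site is reachable.
MAIN_HOSTS = {"site1.com", "www.site1.com"}


def canonical_url(url):
    """Return the URL a 301 redirect should point to, or the URL unchanged."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in MAIN_HOSTS:
        return url  # already on an addon domain (or an unrelated host)

    # Split "/site2.com/page1.html" into the folder and the remaining path.
    segments = parts.path.lstrip("/").split("/", 1)
    folder = segments[0]
    if folder not in ADDON_DOMAINS:
        return url  # an ordinary page of the main site

    rest = "/" + (segments[1] if len(segments) > 1 else "")
    return urlunsplit((parts.scheme, "www." + folder, rest,
                       parts.query, parts.fragment))


print(canonical_url("http://www.site1.com/site2.com/page1.html"))
print(canonical_url("http://www.site1.com/about.html"))
```

If the server answered every request for the first kind of URL with a 301 pointing at the result of this mapping, only the www.site2.com address would remain as the real location of the page.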

Zim [PersonRank 10]

16 years ago

David, if your robots.txt tells crawlers to disallow the subfolder site2.com, then anydomainyou.have/site2.com won't be read.
I think it'll work if page1.html is visible from site2.com/page1.html.
I suggest you look into parked domains vs. addon domains, what you can do with them, robots.txt instructions and, finally, the almighty .htaccess.
Good luck!
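
The robots.txt part can be tried out locally with Python's standard urllib.robotparser module. This is a minimal sketch, not a statement of how Googlebot is implemented: the robots.txt contents below are assumptions, and ROBOTS_BY_HOST, parse_rules and is_crawlable are hypothetical names. It relies on the fact that crawlers fetch robots.txt separately for each hostname, so the file served at www.site1.com/robots.txt and the one served at www.site2.com/robots.txt are evaluated independently.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Build a parser from the text of one robots.txt file."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Assumed robots.txt contents, one file per hostname.
ROBOTS_BY_HOST = {
    # Served at http://www.site1.com/robots.txt
    "www.site1.com": parse_rules(
        "User-agent: *\n"
        "Disallow: /site2.com/\n"
        "Disallow: /site3.com/\n"
    ),
    # Served at http://www.site2.com/robots.txt (empty Disallow allows all)
    "www.site2.com": parse_rules(
        "User-agent: *\n"
        "Disallow:\n"
    ),
}


def is_crawlable(url, user_agent="Googlebot"):
    """Check a URL against the robots.txt of its own hostname only."""
    host = urlsplit(url).netloc.lower()
    return ROBOTS_BY_HOST[host].can_fetch(user_agent, url)


print(is_crawlable("http://www.site1.com/site2.com/page1.html"))
print(is_crawlable("http://www.site2.com/page1.html"))
```

With these assumed files, the first check reports False and the second True. Keep in mind that in the setup David describes, the file served as www.site2.com/robots.txt would sit inside the site2.com folder on disk, and that robots.txt only controls crawling.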

David T [PersonRank 7]

16 years ago

Thanks for the advice Zim!
