Hi there Blogoscoped readers.
I have a problem, and thought that Philipp may know, or someone else who understands how the Google spiders work...
I have a hosting package whereby I can have unlimited domains hosted on 1 hosting account. This means, I have the following setup on my hosting space:
www.site1.com
www.site1.com/site2.com ==> site2.com automatically loads up site2.com and hides the fact it is on site1.com
www.site1.com/site3.com ==> site3.com automatically loads up site3.com and hides the fact it is on site1.com
The problem I have is that I am worried that Google may index my content for site2 and site3 twice, i.e.:
www.site1.com/site2.com/page1.html
www.site2.com/page1.html
Google dislikes duplicate content and I don't want it either!
My question is how I can deal with this. Do you think inserting a line in robots.txt on site1.com, Disallow: /site2.com/, would stop Google from indexing the pages under www.site1.com/site2.com, or would that line also stop Google from indexing www.site2.com? Of course, on www.site2.com I could tell Google in its robots.txt to go ahead and index the whole site.
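For reference, robots.txt is fetched and applied per hostname, so a rule in the file served at www.site1.com/robots.txt only covers URLs on www.site1.com. A minimal sketch of that file, using the site names from the example above:

```
# robots.txt served at www.site1.com/robots.txt
# Only affects URLs on this hostname, not on www.site2.com.
User-agent: *
Disallow: /site2.com/
Disallow: /site3.com/
```

Since site2.com is served out of its own subfolder, www.site2.com/robots.txt is a separate file (physically the one at www.site1.com/site2.com/robots.txt), so crawling of www.site2.com itself is not affected by the rules above.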
It seems this is more a question about how the Googlebot works! I asked my host, which is one of the biggest in the US (HostMonster), and they said: "we do not support googles functions or bots, we have this query all the time, the truth is we can not support it as google may or may not allow it, you need to go off of their rules and regulations". So basically that's an "err, I dunno" from them!
Any ideas? Thanks so much,
David |
Do you have a permanent redirect HTTP header set up to point from domain 1 & 3 to domain 2 or...? |
Hi Philipp, thanks for getting back about this mini-nightmare. Basically the hosting package creates an .htaccess that redirects the extra sites, which are folders in the main site. Does that make sense? |
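For anyone following along, the panel-generated rules for an addon domain usually look something like this mod_rewrite sketch. This is hypothetical; the actual file a cPanel-style host like HostMonster writes will differ in detail:

```apacheconf
# Hypothetical sketch of the kind of rules a cPanel-style host
# generates for an addon domain; the real file will differ.
RewriteEngine On
# When a request arrives for site2.com, serve it internally from
# the /site2.com/ subfolder of the main account's docroot.
RewriteCond %{HTTP_HOST} ^(www\.)?site2\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/site2\.com/
RewriteRule ^(.*)$ /site2.com/$1 [L]
```

Note this is an internal rewrite, not an HTTP redirect: the visitor's address bar still shows site2.com, which is why the subfolder stays hidden.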
David, if you put in your robots.txt the order to disallow the subfolder /site2.com/, then anydomainyou.have/site2.com won't be read. I think it'll work, since page1.html will still be visible from site2.com/page1.html. I suggest you investigate parked domains vs. addon domains, what you can do with them, robots.txt instructions, and finally, the almighty .htaccess. Good luck! |
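On the .htaccess front, one common way to collapse the duplicate URLs outright (rather than just hiding them from crawlers) is a 301 permanent redirect from the subfolder path on the main domain to the addon domain itself. A sketch, assuming the same site names as above, placed in the main docroot's .htaccess:

```apacheconf
# Sketch: 301-redirect any request that reaches the subfolder via
# the main domain over to the addon domain's own hostname, so only
# one URL per page is ever indexed.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.com$ [NC]
RewriteRule ^site2\.com/(.*)$ http://www.site2.com/$1 [R=301,L]
```

With this in place, a request for www.site1.com/site2.com/page1.html is redirected to www.site2.com/page1.html, and search engines treat the 301 as a signal that the second URL is the canonical one.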
Thanks for the advice Zim! |