Below is the machine translation.
--------------------------------------------
How can I prevent images from appearing in Google search results?
When a user clicks on a result in a Google image search, they are taken to a results page consisting of two frames. The upper frame contains a thumbnail of the image in question and the lower frame displays the original web page (the page on which the image is displayed). If you do not want your page to be shown as the source page (to restrict traffic, for example), you can add a noimageindex meta tag to the header of your web page. Examples:
<meta name="robots" content="noimageindex">
-- Or --
<meta name="googlebot" content="noimageindex">
Note that the images of this page will still appear in the index of images if other pages provide links to them. |
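For anyone who wants to check whether a page already carries the tag, here is a minimal sketch using only Python's standard library. The class and function names (`RobotsMetaParser`, `has_noimageindex`) are made up for illustration; it only looks at the `robots` and `googlebot` meta tags quoted above and says nothing about how Google itself parses the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names, so Name="robots" is handled too.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute may hold several comma-separated directives.
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def has_noimageindex(html):
    """Return True if the HTML contains a robots/googlebot noimageindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noimageindex" in parser.directives
```

Calling `has_noimageindex(page_source)` on the HTML of a page returns `True` when either form of the tag is present.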
Here is the same page in English:
http://www.google.com/support/webmasters/bin/answer.py?hl=en-uk&answer=79892
It doesn't seem to be new, but it seems to be a very obscure value. Fewer than 1,000 results in Google for "noimageindex" and not a single mention on this blog :) |
[Slightly off-topic] I don't understand why Google sometimes uses "UK" in language codes to mean "United Kingdom" when the ISO standard is "GB". "UK" actually means "Ukrainian":
http://www.iana.org/assignments/language-subtag-registry |
Another way of preventing Google from indexing certain images is to add the following line to the HTTP response headers when delivering the image itself:
X-Robots-Tag: noindex
Dan Crow explains this here:
http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html
We at http://www.zeno.org are using this technique to prevent Google from indexing CERTAIN images (especially nudity in our gallery of paintings) so that we won't get filtered out by the default moderate SafeSearch setting.
Needless to say, I hate this self-censorship, but I see no alternatives. Anyone else?
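As a rough illustration of sending the header only for certain images, here is a small sketch built on Python's standard `http.server`. The path prefix `/gallery/private/` and the names `NOINDEX_PREFIXES`, `robots_headers` and `NoIndexHandler` are assumptions made up for this example; it is not how zeno.org or any particular site does it, and a production server would normally set the header in its own configuration.

```python
from http.server import SimpleHTTPRequestHandler

# Hypothetical path prefixes whose files should stay out of the index.
NOINDEX_PREFIXES = ("/gallery/private/",)


def robots_headers(path, prefixes=NOINDEX_PREFIXES):
    """Return the extra response headers for a given request path."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    if path.startswith(prefixes):
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoIndexHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, adding X-Robots-Tag where needed."""

    def end_headers(self):
        # Add the header just before the header section is closed.
        for name, value in robots_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()
```

To try it locally, `HTTPServer(("127.0.0.1", 8000), NoIndexHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve the current directory, and responses for files under the assumed prefix would carry the extra header.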
|
You could also use the following in robots.txt:
User-agent: *
Disallow: /*.gif
I think both Yahoo and Google support wildcards in this way in robots.txt. |
As far as I remember, "noimageindex" is (was?) used by the search engine Altavista. I have seen it one or two times, no more. |
UMMM... has this page been hacked? I don't think the red was there before |
It is my understanding that Matt's suggestion would tell all robots not to download or read the "gif" files. This would save you the expense of sending those files to the robots, but it would not keep them out of the index or the filter ratings. The robots could still learn of them from pages that reference them. Google probably relies on such pages for index terms and for SafeSearch ratings.
The pages that reference them might not be in your control, as the Google help file for "noimageindex" warns.
"Note that the images on the page may still be included in the image index if they are linked to by other pages." |
George R, using Disallow: /*.gif would make all robots ignore all gif images on your site. So those gif images would not be indexed because of that line. This Disallow could catch other things besides gif images because you aren't specifying that .gif is the end of the string you want to block.
You could specify Disallow: /*.gif$ which would tell the googlebot to disallow any files that end in .gif. I'm not sure if other robots recognize the $ end of line marker though. |
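To make the difference between the two rules concrete, here is a small sketch of the wildcard matching as described above: `*` stands for any run of characters and a trailing `$` pins the rule to the end of the path. The function names `rule_to_regex` and `is_blocked` are made up for this example, and it is not a full robots.txt parser (no Allow lines, no user-agent groups, no rule precedence).

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule, path):
    """Return True if the Disallow rule matches the URL path."""
    # Rules apply from the start of the path, so use match() rather than search().
    return rule_to_regex(rule).match(path) is not None


# Without the $, the rule also catches paths that merely contain ".gif".
print(is_blocked("/*.gif", "/images/logo.gif"))        # True
print(is_blocked("/*.gif", "/images/logo.gif.html"))   # True
print(is_blocked("/*.gif$", "/images/logo.gif.html"))  # False
print(is_blocked("/*.gif$", "/images/logo.gif"))       # True
```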
"...image search engines can send a whole lot of traffic to your site..." So true!
My most popular post – which is definitely not my best one – includes a picture of Arnold Schwarzenegger. The only reason it was so popular is that for a while it ranked #1 for Google image searches for 'Terminator 2.' |
I think that the robots.txt rules can prohibit crawling. They do not place any restrictions on indexing. To cache a file, a robot may need to read it (i.e. crawl it). The HTML meta tags for robots may restrict indexing via that HTML file.
Google's web search can index files without crawling them.
Google's image search results show images of its matches. If Google image search only indexes what it caches, only caches what it crawls, and honors robots.txt, then it would seem that it would only index what it crawls.
Google's comment in the help file suggests, with respect to the meta tag, that Google may not be this restrictive. Presumably this is about images in the web index.
"Note that the images on the page may still be included in the image index if they are linked to by other pages."
I do not know whether the web page crawl and image crawl are the same crawl. I do not know if the web page index and image index are integrated. Honoring robots.txt is voluntary and Google may have chosen to act more restrictive or less restrictive than indicated. |
> I do not know whether the web page crawl and image crawl are the same crawl. I do not know if the web page index and image index are integrated. Honoring robots.txt is voluntary and Google may have chosen to act more restrictive or less restrictive than indicated.
George, as we saw in the past, Google does indeed display/index more than it crawls in web search, e.g. it may show page X even though X is disallowed via robots.txt, simply because X was linked from elsewhere. (Google may also display a title, based on the text of the backlinks.) However, robots.txt for *images* should be safe for now simply because Google always displays a thumbnail when linking to an image... and I can't see how they could generate that thumbnail at this time unless they access the source image, which they can't* due to the robots.txt.
*Well, technically: can, but don't *want* to access due to their motivation to honor robots.txt when crawling. |
I agree with Philipp about the traffic benefits gained by allowing your images to be added to Google and Yahoo image search, etc.
For several years I blocked access to my image directory using robots.txt rules to save bandwidth costs.
In mid-to-late 2007 I removed those rules, and now in early 2008 many of my images have been indexed, and thanks to my detailed ALT tags I get about 3% of site traffic from image search :-)
PS Bandwidth usage did not change appreciably after my changes |