Google Blogoscoped

Forum

No Index for images?

beussery [PersonRank 10]

Wednesday, February 20, 2008
16 years ago, 14,158 views

Can anyone confirm this is new?

http://www.google.com/support/webmasters/bin/answer.py?answer=79892

George R [PersonRank 10]

16 years ago

Below is the machine translation.
--------------------------------------------
How can I prevent images from appearing in Google search results?

When a user clicks on a result in a Google image search, they are taken to a results page consisting of two frames. The upper frame contains a thumbnail of the image in question, and the lower frame displays the original web page (the page on which the image appears). If you do not want your page listed as the host page (for example, to limit traffic), you can add the noimageindex meta tag to the header of your web page. Examples:

<meta Name="robots" content="noimageindex">

-- Or --

<meta Name="googlebot" content="noimageindex">

Note that the images on this page may still appear in the image index if other pages link to them.
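As a rough illustration of how such a directive can be read, here is a minimal Python sketch, assuming only the two tag forms shown above (`name="robots"` or `name="googlebot"`, with a comma-separated `content` list). `RobotsMetaParser` and `blocks_image_indexing` are hypothetical names for this example, not part of any Google tool, and a page-level tag says nothing about other pages that link to the same images.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names, so Name="robots" arrives as "name".
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("name", "").strip().lower() in ("robots", "googlebot"):
            # content may hold several comma-separated directives.
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def blocks_image_indexing(html: str) -> bool:
    """True if the page carries a noimageindex directive for robots or googlebot."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noimageindex" in parser.directives


page = '<html><head><meta Name="googlebot" content="noimageindex"></head></html>'
print(blocks_image_indexing(page))
```

Only the page's own markup is inspected here; whether a search engine also indexes the images through links from elsewhere is outside what this check can tell you.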

Philipp Lenssen [PersonRank 10]

16 years ago

Here is the same page in English:

http://www.google.com/support/webmasters/bin/answer.py?hl=en-uk&answer=79892

It doesn't seem to be new, but it seems to be a very obscure value. There are fewer than 1,000 results in Google for "noimageindex" and not a single mention on this blog :)

Tony Ruscoe [PersonRank 10]

16 years ago

[Slightly off-topic] I don't understand why Google sometimes uses "UK" in language codes to mean "United Kingdom" when the ISO standard is "GB". "UK" actually means "Ukrainian":

http://www.iana.org/assignments/language-subtag-registry

The four comments above were made in the forum before this was blogged.

Erwin Jurschitza [PersonRank 1]

16 years ago

Another way of preventing Google from indexing certain images is to add the following line to the HTTP header while delivering the image itself:

X-Robots-Tag: noindex

Dan Crow explains this here:

http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html
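A minimal sketch of the header approach, assuming a site served by Python's standard `http.server` module: the handler below appends `X-Robots-Tag: noindex` to responses for image paths. The suffix list, `robots_header_for`, `NoIndexImagesHandler`, and `serve` are hypothetical choices for illustration. A site that only wants to hide certain images would key the policy on specific files rather than on extensions.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler
from typing import Optional

# Hypothetical policy: every request path with one of these suffixes gets the header.
NOINDEX_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def robots_header_for(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value to send for a request path, or None."""
    clean = path.split("?", 1)[0].lower()  # ignore any query string
    if clean.endswith(NOINDEX_SUFFIXES):
        return "noindex"
    return None


class NoIndexImagesHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to responses for image paths."""

    def end_headers(self) -> None:
        # SimpleHTTPRequestHandler sends Content-Type etc. first, then calls
        # end_headers(), so this is a convenient place to append one more header.
        value = robots_header_for(self.path)
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current directory; port 8000 is an arbitrary choice."""
    HTTPServer(("", port), NoIndexImagesHandler).serve_forever()
```

Calling `serve()` and then requesting an existing image, for example with `curl -I http://localhost:8000/picture.jpg` (a made-up file name), should show `X-Robots-Tag: noindex` among the response headers while HTML pages are served without it.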

We at http://www.zeno.org are using this technique to prevent Google from indexing CERTAIN images (especially nudity in our gallery of paintings) so that our site won't be filtered out by SafeSearch's default moderate setting.

Needless to say, I hate this self-censorship, but I see no alternative. Does anyone else have one?

Matt Cutts [PersonRank 10]

16 years ago

You could also use
User-agent: *
Disallow: /*.gif

I think both Yahoo and Google support wildcards like this in robots.txt.

TOMHTML [PersonRank 10]

16 years ago

As far as I remember, "noimageindex" is (was?) used by the search engine AltaVista. I have seen it only once or twice.

beussery [PersonRank 10]

16 years ago

UMMM... have these pages been hacked? I don't think the red was there before.

George R [PersonRank 10]

16 years ago

It is my understanding that Matt's suggestion would ask all robots not to download or read the GIF files. This would save you the expense of sending those files to the robots, but it would not keep them out of the index or the filter ratings. The robots could still learn of them from pages that reference them. Google probably relies on such pages for index terms and for SafeSearch ratings.

The pages that reference them might not be under your control, as the Google help file for "noimageindex" warns:

"Note that the images on the page may still be included in the image index if they are linked to by other pages."

Colin Colehour [PersonRank 10]

16 years ago

@George R, using Disallow: /*.gif would make all robots ignore all GIF images on your site, so those GIF images would not be indexed because of that line. This Disallow could also catch things other than GIF images, because you aren't specifying that .gif is the end of the string you want to block.

You could specify Disallow: /*.gif$, which tells Googlebot to disallow any URL that ends in .gif. I'm not sure whether other robots recognize the $ end-of-line marker, though.
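To make the difference between the two rules concrete, here is a small Python sketch that translates a robots.txt path pattern into a regular expression, assuming the semantics described in this thread: `*` matches any sequence of characters, a trailing `$` anchors the end of the URL, and otherwise a rule matches any path that begins with the pattern. It is an approximation for experimenting, not Googlebot's actual matcher; `robots_pattern_to_regex` and `is_disallowed` are hypothetical names.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    Without '$', the rule matches any path that starts with the pattern.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' where the '*' wildcards were.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(rule: str, path: str) -> bool:
    # re.match anchors at the start of the path, giving prefix semantics.
    return robots_pattern_to_regex(rule).match(path) is not None


for rule in ("/*.gif", "/*.gif$"):
    for path in ("/images/logo.gif", "/images/logo.gif?v=2", "/page.gifts"):
        print(rule, path, is_disallowed(rule, path))
```

Under these assumptions, `/*.gif` blocks `/images/logo.gif?v=2` and also the unrelated `/page.gifts`, while `/*.gif$` blocks only paths that actually end in `.gif`, which is the over-matching the comment above warns about.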

Kaila Colbin [PersonRank 1]

16 years ago

"...image search engines can send a whole lot of traffic to your site..." So true!

My most popular post – which is definitely not my best one – includes a picture of Arnold Schwarzenegger. The only reason it was so popular is that for a while it ranked #1 for Google image searches for 'Terminator 2.'

George R [PersonRank 10]

16 years ago

I think that robots.txt rules can prohibit crawling, but they do not place any restrictions on indexing. To cache a file, a robot may need to read it (i.e., crawl it). The HTML meta tags for robots may restrict indexing via that HTML file.

Google's web search can index files without crawling them.

Google's image search results show images of its matches. If Google Image Search only indexes what it caches, only caches what it crawls, and honors robots.txt, then it would seem that it only indexes what it crawls.

Google's comment in the help file about the meta tag suggests that Google may not be this restrictive. Presumably this is about images in the web index.

"Note that the images on the page may still be included in the image index if they are linked to by other pages."

I do not know whether the web page crawl and image crawl are the same crawl. I do not know if the web page index and image index are integrated. Honoring robots.txt is voluntary and Google may have chosen to act more restrictive or less restrictive than indicated.

Philipp Lenssen [PersonRank 10]

16 years ago

> I do not know whether the web page crawl and
> image crawl are the same crawl. I do not know
> if the web page index and image index are
> integrated. Honoring robots.txt is voluntary
> and Google may have chosen to act more restrictive
> or less restrictive than indicated.

George, as we have seen in the past, Google does indeed display/index more than it crawls in web search. For example, it may show page X even though X is disallowed via robots.txt, simply because X was linked from elsewhere. (Google may also display a title based on the backlinks' anchor text.)
However, robots.txt for *images* should be safe for now simply because Google always displays a thumbnail when linking to an image... and I can't see how they could generate that thumbnail at this time unless they access the source image, which they can't* due to the robots.txt.

*Well, technically they can, but they don't *want* to access it, because they aim to honor robots.txt when crawling.
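The crawl/index distinction discussed above can be modeled with Python's standard `urllib.robotparser`, which answers only one question: may this URL be fetched? In the hypothetical setup below (a made-up robots.txt and made-up URLs), a blocked image is still "known" because another site links to it, but it is never downloaded, so no thumbnail could be generated from it. What a search engine then does with such known-but-unfetched URLs is the part this thread can only speculate about, and the sketch does not model it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks an image directory (a plain prefix rule).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs a crawler learned about from links on other sites.
linked_from_elsewhere = [
    "http://example.com/article.html",
    "http://example.com/images/painting.jpg",
]

fetched, known_only = [], []
for url in linked_from_elsewhere:
    if rp.can_fetch("*", url):
        fetched.append(url)      # the crawler may download it (e.g. to build a thumbnail)
    else:
        known_only.append(url)   # the URL is known from links, but the file is never read

print("fetched:", fetched)
print("known only via links:", known_only)
```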

Neerav Bhatt [PersonRank 0]

16 years ago

I agree with Philipp about the traffic benefits of allowing your images to be added to Google Image Search, Yahoo Image Search, etc.

For several years I blocked access to my image directory using robots.txt rules to save bandwidth costs.

In mid-to-late 2007 I removed those rules, and now, in early 2008, many of my images have been indexed. Thanks to my detailed ALT tags, I get about 3% of my site traffic from image search :-)

PS: Bandwidth usage did not change appreciably after these changes.


This site unofficially covers Google™ and more with some rights reserved. Join our forum!