Google Blogoscoped

Forum

No more cached copy of PDF files?

milivella [PersonRank 10]

Wednesday, June 4, 2008
16 years ago | 3,966 views

It seems that Google no longer offers the option to view the cached (HTML) copy of PDF files. See e.g.:
http://www.google.com/search?q=filetype%3Apdf

Why?

I don't know whether this is old news.

Tony Ruscoe [PersonRank 10]

16 years ago #

Weird. Some files still have the "View as HTML" link when you don't search using the filetype operator. There are some in these search results:

e.g. http://www.google.com/search?q=site%3Aadobe.com+inurl%3Apdf

Ionut Alex. Chitu [PersonRank 10]

16 years ago #

I'd say that PDF results that can't be viewed as HTML are quite rare:
http://www.google.com/search?hl=en&q=filetype%3Apdf+hamlet
http://www.google.com/search?q=filetype%3Apdf+confidential

Tony Ruscoe [PersonRank 10]

16 years ago #

There's no good reason why they can't be viewed as HTML. From what I've seen of PDF to HTML converters, if Google can extract the text, it should be able to display them as HTML.

milivella [PersonRank 10]

16 years ago #

So some PDFs can be viewed as HTML, some cannot, and the reason is not copyright (if I'm not mistaken)...

Tony Ruscoe [PersonRank 10]

16 years ago #

Correct. It doesn't seem to be related to copyright:

http://www.google.com/search?q=filetype%3Apdf+%22Copyright+2000..2008%22+%22All+rights+reserved%22
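The search links in this thread are ordinary Google queries with operators (filetype:, site:, inurl:, quoted phrases, number ranges) percent-encoded into the q parameter. A minimal Python sketch of that encoding, using only the standard library; the helper name search_url is made up for illustration and is not anything Google provides:

```python
from urllib.parse import urlencode


def search_url(query: str) -> str:
    """Build a google.com search URL from a raw query (hypothetical helper)."""
    # urlencode percent-encodes ':' as %3A and '"' as %22,
    # and turns spaces into '+', which is how the thread's links are written.
    return "http://www.google.com/search?" + urlencode({"q": query})


print(search_url("filetype:pdf"))
print(search_url("site:adobe.com inurl:pdf"))
print(search_url('filetype:pdf "Copyright 2000..2008" "All rights reserved"'))
```

Decoding works the same way in reverse, so any of the links above can be read back as the plain query that was typed into the search box.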

Ianf [PersonRank 10]

16 years ago #

I have a vague memory of this universal PDF-as-HTML policy being changed a year or more ago, making Google observe the no-copy-text flag in the docs, in line with Adobe Acrobat's requirements (this is different from the print flag).
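For reference, assuming the flag meant here is the copy/extract permission bit that the PDF specification stores in an encrypted document's /P integer (the thread does not confirm which flag Google checks), a rough Python sketch of how the copy and print bits differ. The /P values below are illustrative examples, not taken from any file mentioned in this thread:

```python
# Bit positions follow the PDF specification's user access permissions table,
# where bits are numbered from 1: bit 3 is printing, bit 5 is copy/extract.
PRINT = 1 << 2  # bit 3: print the document
COPY = 1 << 4   # bit 5: copy or otherwise extract text and graphics


def permissions(p: int) -> dict:
    """Decode the print and copy bits of a /P permissions integer."""
    # /P is a signed 32-bit value; Python's & on negative ints behaves
    # like two's complement, so the bit tests work directly.
    return {"print": bool(p & PRINT), "copy": bool(p & COPY)}


print(permissions(-60))  # example value: printing allowed, copying not
print(permissions(-44))  # example value: printing and copying allowed
```

A document can therefore allow printing while forbidding text extraction, which would be consistent with some PDFs losing the "View as HTML" link while still being indexed.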

Roger Browne [PersonRank 10]

16 years ago #

I also remember Google announcing the policy change that Ianf mentions, although I'd say it was more like two years ago.

milivella [PersonRank 10]

16 years ago #

Thanks for the feedback about the policy change (from a quick search, though, it doesn't seem easy to find information about it).
