It seems that Google no longer offers the option to view the cached HTML copy of PDF files. See e.g. http://www.google.com/search?q=filetype%3Apdf
Why?
I don't know whether this is old news. |
Weird. Some files still have the "View as HTML" link when you don't search using the filetype operator. There are some in these search results:
e.g. http://www.google.com/search?q=site%3Aadobe.com+inurl%3Apdf |
I'd say that PDF results that can't be viewed as HTML are quite rare: http://www.google.com/search?hl=en&q=filetype%3Apdf+hamlet http://www.google.com/search?q=filetype%3Apdf+confidential |
There's no good reason why they can't be viewed as HTML. From what I've seen of PDF-to-HTML converters, if Google can extract the text, it should be able to display the files as HTML. |
So some PDFs can be viewed as HTML and some cannot, and (if I'm not wrong) the reason is not copyright... |
Correct. It doesn't seem to be related to copyright:
http://www.google.com/search?q=filetype%3Apdf+%22Copyright+2000..2008%22+%22All+rights+reserved%22 |
I have a vague memory of this universal PDF-as-HTML policy being changed more than a year ago, so that Google now observes the no-copy-text flag in the documents, in line with Adobe Acrobat's requirements (this flag is separate from the print flag). |
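For anyone wondering what that flag looks like: in an encrypted PDF, the permissions live in the integer /P entry of the encryption dictionary, and "copy or extract text" and "print" are separate bits. Below is a minimal Python sketch, assuming you have already read the /P value out of the file (parsing the PDF itself is not shown). Nobody in this thread has confirmed that Google actually checks this bit, and the helper name `decode_permissions` is just illustrative.

```python
from typing import Dict

# 1-based bit positions of the user access permissions in the PDF /P value.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations",
}


def decode_permissions(p: int) -> Dict[str, bool]:
    """Return which of the permissions above a /P value grants.

    /P is stored as a signed 32-bit integer, so mask it to its unsigned
    bit pattern first; the bit tests then also work for negative values.
    """
    bits = p & 0xFFFFFFFF
    return {name: bool(bits & (1 << (pos - 1))) for pos, name in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # Hypothetical value: everything allowed except copying text (bit 5 cleared).
    for name, allowed in decode_permissions(-20).items():
        print(f"{name}: {allowed}")
```

With -20, printing is allowed but text copying isn't. That is exactly the kind of file that, under the policy Ianf remembers, would lose its "View as HTML" link while remaining printable.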
I also remember Google announcing the policy change that Ianf mentions, although I'd say it was more like two years ago. |
Thanks for the feedback about the policy change (although a quick search suggests that information about it isn't easy to find). |