Google Blogoscoped

Forum

Crawler

pokemo [PersonRank 10]

Tuesday, January 29, 2008
15 years ago, 3,683 views

Can a crawler crawl/index the contents of PDF, DOC, etc.?

DPic [PersonRank 10]

15 years ago #

Yeah, you can even search for those file types just by going to Advanced Search, haha :)

Zim [PersonRank 10]

15 years ago #

And you can even read them as plain HTML from Google :)

Tony Ruscoe [PersonRank 10]

15 years ago #

[filetype:pdf]
http://www.google.com/search?q=filetype%3Apdf

[filetype:doc]
http://www.google.com/search?q=filetype%3Adoc
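For anyone who wants to build these links in a script, here is a minimal Python sketch of how the two URLs above are put together. The function name `filetype_search_url` and its `terms` parameter are my own, made up for illustration; only the `filetype:` operator and the base URL come from the links above.

```python
from urllib.parse import urlencode

# Base URL taken from the links above.
SEARCH_URL = "http://www.google.com/search"


def filetype_search_url(filetype, terms=""):
    """Build a search URL restricted to one file type (hypothetical helper)."""
    # The operator is just part of the query text, e.g. "crawler filetype:pdf".
    query = f"{terms} filetype:{filetype}".strip()
    # urlencode percent-encodes the colon as %3A and turns spaces into '+'.
    return SEARCH_URL + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(filetype_search_url("pdf"))
    print(filetype_search_url("doc", terms="crawler"))
```

Adding ordinary search terms in front of the operator narrows the results to matching documents of that type.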

Ionut Alex. Chitu [PersonRank 10]

15 years ago #

There are free/open source tools that convert PDF or DOC files to HTML or plain text.

http://www.google.com/search?hl=en&q=pdf2html&btnG=Search
http://www.google.com/search?hl=en&q=antiword&btnG=Search
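As a rough idea of how a crawler could plug such tools in, here is a Python sketch that shells out to a converter based on the file extension. It assumes `antiword` is installed for DOC files, and it uses `pdftotext` for PDFs, which is my substitution (it is not one of the tools searched for above). The names `build_command` and `extract_text` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed command lines: "pdftotext FILE -" and "antiword FILE" both write
# the extracted plain text to standard output.
CONVERTERS = {
    ".pdf": lambda path: ["pdftotext", path, "-"],
    ".doc": lambda path: ["antiword", path],
}


def build_command(path):
    """Return the converter command for a file, chosen by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"no converter configured for {suffix!r}")
    return CONVERTERS[suffix](str(path))


def extract_text(path):
    """Run the external converter and return the plain text it prints."""
    command = build_command(path)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not installed")
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Once the text is extracted, it can be indexed like the body of any HTML page.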

Colin Colehour [PersonRank 10]

15 years ago #

The crawler will only index searchable PDFs, meaning PDFs that contain actual text. So if the PDF is a scanned image that was not OCRed, the crawler cannot index it.
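One way to spot such scanned PDFs is to look at what a text extractor returns for each page. The Python sketch below assumes the extracted text separates pages with form feed characters, as `pdftotext` output does. The function name `pages_without_text` and the `min_chars` threshold are arbitrary choices of mine, and this is only a heuristic, not how any real crawler decides.

```python
def pages_without_text(extracted_text, min_chars=20):
    """Return 1-based numbers of pages that have almost no extractable text.

    Assumes each page in extracted_text ends with a form feed character.
    """
    pages = extracted_text.split("\f")
    # A form feed after the last page leaves an empty trailing piece; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    empty_pages = []
    for number, page in enumerate(pages, start=1):
        # Count only non-whitespace characters on the page.
        visible = "".join(page.split())
        if len(visible) < min_chars:
            empty_pages.append(number)
    return empty_pages


if __name__ == "__main__":
    sample = "This page has a real text layer with words.\f\f  \n\f"
    print(pages_without_text(sample))
```

Pages reported here are candidates for OCR before the document can be indexed.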
