Google Blogoscoped

Crawler

pokemo [PersonRank 10]

Tuesday, January 29, 2008
12 years ago · 3,134 views

Can a crawler crawl/index the contents of PDF, DOC, and similar files?

DPic [PersonRank 10]

12 years ago #

Yeah, you can even search for those just by clicking on Advanced Search, haha :)

Zim [PersonRank 10]

12 years ago #

And you can even read them as plain HTML from Google :)

Tony Ruscoe [PersonRank 10]

12 years ago #

[filetype:pdf]
google.com/search?q=filetype%3 ...

[filetype:doc]
google.com/search?q=filetype%3 ...
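For anyone who wants to build this kind of query programmatically, here is a minimal sketch. The function name `filetype_search_url` and the example search terms are made up for illustration; only the `filetype:` operator and the `q` parameter come from the links above.

```python
from urllib.parse import urlencode


def filetype_search_url(filetype, terms=""):
    """Build a Google search URL restricted to one file type.

    `filetype` is an extension such as "pdf" or "doc"; `terms` is an
    optional free-text query (hypothetical example input).
    """
    # The filetype: operator goes into the ordinary q parameter.
    query = f"filetype:{filetype} {terms}".strip()
    # urlencode percent-encodes the colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(filetype_search_url("pdf", "web crawler"))
# -> https://www.google.com/search?q=filetype%3Apdf+web+crawler
```

The colon becomes `%3A`, which matches the `filetype%3` prefix visible in the shortened links.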

Ionut Alex. Chitu [PersonRank 10]

12 years ago #

There are free/open-source tools that convert PDF or DOC files to HTML or plain text.

google.com/search?hl=en&q= ...
google.com/search?hl=en&q= ...
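As one illustration (the post does not name a specific tool), here is a rough sketch that shells out to `pdftotext`, the command-line converter shipped with Poppler and Xpdf. It assumes `pdftotext` is installed and on your PATH; the helper names `pdftotext_command` and `pdf_to_text` are hypothetical.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path):
    """Return the argument list for converting one PDF to plain text."""
    # "-layout" tries to preserve the page layout; "-" as the output
    # file asks pdftotext to write the text to stdout.
    return ["pdftotext", "-layout", pdf_path, "-"]


def pdf_to_text(pdf_path):
    """Run pdftotext on `pdf_path` and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, check=True
    )
    # Assumption: output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

A crawler could feed the returned text into its normal indexing step, the same way it would handle the text of an HTML page.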

Colin Colehour [PersonRank 10]

12 years ago #

The crawler will only index searchable PDFs. So if the PDF is a scanned image that was not OCRed, it will not be indexable by the crawler.
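A crude way to tell the two cases apart, assuming you already have whatever text a converter extracted from the PDF: if almost nothing readable comes out, the file is probably an image-only scan. The function name `looks_searchable` and the threshold of 20 characters are arbitrary choices for this sketch, not anything a real crawler is known to use.

```python
def looks_searchable(extracted_text, min_chars=20):
    """Guess whether a PDF had a text layer, given its extracted text."""
    # Count only letters and digits: converters often emit form feeds
    # and blank lines even for pages that contain nothing but an image.
    meaningful = sum(1 for ch in extracted_text if ch.isalnum())
    return meaningful >= min_chars


print(looks_searchable("\f\n\f\n"))  # -> False
print(looks_searchable("Annual report 2007: revenue grew in every region."))  # -> True
```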
