Can a crawler crawl/index the contents of PDF, DOC, etc. files?
Yeah, you can even search for those just by using Advanced Search haha :)
And you can even read them as plain HTML from Google :)
There are free/open-source tools that convert PDF and DOC files to HTML or plain text:
http://www.google.com/search?hl=en&q=pdf2html&btnG=Search
http://www.google.com/search?hl=en&q=antiword&btnG=Search
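A minimal sketch of how a crawler could feed such files through command-line converters before indexing. It assumes two tools not named above as the exact choice: `pdftotext` (from poppler-utils, here standing in for the pdf2html-style tools) for PDFs and `antiword` for Word .doc files, both installed and writing text to stdout. The extension-to-command mapping is a hypothetical design, not any particular crawler's implementation.

```python
import os
import subprocess

# Assumed converters: pdftotext (poppler-utils) and antiword.
# Each entry builds an argv list that prints plain text to stdout.
CONVERTERS = {
    ".pdf": lambda path: ["pdftotext", path, "-"],  # "-" means write to stdout
    ".doc": lambda path: ["antiword", path],        # antiword prints to stdout by default
}


def build_command(path):
    """Return the argv list that converts `path` to plain text on stdout."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in CONVERTERS:
        raise ValueError(f"no converter configured for {ext!r}")
    return CONVERTERS[ext](path)


def extract_text(path):
    """Run the matching converter and return its output (the tool must be installed)."""
    result = subprocess.run(
        build_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `extract_text("report.pdf")` (hypothetical file) would return the text that the crawler then indexes like any plain-text page. Keeping the mapping in a dict makes it easy to add other formats later.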
The crawler will only index searchable PDFs, so if a PDF is a scanned image that was never OCRed, the crawler will not be able to index it.
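One rough way to tell a searchable PDF from an un-OCRed scan is to extract its text and see whether anything meaningful comes out. The sketch below assumes `pdftotext` (poppler-utils) is installed; the 20-character threshold is an arbitrary assumption, and this is a heuristic rather than how any specific crawler decides.

```python
import subprocess


def looks_searchable(extracted_text, min_chars=20):
    """Heuristic: the PDF counts as searchable if the extracted text contains
    at least `min_chars` non-whitespace characters (threshold is an assumption)."""
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible >= min_chars


def pdf_is_searchable(path, min_chars=20):
    """Extract text with pdftotext (assumed installed) and apply the heuristic."""
    out = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    ).stdout
    return looks_searchable(out, min_chars)
```

A pure image scan typically yields only page-break characters (`\f`), which count as whitespace here, so it is reported as not searchable.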