Madrid's Complutense University Library in Google Book Search

Ionut Alex. Chitu	Tuesday, September 26, 2006 20 years ago • 3,462 views
"Out-of-copyright books previously only available to people with access to Madrid's Complutense University Library, or the money to travel, will now be accessible to everyone with an Internet connection, wherever they live [through Google Book Search]." http://booksearch.blogspot.com/2006/09/madrids-complutense-university-opens.html
Philipp Lenssen	20 years ago #
On a related note, I managed to get a lot of you into Google Books, albeit this didn't seem to work for everyone! http://books.google.com/books?vid=ISBN1411693418&id=-XDkb3htVikC&pg=PA225&lpg=PA225&dq=tony+ruscoe&sig=SF_XJ-4wGWzmB889OnHm_3dFGAg E.g. names that work: [brinke guthrie] [tony ruscoe] Names that don't work even though they appear in the book: [ionut alex. chitu] [tomhtml]
Ionut Alex. Chitu	20 years ago #
This works: http://books.google.com/books?vid=ISBN1411693418&id=-XDkb3htVikC&pg=PA225&lpg=PA225&vq=alex&dq=tony+ruscoe&sig=SF_XJ-4wGWzmB889OnHm_3dFGAg
Tony Ruscoe	20 years ago #
And [alex chitu] works too: http://books.google.com/books?vid=ISBN1411693418&id=-XDkb3htVikC&pg=PA225&lpg=PA225&dq=alex+chitu&sig=SF_XJ-4wGWzmB889OnHm_3dFGAg Does Google use OCR to index these books? It looks like it's mis-read "ionut" as something else as it doesn't seem to highlight it.
Tony Ruscoe	20 years ago #
Yup... [lonut alex chitu] (note the lowercase "L") works: http://books.google.com/books?q=lonut+alex+chitu :-)
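The [lonut] find suggests a general trick: if a name is missing from the results, try the spellings an OCR engine might have produced instead. Here is a minimal Python sketch of that idea. The `OCR_CONFUSIONS` table and the `ocr_variants` helper are hypothetical names made up for illustration, only the "i" read as "l" confusion comes from this thread, and nothing here reflects how Google actually indexes books.

```python
from itertools import product

# Assumed confusion table (illustrative only): each character maps to the
# glyphs an OCR engine might output for it. Only the i -> l confusion is
# taken from the thread; extend the table with other pairs as needed.
OCR_CONFUSIONS = {"i": ("i", "l")}


def ocr_variants(query, confusions=OCR_CONFUSIONS):
    """Return every spelling of `query` reachable by the given substitutions."""
    # For each character, list its possible renderings (itself if unlisted).
    choices = [confusions.get(ch, (ch,)) for ch in query.lower()]
    # The Cartesian product of those choices enumerates all combinations.
    return ["".join(combo) for combo in product(*choices)]


variants = ocr_variants("ionut alex chitu")
for variant in variants:
    print(variant)
```

The list grows exponentially with the number of confusable characters, so for longer queries it is probably better to vary one word at a time.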
Ionut Alex. Chitu	20 years ago #
>> Does Google use OCR to index these books?
What else? Maybe they should accept already-digitized books. You just submit a PDF, for example.
Tony Ruscoe	20 years ago #
Well, I thought they might just extract the text from the PDF (where possible).
Philipp Lenssen	20 years ago #
> Yup... [lonut alex chitu] (note the lowercase "L") works:
Nice find. It never occurred to me that you can optimize some Google Book searches by using common OCR (0<R?) errors.
> Well, I thought they might just extract the text from the PDF (where possible).
True, Lulu.com could just send the PDF to Google Book Search. But this is one of those Dilbert cases where you type something in the computer, print it out, send the printout to a company and they scan it and OCR it :) . I suppose it's easier for Google to have a single well-working process than to optimize for small fish like Lulu. So I guess in this case they have more trouble because the words are in italics and not found in an English dictionary.
zmarties	20 years ago #
It's not just the text that Google want, it's information about where on the page the words are found so that they can highlight them. Extracting text from a PDF is rarely straightforward, and extracting positional information is much harder still, so the simplest way is, as you say, to print the PDF out and OCR it.
There are some documents where the OCR process goes very wrong. Consider this page of old printed English, with the "long s" symbol, which looks like a modern "f" character: http://books.google.com/books?vid=0zC-lOSEfrndAeDyn0&id=jiJAtfLSg_sC&pg=PA77&lpg=PA77&dq=fale&num=100&as_brr=1
By the way, is anyone else seeing the problem that the highlighting, which is supposed to be on the word "sale" (which I had to search for as "fale"), misses the actual word in all cases? It is always below and to the right of where it should be.
Philipp Lenssen	20 years ago #
The yellow marker is off here too...
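For anyone who wants to repeat the "sale" as "fale" search on other old books, here is a rough Python sketch of the rewrite. It assumes a simplified rule: every "s" that is followed by another letter was printed as a long s and read by the OCR as "f", while a word-final "s" stays as it is. Real typesetting had more exceptions than that, and `long_s_query` is a hypothetical helper name, not anything Google provides.

```python
import re


def long_s_query(text):
    """Rewrite a query the way OCR might read long-s typography.

    Simplified assumption: an "s" followed by another word character was
    printed as a long s and misread as "f"; a word-final "s" is unchanged.
    """
    # (?=\w) is a lookahead: it matches "s" only when a word character follows.
    return re.sub(r"s(?=\w)", "f", text.lower())


print(long_s_query("sale"))
print(long_s_query("houses for sale"))
```

Searching for both the original and the rewritten form is probably the safest approach, since not every "s" on a page will have been misread.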
