Google Blogoscoped

Thursday, January 3, 2008

Converting Google Book PDFs to Actual Books

PublicDomainReprints.org lets you paste in a book URL from the Google Books program (at least for those books in the public domain, which Google offers as full PDFs) and then order a printed copy via the print-on-demand service Lulu.com. (To find just public domain books on Google Book Search, check the “Full view” option on the Advanced Book Search page.) Besides Google Books, Public Domain Reprints also supports sites like the Internet Archive.

I asked Yakov Shafranovich from Baltimore, creator of the service and a developer for 15 years, a couple of questions, such as what technology is behind the site. Yakov, who is still in graduate school half time, says that the processing of books is automated via the Lulu API. Yakov explains:

What happens is that I have the publicdomainreprints.org keep track of requests and a separate set of Perl scripts does the processing. Each instance of the Perl scripts can run on its own server (i.e. Amazon EC2 for example) and process stuff independently of the main server. When completed, it notifies the main server which keeps track of requests.
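The split Yakov describes, a main server that tracks requests plus independent workers that report back, might be sketched roughly as follows. This is only an illustration in Python (the real service uses Perl scripts), and every name in it (`RequestTracker`, `claim`, `notify_done`, `run_worker`) is hypothetical rather than taken from the site's code.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    DONE = "done"


class RequestTracker:
    """Stand-in for the main server: it only keeps track of requests."""

    def __init__(self):
        self._jobs = {}      # request id -> [book URL, Status]
        self._next_id = 1

    def submit(self, book_url):
        """Record a new print request and return its id."""
        request_id = self._next_id
        self._next_id += 1
        self._jobs[request_id] = [book_url, Status.PENDING]
        return request_id

    def claim(self):
        """Hand the oldest pending request to a worker, or None if idle."""
        for request_id, job in self._jobs.items():
            if job[1] is Status.PENDING:
                job[1] = Status.PROCESSING
                return request_id, job[0]
        return None

    def notify_done(self, request_id):
        """Called by a worker once it has finished processing a book."""
        self._jobs[request_id][1] = Status.DONE

    def status(self, request_id):
        return self._jobs[request_id][1]


def run_worker(tracker, process):
    """Stand-in for one worker instance (e.g. on its own server).

    `process` is a hypothetical callable that does the actual conversion.
    """
    handled = []
    while (job := tracker.claim()) is not None:
        request_id, book_url = job
        process(book_url)
        tracker.notify_done(request_id)
        handled.append(request_id)
    return handled
```

In a real deployment the tracker and the workers would talk over the network instead of sharing memory, but the division of responsibility stays the same: the tracker never converts anything, and a worker only needs a way to fetch a job and report completion.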

The actual processing varies for Google Book and for Internet Archive: Google Books is converted from PDF to prepress PDF with Ghostscript, and then resized using XPDF tools. Covers and additional pages are generated using Apache’s FOP project. For Internet Archive, the PDFs generated by the archive are unusable for printing so instead I have to convert from DJVU to PDF using the free DjvuLibre library and then the same set of tools as for Google books. Perl scripts control the overall process.
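As a rough sketch of the conversion steps described above, the Python snippet below builds the command lines for two of them: DjVu to PDF with DjVuLibre's `ddjvu` tool, and PDF to prepress PDF with Ghostscript's `pdfwrite` device. The exact options Yakov uses are not given, so the flags, the file naming and the `plan_conversion` helper are assumptions; the XPDF resizing step and the FOP cover generation are left out because the source gives no details about them.

```python
from pathlib import Path


def ghostscript_prepress_cmd(src, dst):
    """Ghostscript call that rewrites a PDF with the /prepress preset."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",
        f"-sOutputFile={dst}",
        str(src),
    ]


def djvu_to_pdf_cmd(src, dst):
    """DjVuLibre call that converts a DjVu file to PDF."""
    return ["ddjvu", "-format=pdf", str(src), str(dst)]


def plan_conversion(source, input_path, workdir):
    """Return the ordered list of commands for one book.

    `source` is "google" (input is already a PDF) or "archive"
    (input is a DjVu file that must become a PDF first).
    The commands are only built here, not executed.
    """
    input_path = Path(input_path)
    workdir = Path(workdir)
    commands = []

    if source == "archive":
        pdf_path = workdir / (input_path.stem + ".pdf")
        commands.append(djvu_to_pdf_cmd(input_path, pdf_path))
    elif source == "google":
        pdf_path = input_path
    else:
        raise ValueError(f"unknown source: {source!r}")

    prepress_path = workdir / (input_path.stem + ".prepress.pdf")
    commands.append(ghostscript_prepress_cmd(pdf_path, prepress_path))
    return commands
```

Each returned command could be passed to `subprocess.run(cmd, check=True)` on a machine where Ghostscript and DjVuLibre are installed. Building the list first makes it easy to log or inspect the plan before anything runs.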

Yakov says he has sold 42 books so far (30 were set up as tests, and 4 he bought himself). Yakov adds a commission of under a dollar to every book but considers the service non-commercial, making just enough money to cover the cost of paying Amazon.

As Yakov is using Google’s scans, among other services, he asked Google what he was allowed to do with the full view PDFs. However, he says that so far he has basically just received form replies along the lines of what is available on Google’s help page on the subject (also see a previous experiment here). This is part of why Yakov tries to set up the service as non-commercial, he says:

Some of the book from the Internet Archive and all of Google Books stuff has a non-commercial restriction. While I do have a friend in Harvard law school, I don’t want to end up on the receiving end of a Google or Microsoft lawsuit. I am aware that the public domain issue involved here is open to debate and personally I would love to make it commercial, I am treading carefully as to not to upset either Google nor the Internet Archive.

(Not all books converted from Google’s service keep their Google watermark, though... even without Yakov actively removing it.)

Yakov adds that there is going to be a commercial service for this but that he “can’t share any more details on it” as he’s not the one running it, but just responsible for the tech side of it. The commercial project is supposed to co-exist with PublicDomainReprints.org in the future.

