Google Blogoscoped

Monday, June 23, 2008

From Google Docs to InDesign

Brian Jepson is the editor who worked with me on the book Google Apps Hacks, and I asked him to describe the technical process of what it took to convert the book from Google Documents – in which the book was written – to the InDesign format. Here's his explanation.

One of the tricky parts for us was getting the chapters into InDesign, the book layout program we use for our books and magazines. I knew where we were coming from – Google Docs' HTML format – and where we were going – InDesign's tagged text format. For example, here's what one of the Google Docs documents looked like:

 <h1>#"Google Docs light" for Web Research: Google
 Notebook</h1>
 <i>Set up your Google Notebook to copy snippets
 and jot down thoughts while
 surfing the web.</i><br>

In InDesign's tagged text format, it needed to look like this:

<ParaStyle:Heading 1>#"Google Docs light" for Web
  Research: Google Notebook
<ParaStyle:Synopsis>Set up your Google Notebook
  to copy snippets and jot down thoughts while
  surfing the web.

Most of my XML programming knowledge is frozen in time, so I turned to a couple of trusty tools that I was familiar with: tidy, for making sure that the HTML I got out of Google was proper HTML, and Perl's XML::SAX module, which I used to traverse the document and crank out the InDesign Tagged Text format as I went. I'm sure there are dozens of better ways to do it, but this had the advantage of being quick and familiar for me to work with.
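To make the traversal idea concrete, here is a minimal sketch of an XML::SAX handler that emits tagged text. It is not the program used for the book: the handler name `TaggedTextHandler`, the helper `html_to_tagged_text`, and the element-to-style mapping (`h1` to Heading 1, `i` to Synopsis) are assumptions for illustration, and it expects input that has already been cleaned into well-formed XHTML (for instance with tidy's `-asxml` option).

```perl
use strict;
use warnings;

# Hypothetical handler: collects the text of a few elements and writes
# one InDesign tagged-text paragraph per element.
package TaggedTextHandler;
use base qw(XML::SAX::Base);

# Assumed mapping from XHTML elements to InDesign paragraph styles.
my %STYLE_FOR = ( h1 => 'Heading 1', i => 'Synopsis' );

sub start_document {
    my ($self) = @_;
    $self->{out}  = '';       # tagged text produced so far
    $self->{text} = undef;    # defined only while inside a mapped element
}

sub start_element {
    my ( $self, $el ) = @_;
    my $style = $STYLE_FOR{ lc $el->{LocalName} } or return;
    $self->{style} = $style;
    $self->{text}  = '';
}

sub characters {
    my ( $self, $chars ) = @_;
    # The parser may deliver text in several chunks, so accumulate them.
    $self->{text} .= $chars->{Data} if defined $self->{text};
}

sub end_element {
    my ( $self, $el ) = @_;
    return unless defined $self->{text} && $STYLE_FOR{ lc $el->{LocalName} };
    ( my $text = $self->{text} ) =~ s/\s+/ /g;    # collapse line breaks
    $text =~ s/^ | $//g;                          # trim the ends
    $self->{out} .= "<ParaStyle:$self->{style}>$text\n";
    $self->{text} = undef;
}

package main;
use XML::SAX::ParserFactory;

# Hypothetical helper: parse an XHTML string and return the tagged text.
sub html_to_tagged_text {
    my ($xhtml) = @_;
    my $handler = TaggedTextHandler->new;
    my $parser  = XML::SAX::ParserFactory->parser( Handler => $handler );
    $parser->parse_string($xhtml);
    return $handler->{out};
}

print html_to_tagged_text(
    '<html><body><h1>A heading</h1><p><i>A short
synopsis.</i><br/></p></body></html>'
);
```

A real converter would need many more element mappings plus handling for nested markup, lists, and images; the point here is only that each SAX event (element start, text, element end) gets its own callback, and the output is built up as those callbacks fire.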

The first trick was getting the HTML files out of Google Docs. For this, I wrote a simple module, GDoc, that could authenticate itself to Google and request a zipped Google document, which includes the HTML file and images. The module is pretty simple to use: authenticate with a Google login (it prompts the user interactively for the password) and specify a DocId (which you can find in the URL of the Google document):

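use GDoc;
my $gdoc = GDoc->new;
# Authenticate only once; the password is prompted for interactively.
if ( !$authenticated ) {
    $authenticated = $gdoc->authenticate('');
}
# Fetch the zipped document (HTML file plus images) for this DocId.
$gdoc->download( $docId, $filename );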

After that, I fed the document into the tidy processor and stored the text in a Perl variable. That was the easy part. Without going into the gory details, the hard part was to set up a SAX parser whose events were triggered as bits of the HTML document were encountered. The program had to keep track of figures, number them as it went, and rename the files to something that Make's production crew would find easier to work with than the defaults. Something like Figure-01-01.png is much more pleasant than GOHacks___Documents___Linked_ToC_images/ajfjf92tmnhd_6124cj36bd5.png!
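The figure bookkeeping could look something like the sketch below. It is an illustration only: the helper name `make_figure_namer` is made up, and it assumes that the first number in Figure-01-01.png is the chapter and the second is the figure's position within it, which the example name suggests but does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that hands out sequential
# figure names for one chapter.
sub make_figure_namer {
    my ($chapter) = @_;
    my %seen;
    my $count = 0;
    return sub {
        my ($src) = @_;
        # An image referenced twice keeps the name it was given first.
        return $seen{$src} if exists $seen{$src};
        # Keep the original file extension, if there is one.
        my ($ext) = $src =~ /(\.[^.\/]+)$/;
        $ext = '' unless defined $ext;
        return $seen{$src} =
            sprintf( 'Figure-%02d-%02d%s', $chapter, ++$count, lc $ext );
    };
}

my $name_for = make_figure_namer(1);
print $name_for->('images/ajfjf92tmnhd_6124cj36bd5.png'), "\n";
```

In a SAX-based converter, a closure like this would be called from the handler each time an image element is encountered, and the returned name would be used both in the tagged text and when renaming the file on disk.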

In this way, I turned each chapter into a text file that could be easily placed into an InDesign file. After that, I handed it off to the design team, who made it look beautiful!

