Re: Welcome Håkon! (and Re: [uf-discuss] book brainstorming )

Mark Pilgrim pilgrim at
Mon Jan 30 12:19:33 PST 2006

On 1/30/06, Tantek Çelik <tantek at> wrote:
> We need more instances of and better documentation and analysis of the
> book-examples:
> I'll note that we're also fortunate to have Mark Pilgrim on the list, who
> has very direct experience with publishing online versions of print books he
> has written.

I added some more representative links to book-examples.

As for bridging the gap between web and print... in my (2-book)
experience, the bottleneck is the publishers.  I self-published Dive
Into Python online for 4 years before Apress threw enough money at me
to convince me to finish it, so I had *lots* of time to experiment and
settle on a toolchain that produced passable HTML, PDF, and text.

I wrote the book entirely in DocBook XML; Apress was entirely
Microsoft Word-based.  To send chapters to my editor, I transformed
each one to an intermediate HTML format and then wrote a Python script
to tell Microsoft Word to load the HTML and save it as a native .doc
file.
(Yes, I am aware that's cheating. :)  The editors had very few
changes, so that stage went smoothly.

The trouble started when we went to copyedit.  My copy editor also
only accepted Word files.  She had a ton of smallish style changes,
which I had to backport to the original DocBook XML files so I could
publish the changes online.  (This was allowed as part of my
contract.)  She also complained bitterly that the auto-generated Word
files had lots of extraneous cruft in them, things which I never saw
but which were apparent to someone who lives their life in Word.  (I
know how she feels, in reverse -- I cringe when someone takes a Word
file and auto-generates HTML out of it.)  We were never able to
satisfactorily resolve the problem; she wasn't technical enough to
know how to fix it, and I don't know enough about Word to know what
she was talking about.  Equal experts, different worlds.

For my recent O'Reilly book, Greasemonkey Hacks, I asked if I could
write it in DocBook XML, and my editor (the *wonderful* *marvelous*
*talented* *underpaid* Brian Sawyer) got one of those "oh shit" looks
on his face and recommended we do it in Word instead.  So I wrote up
100 hacks in ASCII text files, then manually copied and pasted
sentences and paragraphs into Word, and then manually formatted them
to conform to O'Reilly's highly customized Word templates.  (IIRC,
O'Reilly has semi-automated processes to take these Word files and
convert them to FrameMaker.)  At that point, I threw away the original
text files and we all did edits, techedits, and copyedits entirely in
Microsoft Word.

I see that Rael finally finished his wiki-based submission web
application for O'Reilly authors:  I
originally volunteered to be the guinea pig for this with
"Greasemonkey Hacks", but Rael decided the system wasn't ready yet.

I have heard unconfirmed reports that both Apress and O'Reilly did
everything on paper until a few years ago (mailing edits around via
FedEx, etc.).  So as icky as Microsoft Word sounds to this community,
it was a big step forward for them in terms of computerization.

Not sure where this gets us, except to say that Word is really leading
edge stuff for publishers at the moment, and anything-but-Word is so
bleeding edge for publishers that I'm skeptical that it's even worth
spending any time on it.  I'd be happy to use my existing online books
and my battle-tested DocBook toolchain as a testbed for outputting
semantically richer HTML with pretty CSS printing, but I don't believe
that anyone but hobbyists will ever use it.  I would, of course, be
ecstatic to be proven wrong.

