[microformats-discuss] Resilient File Formats
Kevin Marks
kmarks at technorati.com
Mon Oct 3 16:39:20 PDT 2005
On the question of which file formats succeed, the answer is those that
are resilient: the ones that provide a method for expansion, and a way
for multiple versions to coexist safely.
Backwards compatibility is a necessary part of this, but it is not
sufficient - forwards compatibility is what wins out.
I see 3 big generations of file format here:
RFC 822 style (ascii key:value, as in Mail headers and HTTP headers)
IFF style (keyed binary blobs with length offsets) (IFF, AIFF, TIFF,
QuickTime, WAV, AVI, MPEG4)
SGML style (ascii <tag> </tag> model) (SGML, HTML, XML, XHTML)
In each case, these define a way for different generations of the same
format to coexist by defining that it is OK to discard elements you
don't understand.
This provides baseline compatibility (old parsers generally don't crash
on new data, unlike more naive formats), but still requires work to
define the sub elements of the format to interoperate.
It provides for graceful degradation, with older or less-featured
clients able to display the subset they understand, rather than balking
completely.
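
As a rough illustration of the "discard what you don't understand" rule
in the IFF style, here is a minimal Python sketch. It assumes a
simplified layout (4-byte ASCII id, 4-byte big-endian length, payload
padded to an even size); the chunk ids NAME, "DUR " and FUTR are made up
for the example, and this is not the actual QuickTime atom layout.

```python
import struct

def iter_chunks(data):
    """Yield (chunk_id, payload) from a flat run of IFF-style chunks."""
    offset = 0
    while offset + 8 <= len(data):
        chunk_id, length = struct.unpack_from(">4sI", data, offset)
        start = offset + 8
        yield chunk_id, data[start:start + length]
        # Chunks are padded to an even byte boundary in this sketch.
        offset = start + length + (length & 1)

# Handlers for the chunk ids this (hypothetical) reader understands.
KNOWN = {
    b"NAME": lambda payload: payload.decode("ascii"),
    b"DUR ": lambda payload: struct.unpack(">I", payload)[0],
}

def read_known(data):
    """Decode known chunks; step over unknown ones using their length."""
    out = {}
    for chunk_id, payload in iter_chunks(data):
        handler = KNOWN.get(chunk_id)
        if handler is None:
            continue  # unknown chunk from a newer writer: skip it
        out[chunk_id] = handler(payload)
    return out

def make_chunk(chunk_id, payload):
    """Build one chunk, padding odd-length payloads with a zero byte."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad

# A file from a "newer" writer: FUTR is not known to this reader.
sample = (make_chunk(b"NAME", b"clip")
          + make_chunk(b"FUTR", b"xyz")
          + make_chunk(b"DUR ", struct.pack(">I", 90)))
print(read_known(sample))
```

The length field is what makes this work: the reader can step over FUTR
without knowing anything about what is inside it.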
If you replace an element with a more general one, you may need to
continue to include the old version for the previous generation of
parsers.
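
To make that concrete in the RFC 822 style, here is a small Python
sketch. The header names Length-Seconds (old) and Timing (newer, more
general) are invented for illustration; the point is only that the
writer emits both, so each generation of reader finds something it
understands.

```python
from email import message_from_string

# A writer that keeps the old header alongside its more general successor.
DOC = """Title: Demo
Length-Seconds: 90
Timing: start=0; end=90; rate=30

body
"""

def old_reader(text):
    """Previous-generation reader: only knows Length-Seconds."""
    msg = message_from_string(text)
    return {"title": msg["Title"], "seconds": int(msg["Length-Seconds"])}

def new_reader(text):
    """Newer reader: prefers Timing, falls back to Length-Seconds."""
    msg = message_from_string(text)
    timing = msg["Timing"]
    rate = None
    if timing is not None:
        fields = dict(part.strip().split("=", 1) for part in timing.split(";"))
        seconds = int(fields["end"]) - int(fields["start"])
        rate = int(fields["rate"])
    else:
        seconds = int(msg["Length-Seconds"])
    return {"title": msg["Title"], "seconds": seconds, "rate": rate}

print(old_reader(DOC))
print(new_reader(DOC))
```

The old reader never looks at Timing, and the new reader still copes
with files that only carry Length-Seconds.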
Having worked at Apple on QuickTime for 5 years, and spent 10 years
before that tracking it, I've seen that it does take some care to adapt
and update in a way that will not break old clients, but the benefits
for users of your format are immense (the unofficial motto there was
'no movie left behind'). Of course, if your users are happy, this helps
your adoption.
HTML took this from SGML, and in many ways expanded it further through
user agents' toleration of sloppy markup, to the point where people
writing parsers had a bit of a tough time of it.
XML was an over-reaction to this - it instituted draconian parsing by
design, and effectively gave the green light for everyone to make up
their own format without consideration for others at all (with
namespaces as a figleaf to cover this, and coerce coexistence post
hoc).
Microformats build on the older model of backward compatibility through
selective enhancement. This is a bit more work for the parser and
format designer, but much less for those creating data using the
format, who can readily pick up the latest version to enhance their
existing HTML without harming their other uses.
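
A sketch of what this looks like from the parser side, in Python with
the standard html.parser module. It assumes XHTML-style markup in which
every element is closed, and a hypothetical older parser that knows only
the hCard properties fn and org; nickname stands in for a property added
after that parser was written.

```python
from html.parser import HTMLParser

KNOWN_PROPS = {"fn", "org"}  # what this hypothetical parser understands

class PropCollector(HTMLParser):
    """Collect text for known class names; ignore everything else."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._stack = []  # per open element: a known property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in KNOWN_PROPS), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the nearest enclosing known property, if any.
        for prop in reversed(self._stack):
            if prop is not None:
                self.props[prop] = self.props.get(prop, "") + data
                break

SAMPLE = """<div class="vcard highlight">
  <a class="url fn" href="http://example.com/">Ann Example</a>
  <span class="org">Example Co</span>
  <span class="nickname">annie</span>
</div>"""

def extract(html):
    collector = PropCollector()
    collector.feed(html)
    collector.close()
    return collector.props

print(extract(SAMPLE))
```

The page author's own classes (highlight) and the newer property
(nickname) pass through untouched; the same HTML still renders and
styles as before.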
Working within XHTML does impose constraints on how you can express
things, but as Cory Doctorow put it last week:
http://www.salon.com/tech/feature/2005/09/26/themepunks_3/print.html
> "It's like this: engineering is all about constraint. Given a span of
> foo feet and materials of tensile strength of bar, build a bridge that
> doesn't go all fubared. Write a fun video-game for an eight-bit
> console that'll fit in 32K. Build the fastest airplane, or the one
> with the largest carrying capacity... But these days, there's not much
> traditional constraint. I've got the engineer's most dangerous luxury:
> plenty. All the computational cycles I'll ever need. Easy and rapid
> prototyping. Precision tools."
Working with constraints is what makes for good Art, and good
Engineering, whether the constraints are cultural or structural.
Without shared meaning there can be no communication. Microformats work
to converge shared meaning without disrupting other uses, and to
enhance rather than replace what you are doing already.
This started as a mail reply, but it became a blog post somewhere
along the way:
http://epeus.blogspot.com/2005_10_01_epeus_archive.html#112838262830478654