[microformats-discuss] Resilient File Formats

Kevin Marks kmarks at technorati.com
Mon Oct 3 16:39:20 PDT 2005


On the question of which file formats succeed, the answer is those that  
are resilient. The ones that provide a method for expansion, and a way  
for multiple versions to coexist safely.
Backwards compatibility is a necessary part of this, but it is not  
sufficient - forwards compatibility is what wins out.

I see 3 big generations of file format here:

RFC 822 style (ascii key:value, as in Mail headers and HTTP headers)
IFF style (keyed binary blobs with length offsets) (IFF, AIFF, TIFF,  
QuickTime, WAV, AVI, MPEG4)
SGML style (ascii <tag>  </tag> model) (SGML, HTML, XML, XHTML)

In each case, these define a way for different generations of the same  
format to coexist by defining that it is OK to discard elements you  
don't understand.

This provides baseline compatibility (old parsers generally don't crash  
on new data, unlike more naive formats), but still requires work to  
define the sub elements of the format to interoperate.

It provides for graceful degradation, with older or less-featured  
clients able to display the subset they understand, rather than balking  
completely.
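
As a rough sketch of the "discard what you don't understand" rule in the IFF style: the layout below is a simplified assumption (4-byte ID, 4-byte big-endian length, payload padded to an even length), and the chunk IDs and function names are made up for illustration rather than taken from any real format.

```python
import struct

# Hypothetical chunk IDs that this (older) reader knows about.
KNOWN = {b"NAME", b"BODY"}


def make_chunk(chunk_id, payload):
    """Build one chunk: 4-byte ID, 4-byte big-endian length, payload, even padding."""
    pad = b"\x00" if len(payload) % 2 else b""
    return struct.pack(">4sI", chunk_id, len(payload)) + payload + pad


def read_chunks(data, known=KNOWN):
    """Walk a flat run of chunks, keeping known ones and skipping the rest."""
    understood, skipped = [], []
    pos = 0
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from(">4sI", data, pos)
        payload = data[pos + 8 : pos + 8 + size]
        if chunk_id in known:
            understood.append((chunk_id, payload))
        else:
            # The length field lets us step over data we cannot interpret.
            skipped.append(chunk_id)
        pos += 8 + size + (size & 1)  # skip the pad byte after odd-sized payloads
    return understood, skipped


data = (
    make_chunk(b"NAME", b"demo")
    + make_chunk(b"XTRA", b"abc")  # added by a newer writer
    + make_chunk(b"BODY", b"hi")
)
understood, skipped = read_chunks(data)
```

The old reader still gets NAME and BODY out of the newer file; the length prefix is what makes stepping over XTRA safe.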

If you replace an element with a more general one, you may need to  
continue to include the old version for the previous generation of  
parsers.
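
A minimal sketch of that in the RFC 822 style, using Python's standard email parser. The header names "Duration" and "Duration-Ms" are hypothetical: assume the first generation stored whole seconds and the newer one stores milliseconds.

```python
from email import message_from_string


def write_record(duration_ms):
    """New writer: emit the new field and keep the old one for older parsers."""
    return f"Duration-Ms: {duration_ms}\nDuration: {duration_ms // 1000}\n\n"


def old_reader(text):
    """Old parser: only knows Duration (seconds) and ignores everything else."""
    msg = message_from_string(text)
    return int(msg["Duration"])


def new_reader(text):
    """New parser: prefer Duration-Ms, fall back to Duration for old files."""
    msg = message_from_string(text)
    if msg["Duration-Ms"] is not None:
        return int(msg["Duration-Ms"])
    return int(msg["Duration"]) * 1000


record = write_record(90500)
```

Both generations of reader get an answer from both generations of file, at the cost of one redundant header.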

Having worked at Apple on QuickTime for 5 years, and spent 10 years  
before that tracking it, I've seen that it does take some care to adapt  
and update in a way that will not break old clients, but the benefits  
for users of your format are immense (the unofficial motto there was  
'no movie left behind'). Of course, if your users are happy, this helps  
your adoption.

HTML took this from SGML, and in many ways expanded it further due to  
the toleration of sloppy markup from user-agents, to the point where  
people writing parsers had a bit of a tough time of it.

XML was an over-reaction to this - it instituted draconian parsing by  
design, and effectively gave the green light for everyone to make up  
their own format without consideration for others at all (with  
namespaces as a figleaf to cover this, and coerce coexistence post  
hoc).
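
To make the contrast concrete, here is a small sketch using two parsers from the Python standard library: html.parser, which tolerates sloppy markup, and xml.etree.ElementTree, which rejects anything not well-formed. The helper names and the sample string are my own, for illustration only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SLOPPY = "<p>unclosed <b>bold</p>"  # <b> is never closed


class TextCollector(HTMLParser):
    """Collect character data, whatever state the tags are in."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


def lenient_text(markup):
    """HTML-style parsing: recover the text even from sloppy markup."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text)


def strict_text(markup):
    """XML-style parsing: return None if the markup is not well-formed."""
    try:
        return "".join(ET.fromstring(markup).itertext())
    except ET.ParseError:
        return None
```

With the sloppy input, lenient_text still returns the text, while strict_text gives up on the whole document.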

Microformats build on the older model of backward compatibility through  
selective enhancement. This is a bit more work for the parser and  
format designer, but much less for those creating data using the  
format, who can readily pick up the latest version to enhance their  
existing HTML without harming their other uses.
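
A rough sketch of what that looks like from the parser side, using a couple of hCard class names (fn, org). It assumes reasonably well-formed XHTML with every element closed; the class PropCollector is a made-up name and this is nowhere near a complete hCard parser.

```python
from html.parser import HTMLParser

# A small subset of hCard property names; everything else is ignored.
KNOWN_PROPS = {"fn", "org"}


class PropCollector(HTMLParser):
    """Collect the text of elements whose class names we recognise."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in KNOWN_PROPS), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every recognised property that is currently open.
        for prop in self._stack:
            if prop:
                self.props[prop] = self.props.get(prop, "") + data


PAGE = (
    '<div class="vcard highlight">'
    '<a class="fn url" href="http://example.com/">Kevin Marks</a>, '
    '<span class="org">Technorati</span> '
    '<span class="nickname">km</span>'
    "</div>"
)

collector = PropCollector()
collector.feed(PAGE)
collector.close()
```

The page is still ordinary HTML for browsers and stylesheets; the unrecognised class names (highlight, url, nickname) are simply passed over here, which is the same discard rule as in the older formats.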

Working within XHTML does impose constraints on how you can express  
things, but as Cory Doctorow put it last week:

http://www.salon.com/tech/feature/2005/09/26/themepunks_3/print.html
> "It's like this: engineering is all about constraint. Given a span of  
> foo feet and materials of tensile strength of bar, build a bridge that  
> doesn't go all fubared. Write a fun video-game for an eight-bit  
> console that'll fit in 32K. Build the fastest airplane, or the one  
> with the largest carrying capacity... But these days, there's not much  
> traditional constraint. I've got the engineer's most dangerous luxury:  
> plenty. All the computational cycles I'll ever need. Easy and rapid  
> prototyping. Precision tools."

Working with constraints is what makes for good Art, and good  
Engineering, whether the constraints are cultural or structural.

Without shared meaning there can be no communication. Microformats work  
to converge shared meaning without disrupting other uses, and to  
enhance rather than replace what you are doing already.

This started as a mail reply, but it became a blog post somewhere  
along the way:
http://epeus.blogspot.com/2005_10_01_epeus_archive.html#112838262830478654


