Tim Bray on creating XML Dialects

Tim Bray has a thorough essay on the pros and cons (mostly cons) of inventing new XML dialects.

Tim starts by saying…

Designing XML Languages is hard. It’s boring, political, time-consuming, unglamorous, irritating work. It always takes longer than you think it will, and when you’re finished, there’s always this feeling that you could have done more or should have done less or got some detail essentially wrong.

…which pretty well sums up the challenges of creating new document formats for the Web. Of course, we try to eliminate some of these drawbacks when doing microformats, mostly by focusing on existing behaviors on the web and aiming for the 80% use case (rather than trying to satisfy every edge case), or, in Tim’s words, by “do[ing] less.”

As Tim went on to describe the challenges and pitfalls of creating arbitrary XML dialects, I was already preparing a “Just use microformats!” response in my head. But, alas, Tim beat me to the punch.

Along with DocBook, ODF, UBL and Atom, he recommends “XHTML+Microformats” as a way to reuse an existing XML dialect, and thereby bypass some of the birth pains of creating a new format. Tim says:

If you’re delivering information to humans over the Web, even if you don’t think of it as “Web Pages”, it’s almost certainly insane not to use XHTML. Yes, XHTML is semantically weak and doesn’t really grok hierarchy and has a bunch of other problems.

Thanks, Tim, for the endorsement of Microformats here.

Of course, the fact that the language is semantically weak doesn’t seem like that big a deal to me, since we can build on top of the semantics it does have (instead of reinventing things like lists, links and paragraphs). And for hierarchies of things, you can always use .
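XHTML already has a structure for hierarchy: a list can contain another list. The sketch below shows that a hierarchy written this way can be read back as a tree without inventing a new vocabulary. It assumes the hierarchy is expressed as plain nested XHTML lists. The sample markup and the `list_to_tree` helper are hypothetical illustrations, not part of any microformat specification. The parsing uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; ElementTree prefixes tags with it.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample: a two-level hierarchy written as nested XHTML lists.
SAMPLE = """<ul xmlns="http://www.w3.org/1999/xhtml">
  <li>Formats
    <ul>
      <li>hCard</li>
      <li>hCalendar</li>
    </ul>
  </li>
  <li>Dialects
    <ul>
      <li>DocBook</li>
      <li>Atom</li>
    </ul>
  </li>
</ul>"""


def list_to_tree(ul):
    """Convert an XHTML <ul>/<ol> element into a list of (label, children) pairs."""
    tree = []
    for li in ul.findall(XHTML_NS + "li"):
        # The item's label is the text before its first child element.
        label = (li.text or "").strip()
        children = []
        for child in li:
            # A nested list inside an item holds that item's children.
            if child.tag in (XHTML_NS + "ul", XHTML_NS + "ol"):
                children.extend(list_to_tree(child))
        tree.append((label, children))
    return tree


if __name__ == "__main__":
    print(list_to_tree(ET.fromstring(SAMPLE)))
```

The markup stays ordinary XHTML that any browser renders as an indented list. The hierarchy comes entirely from element nesting that XHTML already defines.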

Creating new XML languages is a hard task and not likely to be rewarding. We don’t need more arbitrary formats, each with their own namespace and slightly different semantics.

4 Responses to “Tim Bray on creating XML Dialects”

  1. Danny :

    Re. “semantically weak”, yup, I don’t think it is a big deal. Because XHTML allows you to associate metadata profiles with the docs, you can in effect define whatever semantics you like.

    January 10th, 2006 at 9:45 am

  2. Rod Boothby :

    I could not agree more with the idea that Microformats can be used like so much conceptual lego, forming the building blocks of new XML dialects.

    It is similar to good, basic refactoring. Break things into the smallest reusable pieces possible.

    January 12th, 2006 at 4:46 pm

  3. Mark Wilson :

    Totally agree. I used to think XML or RDF would resolve the data problem, but early on (after writing a book on it 7 years ago) I realised that wouldn’t happen. But XHTML + Microformats really has a chance IMO. I blog a bit on my thoughts about this.

    February 1st, 2006 at 2:11 am

  4. scott romack aka shaggy :

    I disagree slightly,

    While I’m all for microformats, they could be viewed as the output from XML transformations or hard-coded XHTML or combinations of both. I think Tim Bray was talking about writing new XML formats, which wouldn’t be necessary in the case of HTML. My thoughts are that we may need to extend them, perhaps, in the future. My current interest in XML is generating HTML prototypes for prototyping, but of course there are other ways of doing that. Anyway, the main thing is to separate content from presentation and standardize, which is what you are helping us do.

    Keep up the great work!

    February 17th, 2006 at 12:35 pm