[uf-discuss] a case for formal schema specs for microformats

Wed May 31 17:12:11 PDT 2006

I've been giving some thought to the problems (as I see them) of parsing 
microformatted data inside web pages, and of nesting microformats inside 
each other.  I should say I'm fully on board with the understatement and
general "fuzziness" of microformats, particularly when it comes to
authoring content.  But I think that fuzziness creates problems when it
comes to parsing content, particularly when one microformat *might*
contain another.
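To make the nesting problem concrete, here is a minimal sketch (not from
any real parser) of what goes wrong when a parser simply collects every
element with a given class name.  The markup is a hypothetical example:
an hentry whose author is an hcard, with both formats assumed, following
the example below, to use a property class named "title".  It uses only
Python's standard xml.etree.ElementTree on well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an hentry containing an hcard as its author.
# Both formats are assumed to use a property class named "title".
DOC = """
<div class="hentry">
  <h2 class="title">Microformats and nesting</h2>
  <div class="author vcard">
    <span class="fn">Jane Doe</span>
    <span class="title">Editor</span>
  </div>
</div>
""".strip()

def classes(el):
    """Return the set of space-separated class names on an element."""
    return set(el.get("class", "").split())

def naive_titles(root):
    """Collect every class="title" under root, ignoring nesting entirely."""
    return [el.text for el in root.iter() if "title" in classes(el)]

root = ET.fromstring(DOC)
print(naive_titles(root))  # ['Microformats and nesting', 'Editor']
```

The naive parser reports the hcard's job title as if it were a second
title of the entry, because nothing tells it where one microformat ends
and another begins.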

I've been thinking that if there were a formal, machine-readable spec
for each microformat, in BNF or DTD style, then parsers could be
generated automatically from the spec, relieving developers of writing
parsers by hand.  Such machine-generated parsers could be designed to
handle nesting of microformats (e.g., they could tell the difference
between a "title" in an hentry and a "title" in an hcard).
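As a rough illustration of how a machine-readable spec could resolve
that ambiguity, here is a minimal sketch under several assumptions.  The
"schema" is a hypothetical Python dict mapping each root class to its
allowed property names, standing in for a real BNF or DTD-style spec.
Instead of generating parser code from a grammar, it uses a simpler
technique: one generic parser driven by the schema at run time.  The key
rule is scoping.  When the walk reaches an element that is itself a
microformat root, that element is parsed as a nested object, and its
contents are never searched for the outer format's properties.  The
markup is the same hypothetical hentry-with-hcard example, with both
formats assumed to use a "title" property.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema (an assumption, not an official spec): root class
# name -> set of property class names that format allows.
SCHEMA = {
    "hentry": {"title", "author"},
    "vcard": {"fn", "title"},
}
ROOTS = set(SCHEMA)

DOC = """
<div class="hentry">
  <h2 class="title">Microformats and nesting</h2>
  <div class="author vcard">
    <span class="fn">Jane Doe</span>
    <span class="title">Editor</span>
  </div>
</div>
""".strip()

def classes(el):
    """Return the set of space-separated class names on an element."""
    return set(el.get("class", "").split())

def parse(el):
    """Parse one microformat root element into a dict of its properties."""
    # Pick the root type deterministically if several root classes appear.
    kind = next(c for c in sorted(classes(el)) if c in ROOTS)
    props = {"type": kind}

    def walk(node):
        for child in node:
            names = classes(child)
            is_root = bool(names & ROOTS)
            # A nested root becomes a structured value; otherwise use text.
            value = parse(child) if is_root else (child.text or "").strip()
            # Only properties the schema allows for *this* format count.
            for prop in names & SCHEMA[kind]:
                props.setdefault(prop, []).append(value)
            # Do not descend into a nested root: its properties are its own.
            if not is_root:
                walk(child)

    walk(el)
    return props

entry = parse(ET.fromstring(DOC))
print(entry["title"])               # ['Microformats and nesting']
print(entry["author"][0]["title"])  # ['Editor']
```

Note that the element with class "author vcard" is both a property of
the hentry and the root of an hcard, so it becomes the entry's author
while its own "title" stays with the hcard.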

I've written up this idea on my blog here: 
http://smackman.com/2006/06/01/an-old-idea/

I think it's actually very straightforward to implement, puts no new
burden on authors adding microformat syntax to their web pages, and
relieves parser writers of a lot of work.

Thoughts?