[uf-discuss] generic microformat parsing heuristics?

Phil Dawes phil at phildawes.net
Mon Nov 7 13:11:31 PST 2005

Hi Mark,

Mark Pilgrim wrote:
> On 11/7/05, Phil Dawes <phil at phildawes.net> wrote:
>>Out of interest, do you think that a generic microformats parser _can_
>>be written?
>>(e.g. something that could parse hcard, hcal et al out of xhtml without
>>prior knowledge of their precise schemas?)
> No, nor should any effort be expended in such a pursuit.  c.f.
> http://microformats.org/discuss/mail/microformats-discuss/2005-October/001175.html
>  "We don't care about the general case."  This is just the general
> case rearing its ugly head on the parsing side, instead of the
> production side.

Blimey - there's obviously a bit of painful history here!
Ok cool. So I'm not advocating pursuing a standard generic model for 
semantic xhtml (I'm *obviously* at the wrong party for that!), just 
wondering if there are some shortcuts I'm missing. Given that there's a 
set of 'Semantic XHTML Design Principles' underpinning each format, I 
suspect there's some middle ground here that can be exploited for a bit 
of genericity.

Maybe a table driven parser? ;-)
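
As a rough illustration of the table-driven idea (a sketch only, not a
tested design): the per-format knowledge sits in a lookup table, and one
generic walker does the parsing. Everything named here (FORMATS,
TableDrivenParser, parse) is made up for the example, the table holds
only a small subset of hcard/hcal class names, and the input is assumed
to be well-formed xhtml. It collects element text only and ignores the
formats' finer rules, such as reading an href or a title attribute.

```python
from html.parser import HTMLParser

# Hypothetical table: root class name -> property class names to collect.
# A deliberately incomplete subset, for illustration only.
FORMATS = {
    "vcard": {"fn", "org", "tel"},
    "vevent": {"summary", "dtstart", "location"},
}

# Elements that never have a closing tag, so they are kept off the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TableDrivenParser(HTMLParser):
    """Generic walker; all format-specific knowledge comes from the table."""

    def __init__(self, formats):
        super().__init__()
        self.formats = formats
        self.records = []    # finished (format_name, {prop: [text, ...]})
        self.stack = []      # one marker (or None) per open element
        self.current = None  # record currently being filled, if any
        self.capturing = []  # (prop_name, text_chunks) for open properties

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if self.current is None:
            # Outside any record: look for a root class from the table.
            for name in classes:
                if name in self.formats:
                    self.current = (name, {})
                    marker = ("root", name)
                    break
        else:
            # Inside a record: look for one of that format's properties.
            for name in classes:
                if name in self.formats[self.current[0]]:
                    self.capturing.append((name, []))
                    marker = ("prop", name)
                    break
        self.stack.append(marker)

    def handle_data(self, data):
        for _, chunks in self.capturing:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        marker = self.stack.pop()
        if marker is None:
            return
        if marker[0] == "prop":
            name, chunks = self.capturing.pop()
            text = "".join(chunks).strip()
            self.current[1].setdefault(name, []).append(text)
        else:  # closing the root element ends the record
            self.records.append(self.current)
            self.current = None


def parse(markup, formats=FORMATS):
    parser = TableDrivenParser(formats)
    parser.feed(markup)
    parser.close()
    return parser.records


sample = ('<div class="vcard"><span class="fn">Jane Doe</span><br/>'
          '<span class="org">Example Co</span></div>')
print(parse(sample))
# prints [('vcard', {'fn': ['Jane Doe'], 'org': ['Example Co']})]
```

Supporting another format would then mean adding a row to the table
rather than writing a new parser, which is about as far as the
genericity goes: the table is still prior knowledge of each schema.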



P.S. a big thanks for feedparser BTW - it has saved me weeks of coding 
time at work over the last year

