[uf-discuss] generic microformat parsing heuristics?
phil at phildawes.net
Mon Nov 7 05:42:25 PST 2005
(Apologies if you get this twice - the microformats list doesn't appear
to like sourceforge addresses.)
I've recently been playing with microformats a bit and have added some
basic hCard and hCalendar parsing to my structured data aggregator
program JAM*VAT (enough to parse Tantek's
http://tantek.com/log/2005/10.html page). Unfortunately, this is proving
much more complicated than I originally thought, and I was wondering
whether there is a bigger picture that I'm missing.
So my question is:
Is there a set of heuristics that can be employed to generically parse
(all of the) microformats?
(Or at least one that gets reasonable results?)
I ask because JAM*VAT already uses some basic heuristics to parse
pretty much any data-oriented XML format into a set of reasonable
semantic statements (JAM*VAT uses a very simple scheme for representing
semantic statements). I'd like to be able to do something similar
with semantic XHTML.
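To make the question concrete, here is a rough sketch of the kind of generic heuristic I have in mind. It is a hypothetical illustration, not JAM*VAT's actual code or any agreed microformats parsing rule. It assumes well-formed XHTML and treats each class name as a predicate. An element whose descendants also carry classes becomes a new node, and a leaf element supplies a value, preferring its href/src attribute over its text. The function name `statements` and the `_:nN` node labels are made up for this example.

```python
import xml.etree.ElementTree as ET
from itertools import count


def statements(xhtml):
    """Hypothetical heuristic: class names -> predicates, nesting -> structure."""
    root = ET.fromstring(xhtml)
    ids = count(1)  # generator for made-up node labels _:n1, _:n2, ...
    out = []

    def has_classed_descendant(el):
        # True if any element below `el` (not `el` itself) has a class attribute
        return any(d.get("class") for d in el.iter() if d is not el)

    def walk(el, subject):
        for child in el:
            classes = (child.get("class") or "").split()
            if not classes:
                walk(child, subject)  # unclassed wrapper: look inside it
                continue
            if has_classed_descendant(child):
                node = f"_:n{next(ids)}"  # compound property: new node
                for c in classes:
                    out.append((subject, c, node))
                walk(child, node)
            else:
                # leaf property: prefer a link target, else the element's text
                value = child.get("href") or child.get("src") or "".join(child.itertext()).strip()
                for c in classes:
                    out.append((subject, c, value))

    walk(root, "_:doc")
    return out


doc = ('<div><div class="vcard"><span class="fn">Jane Doe</span> '
       '<a class="url" href="http://example.com/">home page</a></div></div>')
for s in statements(doc):
    print(s)
```

Run on the made-up hCard-like snippet above, this yields a `vcard` node with `fn` and `url` statements. The hard part is everything this ignores, such as per-format rules like `abbr` titles in hCalendar or `rel` values, which is why I'm asking whether a generic approach exists.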