[uf-discuss] generic microformat parsing heuristics?
Phil Dawes
phil at phildawes.net
Mon Nov 7 05:42:25 PST 2005
Hi All,
(Apologies if you get this twice - the microformats list doesn't appear
to like sourceforge addresses.)
I've recently been playing with microformats a bit and have added some
basic hcard and hcalendar parsing to my structured data aggregator
program JAM*VAT[1] (enough to parse Tantek's
http://tantek.com/log/2005/10.html page). Unfortunately this is proving
much more complicated than I originally thought, and I was wondering if
there is a bigger picture that I'm missing.
So my question is:
Is there a set of heuristics that can be employed to parse all of the
microformats generically, or at least to get reasonable results?
I ask this because JAM*VAT is able to employ some basic heuristics[2] to
parse pretty much any data-oriented XML format into a set of reasonable
semantic statements (JAM*VAT uses a very simple scheme for representing
semantic statements[3]). I'd like to be able to do something similar
with semantic XHTML.
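To illustrate the kind of generic heuristic in question, here is a minimal
sketch in Python. It is not JAM*VAT's actual code, and the function name
class_triples, the "_1"-style node ids, the "type" property and the sample
markup are all hypothetical. The assumed rule is: an element whose class
attribute has class-bearing descendants becomes a new node, and every other
class-bearing element becomes a property of the nearest such node, with its
text as the value.

```python
import xml.etree.ElementTree as ET


def class_triples(xhtml):
    """Naive heuristic: turn class names into (subject, property, value) triples."""
    root = ET.fromstring(xhtml)
    triples = []
    counter = [0]  # mutable counter for generating node ids

    def walk(elem, subject):
        classes = elem.get("class", "").split()
        if classes:
            # A "container" is a classed element with classed descendants.
            is_container = any(
                d.get("class") for d in elem.iter() if d is not elem
            )
            if is_container:
                counter[0] += 1
                node = "_%d" % counter[0]
                for name in classes:
                    if subject is None:
                        # Top-level container: record its class as a type.
                        triples.append((node, "type", name))
                    else:
                        # Nested container: link it to the enclosing node.
                        triples.append((subject, name, node))
                subject = node
            elif subject is not None:
                # Leaf property: use the element's text content as the value.
                value = "".join(elem.itertext()).strip()
                for name in classes:
                    triples.append((subject, name, value))
        for child in elem:
            walk(child, subject)

    walk(root, None)
    return triples


# Hypothetical hcard-like fragment, for illustration only.
sample = """<div class="vcard">
  <a class="url fn" href="http://example.org/">Jane Example</a>
  <span class="org">Example Org</span>
</div>"""

for triple in class_triples(sample):
    print(triple)
```

Even on this tiny fragment the heuristic gets "url" wrong: it reports the
link text rather than the href attribute, and it knows nothing about values
carried in attributes such as title. Format-specific rules like these are
what make a purely generic approach hard.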
Many thanks,
Phil
[1] http://phildawes.net/jamvat/
[2]
http://www.phildawes.net/blog/2005/09/16/xml-to-tagtriples-and-mapping-heuristics/
[3] http://tagtriples.sf.net/