[uf-discuss] generic microformat parsing heuristics?

Tantek Çelik tantek at cs.stanford.edu
Mon Nov 7 08:53:46 PST 2005


Take a look at hCard parsing:


Much of what is embodied there generalizes to other microformats.
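A minimal Python sketch of what a generic heuristic might look like, assuming well-formed XHTML and a hypothetical, hard-coded set of root class names (`vcard`, `vevent`). This is not the hCard parsing procedure. It treats every class name on an element inside a root as a property. An element with classed descendants becomes a nested node. A leaf value comes from an `abbr` element's `title` attribute or else from the element's text. The output is a list of (subject, property, value) triples, loosely in the spirit of the statements JAM*VAT produces. The `_:nN` node labels and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Assumption: the only root classes we know about; a real parser needs each format's list.
ROOT_CLASSES = {"vcard", "vevent"}


def local(tag):
    """Strip an XHTML namespace, e.g. '{http://www.w3.org/1999/xhtml}div' -> 'div'."""
    return tag.rsplit("}", 1)[-1]


def classes(el):
    return el.get("class", "").split()


def has_classed_descendant(el):
    return any(d.get("class") for d in el.iter() if d is not el)


def leaf_value(el):
    # Generic "abbr design pattern": the machine-readable value lives in title.
    if local(el.tag) == "abbr" and el.get("title"):
        return el.get("title")
    # Otherwise use the element's text with whitespace collapsed.
    return " ".join("".join(el.itertext()).split())


def to_triples(xhtml):
    """Hypothetical generic heuristic: class names become properties."""
    triples = []
    ids = count(1)

    def visit(el, subject):
        cls = classes(el)
        roots = [c for c in cls if c in ROOT_CLASSES]
        # Class names only count as properties once we are inside a root.
        props = [c for c in cls if c not in ROOT_CLASSES] if subject else []
        if roots:
            node = f"_:n{next(ids)}"
            for c in roots:
                triples.append((node, "type", c))
            for p in props:  # e.g. class="attendee vcard" nested in a vevent
                triples.append((subject, p, node))
            for child in el:
                visit(child, node)
        elif props and has_classed_descendant(el):
            node = f"_:n{next(ids)}"  # structured property, e.g. hCard "n"
            for p in props:
                triples.append((subject, p, node))
            for child in el:
                visit(child, node)
        elif props:
            value = leaf_value(el)
            for p in props:  # class="fn org" yields two statements
                triples.append((subject, p, value))
        else:
            for child in el:  # unclassed wrapper: keep looking inside it
                visit(child, subject)

    visit(ET.fromstring(xhtml), None)
    return triples
```

The generic part only gets you so far. hCard, for example, takes a `url` value from an `a` element's `href` rather than from its text, and per-property rules of that kind have to come from each format's own parsing rules.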



On 11/7/05 5:42 AM, "Phil Dawes" <phil at phildawes.net> wrote:

> Hi All,
> (Apologies if you get this twice - the microformats list doesn't appear
> to like sourceforge addresses.)
> I've recently been playing with microformats a bit and have added some
> basic hcard and hcalendar parsing to my structured data aggregator
> program JAM*VAT[1] (enough to parse Tantek's
> http://tantek.com/log/2005/10.html page). Unfortunately this is proving
> much more complicated than I originally thought, and I was wondering
> whether there is a bigger picture that I'm missing.
> So my question is:
> Is there a set of heuristics that can be employed to generically parse
> (all of the) microformats?
> (or at least get reasonable results)
> I ask this because JAM*VAT is able to employ some basic heuristics[2] to
> parse pretty much any data oriented XML format into a set of reasonable
> semantic statements (JAM*VAT uses a very simple scheme for representing
> semantic statements[3]). I'd like to be able to do something similar
> with semantic XHTML.
> Many thanks,
> Phil
> [1] http://phildawes.net/jamvat/
> [2] http://www.phildawes.net/blog/2005/09/16/xml-to-tagtriples-and-mapping-heuristics/
> [3] http://tagtriples.sf.net/
> _______________________________________________
> microformats-discuss mailing list
> microformats-discuss at microformats.org
> http://microformats.org/mailman/listinfo/microformats-discuss
