[uf-dev] Python Microformats parser

anders conbere aconbere at gmail.com
Sun Jan 20 19:18:09 PST 2008


I've spent a little while developing a new Python microformats
parser (code at the link below):

http://microformats.googlecode.com/svn/code/python/microformats-parser/uf/

I ran into quite a few hurdles, and I've ended up with an implementation
that works in three steps: it uses lxml to parse the HTML into an
internal XML representation, applies an XSL transform to that to arrive
at the standard format the microformat represents, and then uses the
available Python parsers for that format to get back to a Python data
object.
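
As a rough illustration of that pipeline, here is a minimal sketch. It is
not the code in the repository. The stylesheet, the names hcard_to_vcard
and parse_vcard_lines, and the sample markup are all made up for
illustration. It assumes the hCard-to-vCard case and handles only the "fn"
property, and the last step is a stand-in for a real vCard library.

```python
from lxml import etree, html

# Hypothetical, heavily simplified stylesheet: for every element whose class
# list contains "vcard", emit a vCard holding only the "fn" property.
_STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//*[contains(concat(' ', normalize-space(@class), ' '), ' vcard ')]">
      <xsl:text>BEGIN:VCARD&#10;VERSION:3.0&#10;FN:</xsl:text>
      <xsl:value-of select="normalize-space(.//*[contains(concat(' ', normalize-space(@class), ' '), ' fn ')])"/>
      <xsl:text>&#10;END:VCARD&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
"""

# Compile the stylesheet once; the resulting object is callable on a tree.
_TRANSFORM = etree.XSLT(etree.XML(_STYLESHEET))


def hcard_to_vcard(markup):
    """Steps 1 and 2: lenient HTML parse, then XSLT to vCard text."""
    tree = html.fromstring(markup)
    return str(_TRANSFORM(tree))


def parse_vcard_lines(vcard_text):
    """Step 3 stand-in (not a real vCard parser): split NAME:VALUE lines."""
    return [
        tuple(line.split(":", 1))
        for line in vcard_text.splitlines()
        if ":" in line
    ]


sample = '<div class="vcard"><span class="fn">Anders Conbere</span></div>'
card = parse_vcard_lines(hcard_to_vcard(sample))
```

In a real setup the text from hcard_to_vcard would go to whichever vCard
library is in use, which is the point where the differences between
libraries start to show.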

By and large this works pretty well at getting the data out of
microformats. The largest problem I've run into is that the various
parsing libraries I use for things like vCard/vCal and hAtom provide
different interfaces, have different bugs, and handle data in
different ways.

Anyway, I would love comments and critiques, and maybe someone has
already gotten around all these problems.

~ Anders

