[uf-dev] Python Microformats parser
anders conbere
aconbere at gmail.com
Sun Jan 20 19:18:09 PST 2008
So I've spent a little while developing a new Python microformats
parser. The code is here:
http://microformats.googlecode.com/svn/code/python/microformats-parser/uf/
I ran into quite a few hurdles and ended up with an implementation
that uses lxml to parse HTML into an internal XML representation, then
applies an XSL transform to that to arrive at the standard format the
microformat represents, then uses the available Python parsers for
that format to get back to a Python data object.
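
To make the pipeline described above concrete, here is a minimal
sketch of the three steps for a single hCard. It is not the code in
the repository: the stylesheet is a hypothetical, heavily reduced one
that only emits the FN property, and parse_vcard_lines is a made-up
stand-in for whichever real vCard parsing library would be used in
the last step. The names hcard_to_dict and HCARD_TO_VCARD are also
invented for illustration.

```python
from lxml import etree, html

# Hypothetical, heavily reduced stylesheet: turns an hCard into vCard text
# containing only the FN property. A real transform covers many more fields.
HCARD_TO_VCARD = etree.XSLT(etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//*[contains(concat(' ', normalize-space(@class), ' '), ' vcard ')]">
      <xsl:text>BEGIN:VCARD&#10;</xsl:text>
      <xsl:text>FN:</xsl:text>
      <xsl:value-of select="normalize-space(.//*[contains(concat(' ', normalize-space(@class), ' '), ' fn ')])"/>
      <xsl:text>&#10;END:VCARD&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
"""))


def parse_vcard_lines(text):
    """Stand-in for a real vCard parser: map property names to values."""
    card = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name not in ("BEGIN", "END") and value:
            card[name] = value
    return card


def hcard_to_dict(markup):
    """HTML -> XML tree -> XSLT -> vCard text -> Python dict."""
    doc = html.document_fromstring(markup)    # step 1: lenient HTML parse
    vcard_text = str(HCARD_TO_VCARD(doc))     # step 2: XSL transform
    return parse_vcard_lines(vcard_text)      # step 3: parse standard format


if __name__ == "__main__":
    sample = '<div class="vcard"><span class="fn">Anders Conbere</span></div>'
    print(hcard_to_dict(sample))
```

The point of routing through the standard format is that the last
step can be swapped for an existing parser without touching the
HTML handling or the stylesheet.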
By and large this works pretty well at getting the data out of
microformats. The largest problem I've run into is that the various
parsing libraries I use for things like vCard/vCal and hAtom provide
different interfaces, have different bugs, and handle data in
different ways.
Anyway, I would love comments and critiques, and maybe someone has
already gotten around all these problems.
~ Anders