[uf-discuss] Somewhat Universal Microformat Parser

David Janes -- BlogMatrix davidjanes at blogmatrix.com
Wed Dec 7 06:51:45 PST 2005


Here's what I've been working on for the last couple of days. It's a 
service -- actually, a front end onto a Python library/framework -- that 
can rip apart microformats into a (hopefully) simpler format that will 
be easier for programs to manipulate.

pages:
- the interface [1]
- an example of hAtom parsing [2]
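
The example link [2] is a plain GET request with `uri`, `microformat` and `submit` query parameters. A minimal sketch of building such a link with the Python standard library follows; the helper name `build_extract_url` is made up for illustration, and it is only an assumption that the service accepts microformat values other than the `hatom` shown in [2].

```python
from urllib.parse import urlencode

# Base address of the interface, taken from link [1].
EXTRACT_URL = "http://www.davidjanes.com/microformats/extract/"


def build_extract_url(page_uri, microformat):
    """Hypothetical helper: build a link shaped like the example in [2]."""
    # urlencode percent-escapes the page address so it can travel
    # safely inside the query string.
    query = urlencode({
        "uri": page_uri,
        "microformat": microformat,
        "submit": "Submit",
    })
    return f"{EXTRACT_URL}?{query}"


url = build_extract_url("http://blog.davidjanes.com/", "hatom")
print(url)
```

Called with the blog address and `hatom`, this reproduces the link given in [2].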

You can also paste XHTML fragments in -- try something from the hReview page [3].

microformats supported:
- hatom - pretty good
- hreview - a lot of work is needed
- hcard - pretty good
- rel-tag - actually, a slightly expanded "rel-reviewed-tag" from hreview

I hope to have vCalendar and xEntry in there this afternoon/tomorrow.

Here's what a parser looks like [4].

Regards, etc...
David
http://www.blogmatrix.com

[1] http://www.davidjanes.com/microformats/extract/
[2] http://www.davidjanes.com/microformats/extract/?uri=http%3A%2F%2Fblog.davidjanes.com%2F&microformat=hatom&submit=Submit
[3] http://microformats.org/wiki/hreview
[4]

class MicroformatHReview(microformat.Microformat):
   def __init__(self):
     microformat.Microformat.__init__(self, "hreview")

     self.CollectClassText('version')
     self.CollectClassText('summary', text_type = microformat.TT_XML_INNER)
     self.CollectClassText('description', text_type = microformat.TT_XML_INNER)
     self.CollectClassText('type')
     self.CollectClassText('dtreviewed', text_type = microformat.TT_ABBR_DT)
     self.CollectClassText('info', text_type = microformat.TT_XML_OUTER)
     self.CollectClassText('reviewer', text_type = microformat.TT_XML_OUTER)
     self.CollectRelAttribute('permalink', 'href')

     self.CollectClassText('rating', text_type = microformat.TT_ABBR)
     self.CollectClassText('best', text_type = microformat.TT_ABBR)
     self.CollectClassText('worst', text_type = microformat.TT_ABBR)

     self.CollectClassModifier('item')

     self.CollectRelReparse('tag', reltag.MicroformatRelTag())
     self.CollectClassReparse('reviewer', hcard.MicroformatHCard())

     self.DeclareRepeatingName('reviewer')
     self.DeclareRepeatingName('tag')
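
To make the declarative style above concrete, here is a minimal sketch of how a "collect class text" parser of this general kind might work. It is not the library's implementation: `MiniMicroformat` and its methods are hypothetical stand-ins, it uses only Python's standard `xml.etree.ElementTree`, it assumes well-formed XHTML input, and it ignores the `text_type` variants, `rel` attributes and reparsing shown in the real parser.

```python
import xml.etree.ElementTree as ET


class MiniMicroformat:
    """Hypothetical stand-in for a declarative parser; not the real library."""

    def __init__(self, root_class):
        self.root_class = root_class
        self.class_names = []   # class names whose text is collected
        self.repeating = set()  # names allowed to occur more than once

    def collect_class_text(self, name):
        self.class_names.append(name)

    def declare_repeating_name(self, name):
        self.repeating.add(name)

    @staticmethod
    def _classes(element):
        # The class attribute is a space-separated list of names.
        return element.get("class", "").split()

    def parse(self, xhtml):
        """Return one dict per element that carries the root class."""
        results = []
        for root in ET.fromstring(xhtml).iter():
            if self.root_class not in self._classes(root):
                continue
            record = {}
            for element in root.iter():
                for name in self.class_names:
                    if name not in self._classes(element):
                        continue
                    # Flatten nested markup and normalise whitespace.
                    text = " ".join("".join(element.itertext()).split())
                    if name in self.repeating:
                        record.setdefault(name, []).append(text)
                    else:
                        record.setdefault(name, text)  # keep the first match
            results.append(record)
        return results


SAMPLE = """<div class="hreview">
  <span class="version">0.2</span>
  <span class="summary">Great <em>coffee</em></span>
  <span class="reviewer">Alice</span>
  <span class="reviewer">Bob</span>
</div>"""

parser = MiniMicroformat("hreview")
parser.collect_class_text("version")
parser.collect_class_text("summary")
parser.collect_class_text("reviewer")
parser.declare_repeating_name("reviewer")
reviews = parser.parse(SAMPLE)
print(reviews)
```

The point of the design is that each microformat is described by a short list of declarations, while the tree walking lives in one shared base class. Names declared as repeating come back as lists, everything else as a single string.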

