[uf-dev] Parsing XOXO "text" Property Value

Dimitri Glazkov dimitri.glazkov at gmail.com
Wed Aug 8 12:03:43 PDT 2007


As far as I can infer from the spec, the "text" property value is
parsed according to these rules:

* If the first child of LI is a text node, it is the "text" value
* Otherwise, if there is a child A element, the sub-tree of the first
(or last?) A element is the "text" value
* Otherwise, if there is a child DL element containing a DT element
whose only child is a text node with the value "text", followed by a
DD element, the sub-tree of the first (or last?) DD element matching
this rule is the "text" value
* Otherwise, the "text" value is empty
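
The rules above can be sketched in Python roughly as follows. This is
only an illustration of my reading of the rules, not the algorithm of
xoxo.py or of any other parser. The function name text_value and the
prefer_last flag are made up for the example, and several details are
assumptions: whitespace-only text nodes are ignored, the "sub-tree" is
flattened to its text content, and the input is well-formed XML with
lowercase, un-namespaced element names.

```python
import xml.etree.ElementTree as ET


def text_value(li, prefer_last=True):
    """Return the inferred "text" value of an LI element (hypothetical helper)."""
    # Rule 1: the first child is a text node. ElementTree exposes the
    # text before the first child element as li.text.
    # Assumption: whitespace-only text does not count.
    if li.text and li.text.strip():
        return li.text.strip()

    # Index of the winning match: last (-1) or first (0).
    pick = -1 if prefer_last else 0

    # Rule 2: a direct child A element.
    # Assumption: the "sub-tree" is reduced to its text content.
    anchors = li.findall("a")
    if anchors:
        return "".join(anchors[pick].itertext()).strip()

    # Rule 3: inside a child DL, a DT whose only content is the text
    # "text", immediately followed by a DD.
    matches = []
    for dl in li.findall("dl"):
        children = list(dl)
        for dt, dd in zip(children, children[1:]):
            if (dt.tag == "dt" and dd.tag == "dd" and len(dt) == 0
                    and (dt.text or "").strip() == "text"):
                matches.append(dd)
    if matches:
        return "".join(matches[pick].itertext()).strip()

    # Rule 4: nothing matched.
    return ""


li = ET.fromstring('<li><a href="#1">one</a><a href="#2">two</a></li>')
print(text_value(li))                     # last A element wins
print(text_value(li, prefer_last=False))  # first A element wins
```

Making first/last a parameter keeps the open question visible instead
of silently picking one answer.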

Whether the first or the last match wins is impossible to infer from
the examples in the spec, but it appears that all parsers use a
bag/dictionary to keep track of the properties, and all of them
re-assign the value each time the property is encountered, which
leads me to believe that _last_ is the status quo in the implemented
parsers.
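
To illustrate why plain re-assignment into a dictionary yields
"last wins", here is a minimal Python sketch. It is not taken from any
existing parser; collect_properties and the sample pairs are
hypothetical.

```python
def collect_properties(pairs):
    """Collect (name, value) pairs into a dict by plain re-assignment."""
    props = {}
    for name, value in pairs:
        # A later value for the same name overwrites the earlier one.
        props[name] = value
    return props


found = [("text", "first"), ("url", "http://example.com/"), ("text", "last")]
print(collect_properties(found)["text"])  # prints "last"
```

A parser that wanted "first wins" would have to check for an existing
key before assigning, e.g. with props.setdefault(name, value).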

I attempted to read the code of xoxo.py to see how it implements
this, but I am a newbie to Python, so it's taking a bit of time. I was
wondering if I could get some remedial help from a more seasoned
Python developer to reconstruct its parsing algorithm.

Feedback on the parsing algorithm itself is, of course, much
appreciated too.

:DG<

