[uf-dev] Parsing By Element
Tantek Çelik
tantek at cs.stanford.edu
Thu Sep 13 11:57:58 PDT 2007
On 9/13/07 6:54 AM, "Brian Suda" <brian.suda at gmail.com> wrote:
> Tantek and I were talking about how we should construct a list of
> elements and explain how they are being parsed. For example,
>
> <a href="http://example.org" class="fn url">John Doe</a>
>
> We know that FN becomes "John Doe" and URL becomes http://example.org
> but there is very little documentation (at least in one place) about
> how and when these rules are invoked.
Actually, there is quite a bit of documentation about this, and it is in
*only* one place currently:
http://microformats.org/wiki/hcard-parsing
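
To make the quoted example concrete, here is a minimal sketch of pulling FN and URL out of that one <a> element. It uses only Python's standard html.parser module. The names AnchorPropertyParser and parse_anchor_properties are made up for illustration, and the sketch covers only the <a> case from the example, not the full rule set on the hcard-parsing page.

```python
from html.parser import HTMLParser


class AnchorPropertyParser(HTMLParser):
    """Illustrative only: collects fn and url from <a> elements."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._capture_fn = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # Assumption: for an <a>, the url property comes from href.
        if "url" in classes and attr.get("href"):
            self.properties["url"] = attr["href"]
        # Assumption: for an <a>, fn comes from the element's text content.
        if "fn" in classes:
            self._capture_fn = True
            self._text = []

    def handle_data(self, data):
        if self._capture_fn:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._capture_fn:
            self.properties["fn"] = "".join(self._text).strip()
            self._capture_fn = False


def parse_anchor_properties(html):
    """Hypothetical helper: return the properties found in an HTML fragment."""
    parser = AnchorPropertyParser()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Feeding it the element from the quoted message should give a dict with "fn" set to the link text and "url" set to the href value.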
> I created a parsing page on the wiki (feel free to move it as needed)
> http://microformats.org/wiki/parsing
>
> It takes the W3C list of HTML elements and begins to map how and where
> to extract the values for various microformats properties.
I saw that, but I'm not sure that a raw element list is the right way to
start.
I've been trying to complete the *semantic* element and attribute lists as
well as group them into logical sets for mnemonic purposes here:
http://microformats.org/wiki/semantic-xhtml
From that, the next step is an audit of hcard-parsing to see if I'm missing
any special element handling (like <input> for example) and derive parsing
rules per element semantics accordingly and finish writing them up here:
<http://microformats.org/wiki/hcard-brainstorming#Additional_Semantic_HTML_handling>
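
One possible shape for "parsing rules per element semantics" is a small lookup table keyed by element name, with the element's text content as the fallback. The sketch below is only that: the table names, the property_value function, and the specific element-to-attribute choices (including the <input> entry) are assumptions for illustration, not a copy of what the wiki pages specify.

```python
# Hypothetical per-element rules: which attribute, if any, carries the value.
# These mappings are illustrative assumptions, not the documented rule set.
URL_ATTRIBUTE = {"a": "href", "area": "href", "img": "src", "object": "data"}
TEXT_ATTRIBUTE = {"abbr": "title", "img": "alt", "input": "value"}


def property_value(tag, attrs, text, is_url):
    """Return a property value for one element.

    tag    -- lowercase element name, e.g. "a"
    attrs  -- dict of the element's attributes
    text   -- the element's text content
    is_url -- True when the property is URL-typed (e.g. url, photo)
    """
    table = URL_ATTRIBUTE if is_url else TEXT_ATTRIBUTE
    attribute = table.get(tag)
    if attribute and attrs.get(attribute):
        return attrs[attribute]
    # No element-specific rule applies: fall back to the text content.
    return text.strip()
```

With this layout, adding handling for another element is a one-line table entry, which is roughly why the implementation updates might be expected to be small.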
Then let's take a look at the open source implementations (X2V, hKit,
Operator) and determine if it is fairly straightforward to add any
additional element-specific semantic handling - I expect that implementation
updates should be fairly trivial for a few special cases.
Simultaneously, test cases which exercise the new parsing cases will help as
well.
Once we've gotten all that working for hCard, then I'll draft up
/wiki/hcalendar-parsing accordingly as well.
With the practical experience of hCard and hCalendar parsing, I'll
extract/abstract common bits and draft /wiki/compound-parsing as general
rules for parsing compound microformats.
How does that sound?
Tantek