[uf-dev] Parsing By Element

Tantek Çelik tantek at cs.stanford.edu
Thu Sep 13 11:57:58 PDT 2007


On 9/13/07 6:54 AM, "Brian Suda" <brian.suda at gmail.com> wrote:

> Tantek and I were talking about how we should construct a list of
> elements and explain how they are being parsed. For example,
> 
> <a href="http://example.org" class="fn url">John Doe</a>
> 
> We know that FN becomes "John Doe" and URL becomes http://example.org
> but there is very little documentation (at least in one place) about
> how and when these rules are invoked.

Actually, there is quite a bit of documentation about this, and it is in
*only* one place currently:

 http://microformats.org/wiki/hcard-parsing
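
To make Brian's example concrete, here is a minimal Python sketch of that
one rule: on an <a> element, "url" is read from the href attribute and "fn"
from the text content. It is an illustration only, not the algorithm
documented on the wiki page. The names FnUrlParser and parse_fn_url are
hypothetical, and the sketch assumes a single, non-nested <a> element.

```python
from html.parser import HTMLParser


class FnUrlParser(HTMLParser):
    """Hypothetical sketch: collect 'fn' and 'url' from <a> elements only."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._in_fn = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Assumption: on <a>, a URL-type property takes its value from href.
        if "url" in classes and attrs.get("href"):
            self.props["url"] = attrs["href"]
        # Assumption: a text-type property takes the element's text content.
        if "fn" in classes:
            self._in_fn = True
            self._text = []

    def handle_data(self, data):
        if self._in_fn:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_fn:
            self.props["fn"] = "".join(self._text).strip()
            self._in_fn = False


def parse_fn_url(html):
    """Return a dict of the properties found in the given HTML fragment."""
    parser = FnUrlParser()
    parser.feed(html)
    parser.close()
    return parser.props
```

Calling parse_fn_url on the <a> element quoted above should yield one "fn"
entry and one "url" entry. Nested elements, multiple values and the other
hCard properties are deliberately left out.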

> I created a parsing page on the wiki (feel free to move it as needed)
> http://microformats.org/wiki/parsing
>
> It takes the W3C list of HTML elements and begins to map how and where
> to extract the values for various microformats properties.

I saw that, but I'm not sure that a raw element list is the right place to
start.

I've been trying to complete the *semantic* element and attribute lists as
well as group them into logical sets for mnemonic purposes here:

 http://microformats.org/wiki/semantic-xhtml

From that, the next step is an audit of hcard-parsing to see if I'm missing
any special element handling (like <input> for example) and derive parsing
rules per element semantics accordingly and finish writing them up here:

<http://microformats.org/wiki/hcard-brainstorming#Additional_Semantic_HTML_handling>
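
One possible shape for "parsing rules per element semantics" is a small
lookup table keyed by element and kind of property. The Python sketch below
is hypothetical: the name RULES, the function extract_value and the specific
entries are assumptions chosen for illustration, and the real rules would
come out of the audit described above.

```python
# Hypothetical rule table: (element, kind of property) -> attribute to read.
# The entries are illustrative assumptions, not the audited rule set.
RULES = {
    ("a", "url"): "href",
    ("img", "url"): "src",
    ("img", "text"): "alt",
    ("abbr", "text"): "title",
    ("object", "url"): "data",
}


def extract_value(tag, attrs, text, kind):
    """Pick a property value for one element.

    tag   -- lowercase element name, e.g. "a"
    attrs -- dict of the element's attributes
    text  -- the element's text content
    kind  -- "url" or "text", the kind of value the property expects
    """
    attr = RULES.get((tag, kind))
    # Use the element-specific attribute when a rule exists and it is set.
    if attr is not None and attrs.get(attr):
        return attrs[attr]
    # Otherwise fall back to the element's text content.
    return text.strip()
```

With a table like this, adding special handling for one more element is a
single new entry, which is the sense in which the implementation updates
should stay small.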

Then let's take a look at the open source implementations (X2V, hKit,
Operator) and determine if it is fairly straightforward to add any
additional element-specific semantic handling - I expect that implementation
updates should be fairly trivial for a few special cases.

Simultaneously, test cases which exercise the new parsing cases will help as
well.

Once we've gotten all that working for hCard, then I'll draft up
/wiki/hcalendar-parsing accordingly as well.

With the practical experience of hCard and hCalendar parsing, I'll
extract/abstract common bits and draft /wiki/compound-parsing as general
rules for parsing compound microformats.

How does that sound?

Tantek


