[uf-dev] Parsing By Element

Fri Sep 14 03:29:34 PDT 2007

On 9/13/07, Tantek Çelik <tantek at cs.stanford.edu> wrote:
> > It takes the W3C list of HTML elements and begins to map how and where
> > to extract the values for various microformats properties.
>
> I saw that, I'm not sure that a raw element list is the right way to start
> that.

--- My understanding was that we wanted a full list of elements and
their parsing rules. If we want to re-order or group them by semantics,
that is fine. It is a wiki, so I'll let someone else take the existing
data and re-order/remove/tweak it as needed.

> I've been trying to complete the *semantic* element and attribute lists as
> well as group them into logical sets for mnemonic purposes here:
>
>  http://microformats.org/wiki/semantic-xhtml

--- Last time I looked at that page it was a list of a handful of
elements. It seems much better and grouped now. How would you suggest
adding parsing information to that list (or would you)?

> From that, the next step is an audit of hcard-parsing to see if I'm missing
> any special element handling (like <input> for example) and derive parsing
> rules per element semantics accordingly and finish writing them up here:

--- When I created the list of elements, there were several that do not
seem to be covered: FRAME, SCRIPT, APPLET, CITE, INS/DEL and Q (INS/DEL
and Q both have a cite attribute). Some of those are easy answers. We
only have a brief description of TABLE semantics (tables also have a
SUMMARY attribute and the whole AXIS/HEADERS/ID stuff). There were also
rules discussed about what it means if class="category" is on an
OL/UL: would that be one category per LI, or a single string of the
combined LI values?

<ol class="category">
  <li>foo</li>
  <li>bar</li>
</ol>

is that:
CATEGORIES:foobar
or
CATEGORIES:foo,bar

Would the same apply to other properties such as TEL? Or only to plural
properties we singularized? Or none? I couldn't find a reference, but I
thought we did decide on something, so we should document our decision.
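
To make the two readings concrete, here is a minimal Python sketch of
both. The class and function names are hypothetical (not from any
existing parser), nested lists are not handled, and it does not claim
to say which reading is the one we decided on.

```python
from html.parser import HTMLParser


class CategoryListParser(HTMLParser):
    """Collect the text of each LI inside an OL/UL with class="category".

    Simplification (assumption): nested lists are not handled.
    """

    def __init__(self):
        super().__init__()
        self.in_list = False   # inside the class="category" list
        self.in_item = False   # inside an LI of that list
        self.items = []        # one string per LI

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag in ("ol", "ul") and "category" in classes:
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag in ("ol", "ul"):
            self.in_list = False
            self.in_item = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Only text inside an LI counts; whitespace between LIs is ignored.
        if self.in_item:
            self.items[-1] += data


def extract_items(html):
    """Return the trimmed text of each LI in the category list."""
    parser = CategoryListParser()
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


def as_single_value(items):
    """Reading 1: the whole list is one combined string."""
    return "CATEGORIES:" + "".join(items)


def as_multiple_values(items):
    """Reading 2: one category per LI."""
    return "CATEGORIES:" + ",".join(items)


SAMPLE = """<ol class="category">
  <li>foo</li>
  <li>bar</li>
</ol>"""

items = extract_items(SAMPLE)
print(as_single_value(items))     # CATEGORIES:foobar
print(as_multiple_values(items))  # CATEGORIES:foo,bar
```

Whichever reading we pick, the extraction step is the same; only the
final join differs, which is why it seems worth writing down once.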

The hCard page also mentions this:
http://microformats.org/wiki/hcard#Tags_as_Categories
When rel-tag is used with categories, the parsing is different. This
isn't mentioned on the hcard-parsing page.
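
As I understand rel-tag, the tag value comes from the last path segment
of the link's href rather than from the link text, so a category marked
up that way would need something like the rough Python sketch below.
tag_from_href is a hypothetical name, the example.com URLs are made up,
and ignoring a trailing slash is my own assumption.

```python
from urllib.parse import urlsplit, unquote


def tag_from_href(href):
    """Return the last path segment of href, percent-decoded.

    Assumption: a trailing slash is ignored, so ".../tags/foo/"
    is treated the same as ".../tags/foo".
    """
    path = urlsplit(href).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    return unquote(last_segment)


print(tag_from_href("http://example.com/tags/foo"))          # foo
print(tag_from_href("http://example.com/tags/fun%20stuff"))  # fun stuff
```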

> With the practical experience of hCard and hCalendar parsing, I'll
> extract/abstract common bits and draft /wiki/compound-parsing as general
> rules for parsing compound microformats.
>
> How does that sound?

--- I agree it would be better to migrate this information to a
general "parsing" page than to continue to have a *-parsing page for
each format. I'm not sure an hCalendar-parsing page is needed. There is
plenty of common overlap, so I would prefer this approach of a generic
page, with any specific rules added to the *-parsing pages on a
format-by-format, as-needed basis.

-brian

-- 
brian suda
http://suda.co.uk