[uf-new] Microformats parsing, in general (was: hAudio final draft)

Mon Jun 18 14:27:38 PDT 2007

On 6/18/07, Tantek Çelik <tantek at cs.stanford.edu> wrote:
> This is likely to be precisely why we may need to solve this problem by
> continuing the mfo discussion.

--- Part of the reason the MSO discussion died is that it didn't
actually solve anything.

> If you look at the current known alternatives:
>
> 1. require parsers to update whenever new nestable microformats are
> introduced, and precisely define rules for handling known/common nesting
> cases (to at a minimum avoid wasting time on straw-man arguments).

--- i do NOT like this alternative because it assumes that you WANT
the data to be two different things. For instance, say i have a URL as
a child of an hCard. The common parsing rules might say that when that
hCard is the location of an hCalendar, the URL is ignored. But what
happens when i WANT that URL to be part of the hCalendar? This leads
to incorrect assumptions. I would rather let the PUBLISHER be as
explicit as they want, or not, than have parsers attempt to interpret
their intent.

> 2. add a new class name to indicate a encapsulation scope (e.g. "mfo") when
> embedding
>  - = one new class name, only in cases where nesting occurs.

--- The problem with MSO is something like the following:

- hCalendar
-- location (MSO)
--- hcard
---- URL

Here the URL is ignored for the hCalendar, but then the LOCATION is
blank too, because MSO says NOT to take any data. So we move the MSO
inside the hCard:

- hCalendar
-- location
--- hcard
---- URL (MSO)

Now you get some data for the location, but now URL is ignored for
BOTH hCal and hCard.
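
A minimal sketch of the two placements above, in Python. It rests on
assumptions that are not part of any spec: a hypothetical Node class
standing in for markup, a simplified reading of the rule ("take no data
from a subtree marked with the scope class"), and the class name "mfo"
taken from the quoted proposal (called MSO in this reply).

```python
from dataclasses import dataclass, field

SCOPE_CLASS = "mfo"  # class name from the quoted proposal; the rule below is an assumption


@dataclass
class Node:
    """Hypothetical stand-in for an element: its class names and its children."""
    classes: set
    children: list = field(default_factory=list)


def properties(root):
    """Collect class names found under root, skipping any subtree marked with SCOPE_CLASS."""
    found = set()
    for child in root.children:
        if SCOPE_CLASS in child.classes:
            continue  # assumed rule: take no data from a scoped subtree
        found |= child.classes
        found |= properties(child)
    return found


def build(scope_on):
    """Build hCalendar > location > hcard > URL, with the scope class on one node."""
    extra = lambda name: {SCOPE_CLASS} if scope_on == name else set()
    url = Node({"url"} | extra("url"))
    hcard = Node({"vcard"}, [url])
    location = Node({"location"} | extra("location"), [hcard])
    hcal = Node({"vevent"}, [location])
    return properties(hcal), properties(hcard)


hcal_first, hcard_first = build("location")   # scope class on location
hcal_second, hcard_second = build("url")      # scope class on URL
```

Under these assumptions the first placement leaves the hCalendar with
no location at all, and the second leaves both the hCalendar and the
hCard without the URL.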

From what i remember MSO didn't actually solve anything, it just
created more problems. This is why IMHO it was never pursued any
further than just a thought.

> 3. replicate/prefix property class names for each microformat e.g. audio-fn
>  - = numerous new class names
>
> It is pretty clear that #3 is the worst from a complexity (most new class
> names) that would affect the most people (publishers) point of view.  So we
> should seek to avoid #3 since that violates the principles the most.

--- each microformat can also define its own parsing rules. For
instance, hAtom only looks for rel-tag NOT inside an hentry. There is
no reason that a media format can't define that an FN can ONLY be
taken when it is NOT a child of an hCard, but then this limits the way
people can publish.
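
A rough sketch of what such a per-format rule could look like, in
Python. Everything here is illustrative: the Node class and find()
helper are hypothetical, the class name "tag" is only a stand-in for
rel-tag (which is really a rel value, not a class), and "haudio" is
used as an assumed root class for the media format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an element: class names, text, children."""
    classes: set
    text: str = ""
    children: list = field(default_factory=list)


def find(root, wanted, not_inside):
    """Return nodes with class `wanted`, ignoring any nested in a `not_inside` container."""
    hits = []
    for child in root.children:
        if not_inside in child.classes:
            continue  # the format's own rule: do not look inside this container
        if wanted in child.classes:
            hits.append(child)
        hits.extend(find(child, wanted, not_inside))
    return hits


# Feed-level tags only: skip anything inside an hentry.
feed = Node({"hfeed"}, children=[
    Node({"tag"}, "feed-level tag"),
    Node({"hentry"}, children=[Node({"tag"}, "entry-level tag")]),
])
feed_tags = [n.text for n in find(feed, "tag", not_inside="hentry")]

# Title FN only: skip any FN that belongs to a nested hCard.
audio = Node({"haudio"}, children=[
    Node({"fn"}, "Song Title"),
    Node({"contributor", "vcard"}, children=[Node({"fn"}, "Artist Name")]),
])
titles = [n.text for n in find(audio, "fn", not_inside="vcard")]
```

The rule lives in the call for each format, not in a shared table of
nesting cases. The cost is the one noted above: a publisher who wants
the hCard's FN to double as the title has no way to say so.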

> #2 adds some incremental authoring complexity in some cases.

--- i am against MSO: it is unneeded, adds complexity and doesn't
actually solve much. It attempts to add scoping, which has never been
a problem in the past. Avoiding it also helps keep microformats
focused, rather than attempting to boil the ocean.

> #1 is something that we can probably still do today since both the number of
> microformats is small (a good reason to keep the overall number small), and
> the number of parser implementations is small and parser implementers are
> both involved in the community and able to update their code quite quickly (
> cc'ing microformats-dev accordingly).
>
>
> Therefore it is reasonable IMHO to:
>
> Pursue #1 in the short term until we have solved #2 in the medium term.

--- i think this can be fixed without either of these options. If we
spend the time actually examining real data in the wild, i think we
will find that many of these theoretical issues will either disappear,
or we will have some exact examples that we can further explore and
encode the rules in the format itself rather than trying to work with
any of the above options... i am against #2 outright.

#1 doesn't sit well with me because it causes exponential code
growth and the potential to introduce more and more bugs.

Each format simply represents data, and that data can be separated
from the rest. If there are hCards on the page, that is simply people
data - no matter what it is nested in - and i should be able to extract
them independently of their scope. Introducing constraints i think
makes things more complex, so i think this should be avoided.
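
As a sketch of that idea (Python standard library only, with made-up,
well-formed sample markup and a hypothetical extract_hcards() helper),
every hCard on the page can be pulled out without caring what it is
nested in. Only fn and url are read, and hCards nested inside other
hCards are not handled; this is a simplification, not a full parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: an hCalendar event whose location is an hCard,
# plus a second hCard that is not nested in anything.
PAGE = """
<div>
  <div class="vevent">
    <span class="summary">Meetup</span>
    <div class="location vcard">
      <span class="fn org">Example Cafe</span>
      <a class="url" href="http://example.com/">site</a>
    </div>
  </div>
  <div class="vcard"><span class="fn">Jane Doe</span></div>
</div>
"""


def has_class(element, name):
    """True if `name` is one of the element's space-separated class names."""
    return name in element.get("class", "").split()


def extract_hcards(root):
    """Return one dict per hCard found anywhere in the tree, whatever it is nested in."""
    cards = []
    for element in root.iter():  # document order, any depth
        if not has_class(element, "vcard"):
            continue
        card = {}
        for child in element.iter():
            if has_class(child, "fn"):
                card["fn"] = (child.text or "").strip()
            if has_class(child, "url"):
                card["url"] = child.get("href")
        cards.append(card)
    return cards


cards = extract_hcards(ET.fromstring(PAGE))
```

Both hCards come back, the nested one included, with no nesting rules
and no extra class names involved.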

-brian

-- 
brian suda
http://suda.co.uk