[uf-new] Microformats parsing, in general (was: hAudio final draft)
Tantek Çelik
tantek at cs.stanford.edu
Mon Jun 18 15:18:53 PDT 2007
(moving largely parsing discussion to microformats-dev, microformats-new
bcc'd)
On 6/18/07 2:27 PM, "Brian Suda" <brian.suda at gmail.com> wrote:
> On 6/18/07, Tantek Çelik <tantek at cs.stanford.edu> wrote:
>> This is likely to be precisely why we may need to solve this problem by
>> continuing the mfo discussion.
>
> --- Part of the reason the MSO discussion died is because it didn't
MFO
> actually solve anything.
No, it helps abstract when to stop looking into a node for property values.
Full stop. Nothing more, nothing less.
>> If you look at the current known alternatives:
>>
>> 1. require parsers to update whenever new nestable microformats are
>> introduced, and precisely define rules for handling known/common nesting
>> cases (to at a minimum avoid wasting time on straw-man arguments).
>
> --- i do NOT like this alternative because it makes the assumption
> that you WANT the data to be two different things. For instance, if i
> have a URL as a child of hCard. Then the common parsing rules might
> say, when that hCard is a location of an hCalendar ignore the URL, but
> what happens when i WANT that URL to be part of the hCalendar - this
> leads to incorrect assumptions.
That case "when you want the URL (of the hCard) to be part of the hCalendar"
- I assert is *way* less than 20%. If you think this is a real issue, let's
start with at least one concrete example you have seen where this is true.
> I would rather let the PUBLISHER be as
> explicit as they want or not, rather than parsers attempt to
> interpret their intents.
I agree with that methodological statement, yet perhaps we are coming to two
different conclusions.
>> 2. add a new class name to indicate an encapsulation scope (e.g. "mfo") when
>> embedding
>> - = one new class name, only in cases where nesting occurs.
>
> --- The problem with MSO is something like the following:
MFO
> - hCalendar
> -- location (MSO)
> --- hcard
> ---- URL
<snip>
This is a false strawman example. MFO is only for root microformat class
names, not for arbitrary properties.
e.g. class="vcard mfo", NOT class="location mfo".
second example snipped for same false assumption.
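A minimal sketch of the stop rule described above, assuming "mfo" sits on the root class name of the nested microformat (class="location vcard mfo"). The sample markup, the URLs, and the helper names (class_names, find_property) are made up for illustration and come from no existing parser:

```python
import xml.etree.ElementTree as ET

# Illustrative markup only; the names and URLs are invented.
SAMPLE = """
<div class="vevent">
  <span class="summary">Meetup</span>
  <a class="url" href="http://example.org/event">event page</a>
  <div class="location vcard mfo">
    <span class="fn">Example Cafe</span>
    <a class="url" href="http://example.org/cafe">cafe site</a>
  </div>
</div>
"""

def class_names(el):
    """Return the set of class names on an element."""
    return set((el.get("class") or "").split())

def find_property(root, prop, stop_class="mfo"):
    """Collect descendants of `root` carrying class `prop`.

    A node marked with `stop_class` may itself be a property value,
    but the search does not look inside it.
    """
    found = []

    def walk(node):
        for child in node:
            names = class_names(child)
            if prop in names:
                found.append(child)
            if stop_class in names:
                continue  # nested microformat: stop looking into this node
            walk(child)

    walk(root)
    return found

event = ET.fromstring(SAMPLE.strip())
# With the stop rule, only the event's own URL is found.
event_urls = [a.get("href") for a in find_property(event, "url")]
# Without it (stop_class=None), the nested hCard's URL leaks in as well.
all_urls = [a.get("href") for a in find_property(event, "url", stop_class=None)]
```

The nested hCard can still be parsed on its own by starting a separate search at the vcard node, so nothing is lost by stopping there.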
> From what i remember MSO didn't actually solve anything, it just
> created more problems. This is why IMHO it was never pursued any
> further than just a thought.
No, it wasn't pursued due to lack of time, and lower priority than other
pursuits.
>> 3. replicate/prefix property class names for each microformat e.g. audio-fn
>> - = numerous new class names
>>
>> It is pretty clear that #3 is the worst from a complexity (most new class
>> names) that would affect the most people (publishers) point of view. So we
>> should seek to avoid #3 since that violates the principles the most.
>
> --- each microformat can also define its parsing rules. For instance,
> hAtom only looks for rel-tag NOT inside an hentry. there is no reason
> that a media format can't define that an FN can ONLY be taken when it
> is NOT a child of an hCard, but then this limits the way people can
> publish.
These specific parsing rules are already part of the #1 option I mentioned.
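For comparison, a rough sketch of what option #1 could look like in a parser: a hard-coded set of known nestable root class names that the property search refuses to descend into. The set contents, the sample markup (including the "contributor" class), and the function name are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET

# Hypothetical list; under option #1 it has to be updated by hand
# whenever a new nestable microformat is introduced.
KNOWN_ROOTS = frozenset({"vcard", "vevent", "hentry"})

# Illustrative markup only.
SAMPLE = """
<div class="haudio">
  <span class="fn">Song Title</span>
  <span class="contributor vcard"><span class="fn">Artist Name</span></span>
</div>
"""

def find_with_known_roots(root, prop, known_roots=KNOWN_ROOTS):
    """Return the text of descendants carrying class `prop`,
    skipping the insides of any known nested microformat root."""
    found = []

    def walk(node):
        for child in node:
            names = set((child.get("class") or "").split())
            if prop in names:
                found.append("".join(child.itertext()).strip())
            if names & known_roots:
                continue  # known nested format: its properties are its own
            walk(child)

    walk(root)
    return found

audio = ET.fromstring(SAMPLE.strip())
# A parser that knows about "vcard" takes only the audio item's own fn.
titles = find_with_known_roots(audio, "fn")
# A parser with an out-of-date (here: empty) list also picks up the hCard's fn.
naive = find_with_known_roots(audio, "fn", known_roots=frozenset())
```

The second call shows the cost of this option: a parser whose list has not been updated silently mixes properties from the nested format into the outer one.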
>> #2 adds some incremental authoring complexity in some cases.
>
> --- i am against MSO, it is un-needed, adds complexity and doesn't
> actually solve much.
Based on the misspelling and false strawman examples, I think you may be
against something that is not being proposed.
>> #1 is something that we can probably still do today since both the number of
>> microformats is small (a good reason to keep the overall number small), and
>> the number of parser implementations is small and parser implementers are
>> both involved in the community and able to update their code quite quickly (
>> cc'ing microformats-dev accordingly).
>>
>>
>> Therefore it is reasonable IMHO to:
>>
>> Pursue #1 in the short term until we have solved #2 in the medium term.
>
> --- i think this can be fixed without either of these options. If we
> spend the time actually examining real data in the wild, i think we
> will find that many of these theoretical issues will either disappear,
This is a good approach of course.
> or we will have some exact examples that we can further explore and
> encode the rules in the format itself rather than trying to work with
> any of the above options...
Hence why I prefer pursuing #1 first as well.
> #1 doesn't sit well with me because it causes an exponential code
> growth and potential to introduce more and more bugs.
Not necessarily. I don't believe the assertion of required exponential code
growth. I'm optimistic that patterns that emerge will solve a lot of this.
> Each format simply represents data, which can be divisible from each
> other. If there are hCards on the page, that is simply people data -
> no matter what it is nested in - i should be able to extract them
> independently of their scope.
Agreed.
> Introducing constraints i think makes
> things more complex, so i think this should be avoided.
In general yes we are trying to minimize complexity.
Sometimes it is difficult to avoid adding complexity *somewhere* and thus
the key point in this discussion is where to put necessary added complexity.
Thanks,
Tantek