[uf-discuss] Re: DOM scripting as an alternative to include-pattern? (possible FAQ)

Tantek Çelik tantek at cs.stanford.edu
Mon Jun 5 11:13:55 PDT 2006

On 6/5/06 10:27 AM, "Michael Leikam" <leikam at yahoo.com> wrote:

> Tantek,
> Thanks.  You're obviously more familiar with designing data
> formats than I am, but could you (or anybody else really)
> explain a little

I'm guessing we're going to want to add the rest of this to the FAQ.

> more about the differences you see between
> supporting DOM manipulation during the parsing, as I've
> suggested, and supporting include-patterns?

They are completely different.  How do you see them as being the same?

include-patterns are simply transclusion - not a manipulation at all.

> To me include-patterns seem like a subset of DOM

Would you also say that the <img> element is a subset of the DOM?

How about just the <object> element by itself, which allows
implementations to "fall back" to its contents if the referenced data is not
available?

The biggest difference here is the difference between declarative and
procedural processing.

Just because procedural processing can mimic declarative processing doesn't
mean that declarative processing is procedural.

> and both
> seem less to do with the data format itself than the
> inherently procedural transformation from one format to
> another.

includes are not procedural.  includes are merely aggregation.

> What is the difference between defining a data
> format and defining what people do with that data format
> (i.e., what that data format is used for)?

Defining a data format defines a syntax, grammar, and semantics, as well as
often an abstract model of the data.

That's very different than trying to define all possible applications for
that format.

Nor does defining how to parse a format from the syntax using the grammar
into the abstract model imply that you are defining applications for that
format.

> I do see the benefit of having an <object> within a
> microformatted block of content.  There isn't a process
> that needs to run in order for the syntax to reflect the
> included data.

That's a good summary.  You don't have to perform some Turing-complete
embedded computation in order to extract the semantics.

> It's also more scanable by human eyes than
> a block of javascript or xsl.

This is always a HUGE plus for microformats.

> But in order for the parser
> to generate the target format, you've defined this
> procedure:
> ---------
> if class is "include", grab the referenced node including
> descendants and replace the current node with the
> referenced one.
> ---------

Yes, that is a simple way of defining how to process object-includes.  The
key here is that this is within the context of *parsing* the microformat.
Parsing is already a well defined process and this simply adds another
detail to it.

This is very different than adding say, a virtual machine that processes
arbitrary loops and conditionals.
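A minimal Python sketch of the procedure quoted above, to show how bounded it is: one pass over the parsed tree, replacing each include element with a copy of the element it references. It assumes well-formed XHTML (so the standard-library xml.etree.ElementTree parser can read it) and same-document references written as data="#id" on <object> or href="#id" on <a>; the function names are hypothetical, not part of any microformats spec or parser.

```python
import copy
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if the element's space-separated class attribute contains name."""
    return name in (el.get("class") or "").split()


def resolve_includes(root):
    """Replace each class="include" element with a copy of its target.

    Hypothetical sketch: assumes same-document references ("#id") and
    makes a single pass. Includes that appear inside copied content are
    not expanded again, so there are no loops to run away.
    """
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    parent_of = {child: parent for parent in root.iter() for child in parent}
    includes = [el for el in root.iter() if has_class(el, "include")]
    for inc in includes:
        ref = inc.get("data") or inc.get("href") or ""
        target = by_id.get(ref[1:]) if ref.startswith("#") else None
        parent = parent_of.get(inc)
        if target is None or parent is None:
            continue  # unresolvable reference: leave the element alone
        replacement = copy.deepcopy(target)
        replacement.tail = inc.tail  # keep any text that followed the include
        index = list(parent).index(inc)
        parent.remove(inc)
        parent.insert(index, replacement)
    return root


doc = ET.fromstring(
    '<div>'
    '<div class="vcard"><span class="fn org" id="company">Webby Widgets</span></div>'
    '<div class="vcard"><span class="fn">Michael Leikam</span>'
    '<object class="include" data="#company"></object></div>'
    '</div>'
)
resolve_includes(doc)
# The personal card's <object> has been replaced by a copy of the company span.
print(ET.tostring(doc[1], encoding="unicode"))
```

Note there is no event, callback, or author-supplied function anywhere: the markup only names *what* to copy, and the parser alone decides *how*.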

> I guess at root I'm unclear about whether maxims regarding
> data formats also apply to data parsing?

What distinction do you see between data formats and data parsing?

A well defined data format includes enough syntax/grammar details that the
data parsing is 100% deterministic from the data format.

> The parallel I
> see is microformat:parsing::XML:DOM. You want to avoid
> procedural rules in the X(HT)ML, but the DOM exists to
> formalize them. Is that faulty?

Yes that is a faulty analogy.

Part of the problem is the ambiguity of what people mean when they use the
phrase "the DOM".

1. If by "the DOM" you merely mean the abstract node and attribute
structure, then yes, parsing microformats also gives you a "DOM" which you
can then further process however you see fit.

In fact, this is what DOM *literally* means: document object *model*.

2. If however by "the DOM" you mean a data abstraction + a set of predefined
methods/functions/procedures which you can apply to that abstraction, then
the answer is no.

Unfortunately most people seem to conflate these two definitions and/or
assume that #2 is the only definition because the W3C "DOM" specs all define
both a model and a set of APIs.

If you want to try distinguishing these to avoid confusion, perhaps call the
abstraction "DOM Model" (which I realize is redundant, but unfortunately the
emphasis is necessary due to the confusion that it seems most web
programmers have), and the methods/functions/procedures "DOM API".  Others
have tried to introduce the term "Infoset"[1] to mean the "model" since the
"model" has been subsumed to include APIs.  However, I have found that in
practice, the term "Infoset" has very low comprehensibility.
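To make the "DOM Model" / "DOM API" split concrete, here is a hypothetical Python sketch in which the model is nothing but immutable data (tag, attributes, text, children), and any processing lives in separate functions written by whoever consumes it. It assumes well-formed XHTML input; Node, to_model, and count_tags are illustrative names only.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Node:
    """A 'DOM Model' node: plain data, no behavior attached."""
    tag: str
    attrs: tuple      # sorted (name, value) pairs
    text: str         # leading text content, whitespace-trimmed
    children: tuple   # child Node objects, in document order


def to_model(el):
    """Convert a parsed ElementTree element into immutable Node data."""
    return Node(
        tag=el.tag,
        attrs=tuple(sorted(el.attrib.items())),
        text=(el.text or "").strip(),
        children=tuple(to_model(child) for child in el),
    )


# Processing is an ordinary consumer-side function, not a method
# defined as part of the model.
def count_tags(node):
    return 1 + sum(count_tags(child) for child in node.children)


model = to_model(
    ET.fromstring('<div class="vcard"><span class="fn">Michael Leikam</span></div>')
)
print(count_tags(model))  # 2
```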

> The sort of markup I had in mind was something like this:
> ---------
> <div id="company">
> <div class="hcard">
>   <h1 class="fn org">Michael's Webby Widgets</h1>
>   <div class="adr">
>   <span class="locality">Los Angeles</span>
>   </div>
> </div>
> </div>
> <div class="hcard" onUFparseEvent="add_org_and_city()">
> <div class="fn">Michael Leikam</div>
> <a class="email" href="mailto:me at foo.bar">me at foo.bar</a>
> </div>

The problem is, with this bit of code:

  onUFparseEvent="add_org_and_city()"

you just added:

* an event model
* a functional programming model
* a parameter model (clearly those parentheses are there for a reason)

This very much falls in the category of using a steamroller to swat a fly.
Not only is it overkill, it is also extremely cumbersome in practice.

> I don't really want to include the entire div#company since
> it includes fields that already exist in my personal block,
> e.g., "fn".

Actually, you can do that, since the parsing rules for singleton properties
are to merely ignore later instances.  Thus just make sure you include the
property declarations in your personal block first and those will be found
first.
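A hypothetical Python sketch of the "first instance wins" rule for singleton properties: walking a card's descendants in document order, a singular property keeps the first value seen and ignores later ones, while other properties collect every value. Only fn is marked singular here; the property names passed in and the list-valued output are assumptions made for illustration, not a complete hCard parser.

```python
import xml.etree.ElementTree as ET

# fn is singular in hCard; this sketch deliberately omits the other
# singular properties rather than guess at a complete list.
SINGULAR = {"fn"}


def text_of(el):
    """All text inside an element, with surrounding whitespace trimmed."""
    return "".join(el.itertext()).strip()


def extract_properties(card, names):
    """Collect values for the given property class names from a card.

    Descendants are visited in document order (ElementTree's iter() is a
    pre-order walk). Singular properties keep the first value found and
    ignore later instances; other properties accumulate a list of values.
    """
    props = {}
    for el in card.iter():
        if el is card:
            continue  # the card root itself is not a property
        for name in (el.get("class") or "").split():
            if name not in names:
                continue
            value = text_of(el)
            if name in SINGULAR:
                props.setdefault(name, value)  # later instances are ignored
            else:
                props.setdefault(name, []).append(value)
    return props


card = ET.fromstring(
    '<div class="vcard">'
    '<span class="fn">Michael Leikam</span>'
    '<span class="fn org">Webby Widgets</span>'
    '</div>'
)
print(extract_properties(card, {"fn", "org"}))
# {'fn': 'Michael Leikam', 'org': ['Webby Widgets']}
```

Because the personal fn comes first, the company's "fn org" element still contributes its org value while its fn is simply skipped.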

> Adding an ID to span.locality, which I think
> is how include-pattern wants to handle this, isn't
> appealing because I'd want to use a generic hcard generator
> for any contact information.

Why?  Seems like you are trying to do some pretty fancy interconnected hCard
stuff - to then add the unnecessary constraint of using "a generic hcard
generator" makes no sense.  It is trivial to add ID attributes as needed,
even after the fact.

A better approach to take would be to point (URL) to a real world example
you are trying to markup which actually *has* these issues, so we can all
take a look and figure out what to do with it.

Abstract examples don't merit much discussion around here.

> I also see the benefit of advocating limited solutions for
> real problems.  That's a very good goal.

It's not just a goal for microformats, it is a core principle.

We've rejected far more features that were far less abstract.

We don't just limit ourselves to solutions for real problems.  We limit
ourselves to the 80% case and punt on the 20% case (which makes people who
want those 20% cases upset, but the alternative is to double or triple the
amount of time it takes to get things done).

> I really wasn't
> expecting the community here to say "oh, ok, we'll add DOM
> support tomorrow."

DOM Model definition is a reasonable request, and is essentially what the
"parsing" documentation defines.

DOM API - don't expect to ever see anything in that regard.

There should be no additional DOM API for microformats above and beyond the
context where they are already used, and that is outside the definition of
microformats and their processing model (e.g. microformats in HTML can be
manipulated with HTML DOM APIs by HTML user agents that choose to support
them).

> There's clearly more in terms of
> thought and use-cases that would need to go into deciding
> whether it's actually a good solution to real problems we
> face in marking up our content.  But from the replies I've
> gotten, it sounds like this is the beginning of a
> discussion and not something that is already ongoing.

It is neither ongoing, nor IMHO the beginning of a discussion.  We're not
adding anything procedural to microformats.

They (procedural additions) are unnecessary, and introduce SO MANY problems
in the processing model (e.g. viruses, security problems etc.) as to be not
worth it.

For more on that, see Tim Berners-Lee's "What not to email" here:


and note all the examples he lists.  As I said, this is a well-known problem
in data format design.  Note that when Tim says you can send him HTML, he
knows he can turn off javascript (or run a reader that doesn't bother to
implement it at all), and he will still be able to view the content just
fine (assuming it is properly authored with XHTML+CSS).



[1] http://www.w3.org/TR/xml-infoset/
