Disambiguation [was RE: "aid" microformats? (was Re: [uf-discuss]ISBN mark-up)]

Mon May 1 00:29:30 PDT 2006

> From: Scott Reynen Saturday, April 29, 2006 4:22 PM
> To: Microformats Discuss
> 
> On Apr 29, 2006, at 5:32 PM, Benjamin Carlyle wrote:
> > Microformat terms act like profiles in identifying how to process the
> > content, so what else would using a profile add:
> > 1) The ability to skip parsing of a html document (or parts thereof)
> > because we don't see the profile elements we recognise.
[items 2) and 3) relocated below]
> >
> > I think that (1) is based on a false premise. You have to at least
> > start
> > parsing the html document in order to know which profiles are used.
> > Chances are that profiles will be frequently missing or incorrect
> > given
> > the current tooling situation. I think parsers will look for
> > microformats they know about no matter what the profiles attribute
> > says.
> 
> Agreed.

The presumption here is that processing is cheap and undirected.  If, on the
other hand, you had a vast context to process and had pointers into that
context for where to start, you could significantly speed up that
processing with a single microformat specifier, à la DOCTYPE.

In cases where user interaction depends on the application's understanding
of the semantics underlying the HTML, this could be a significant usability
factor.  See Kaboodle[A] or Backpack[B] or Scrapbook[C] for examples where
realtime, directed parsing is useful.

[A] http://www.kaboodle.com
[B] http://www.backpackit.com 
[C] http://amb.vis.ne.jp/mozilla/scrapbook/ 

Basically, all of these could be seen as variants on Live Clipboard.

It is about scale.

If there are only a handful of Microformats and they are all well-known
(and we have effectively hijacked the "class" default namespace), then the
processing should be manageable.

But if there are thousands or tens of thousands of Microformats--and yes, I
know this presumption is at odds with some of the expectations behind a
socially moderated namespace--then it is easy to calculate the difference
between running a single attribute check for "microformat" and checking
against the entire Microformats space.
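
To make that calculation concrete, here is a rough Python sketch (standard
library only) that counts registry lookups under both strategies.  The
KNOWN_ROOTS registry, the hCalendar profile URI and the function names are
assumptions for illustration only; no existing parser is claimed to work
this way.

```python
from html.parser import HTMLParser

# Hypothetical registry: root class name -> profile URI.  Only the hCard
# profile URI appears in this thread; the hCalendar one is invented.
KNOWN_ROOTS = {
    "hCard": "http://microformats.org/wiki/hcard-profile",
    "hCalendar": "http://example.org/hcalendar-profile",
}


class LookupCounter(HTMLParser):
    """Counts the registry lookups each strategy needs for one document."""

    def __init__(self):
        super().__init__()
        self.class_lookups = 0   # undirected: one lookup per class token
        self.attr_lookups = 0    # directed: one per "microformat" attribute
        self.found_by_class = []
        self.found_by_attr = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Undirected: every class token on every element is tested
        # against the whole known Microformats space.
        for token in (attrs.get("class") or "").split():
            self.class_lookups += 1
            if token in KNOWN_ROOTS:
                self.found_by_class.append(token)
        # Directed: only elements that declare a profile are considered.
        profile = attrs.get("microformat")
        if profile is not None:
            self.attr_lookups += 1
            self.found_by_attr.append(profile)


def count_lookups(html):
    """Parse html and return the populated LookupCounter."""
    counter = LookupCounter()
    counter.feed(html)
    counter.close()
    return counter
```

With a hashed registry each individual lookup is cheap, so the gap is in
how many lookups there are: the undirected count grows with every class
token in the page, while the directed count grows only with the number of
elements that declare a profile.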

This was what I meant when I asked "How do Microformats scale?"

Ben's initial response references two pieces suggesting that scaling to many
different Microformats is a bad thing:
[1] http://soundadvice.id.au/blog/2006/04/11#namespaces 
[2] http://www.mnot.net/blog/2006/04/07/extensibility 

Both contain good arguments.  However, the closing statement in [2] bears
repeating, because it seems to me that each microformat is implicitly a
namespace of sorts; e.g., hCard and hCalendar define namespaces of
structured semantics.

==
In this view, making the points of extensibility into scarce,
community-managed resources - e.g., as media types do - is a good thing. It
has positive political and social effects; it forces (or at least inclines)
people to co-operate, whether they're a multi-billion dollar behemoth, or a
sole engineer who wants his fifteen minutes of fame. 

Namespaces aren't completely evil, of course. If you want to explicitly
allow anybody to walk up and add data to your format, they're a fine way to
make sure there's no ambiguity, and give nice leverage for versioning, and
perhaps for separating different concerns. I think this will tend to make
sense for formats where truly disconnected, uncoordinated data is collected,
like RDF. 

However, they don't automatically make sense for situations where you need
tight co-ordination between different entities (e.g., things we tend to call
"protocols"); allowing anybody to rock up and extend a protocol with no
overhead is inviting interoperability problems and abuse.
==

I don't believe we are in the latter situation where we need tight
coordination as in a protocol.  Instead, what we need is a simple way for
human authors to say "This is what I mean".  Hence, there are rich benefits
to be created by disambiguation... because we /are/ talking about "truly
disconnected, uncoordinated data".  

The soccer league use case mentioned in the podcast at Microformats.org [D]
is a great example of how disconnected and uncoordinated our data and
authorship are.

[D] http://blogs.msdn.com/alexbarn/archive/2006/03/31/566361.aspx 

> > 2) To provide additional disambiguation: To tell a parser which vcard
> > specification or version to use.
> > 3) To identify the fact that some microformats are in use, ie use
> > "http://microformats.org/" instead of a profile for a specific
> > microformat.
> 
> > (2) and (3) also seem like bad ideas. They would be technical
> > measures
> > to allow the established microformat community base to splinter. While
> > we all live within one namespace we are forced to interact with each
> > other to resolve conflict. Outside of that space confrontation is
> > avoided and we end up with "mymicroformats:vcard" and
> > "yourmicroformats:vcard" class names. Publishers would be forced to
> > choose between the two.
> 
> I don't really understand 3.  I don't think 2 is a bad idea; I just
> don't think it's necessary.  It's not really "mymicroformats:vcard"
> and "yourmicroformats:vcard" we might see on the web.  It's "gmpg.org/
> vcard" (or even "w3.org/vcard") and "mydomain.com/vcard".  One is
> clearly more authoritative than the other (which is so far entirely
> hypothetical), so I don't think this is a worthwhile concern.  I
> don't think ambiguity is a worthwhile concern either, but I do think
> it will be less trouble to create profiles to satisfy those who have
> this concern than to convince them that it's not worth worrying about.

There is value in forging a tight class of well understood, easily human
authored, semantic tags. However, allowing rich variation on the existing
classes doesn't "split" the community--the community is the social network,
not the semantic space. Instead, it allows exploration and differentiation,
which ultimately can be incorporated back into the foundation classes. More
importantly, it allows user-driven innovation.

I think it is hubris to expect that the first adopted version of a
microformat is the orthodox way to do it and that variations are heresy.
Isn't it to the benefit of the community that we have a formal mechanism to
experiment and expand, allowing our semantic understanding to organically
grow?  If our mantra includes basing our developments on real-world
examples, then how does the spec evolve if we don't have real-world examples
of derivative implementations?  Without variations, we risk stagnation.  The
fact that Netscape and Microsoft had a platform to experiment with HTML tags
gave them incredible authority--and when their extensions caused problems,
the community has largely been able to push back. (No need to get lost in
the standards-compliance rabbit-hole, but progress is a two-way street
between experimentation and community standards.)

I think the type of disambiguation I am talking about can be addressed with
a simple microformat="profile" attribute.  The profile URI can itself
provide any further disambiguation, such as the version of hCard or version
of the microformat standard.

<div class="hCard"
     microformat="http://microformats.org/wiki/hcard-profile">

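As a rough sketch of how a consumer might use such an attribute, the Python
below (standard library only) dispatches on the profile URI and sets aside
profiles it does not recognise.  The HANDLERS table, handle_hcard and
dispatch are hypothetical names, and the handler is only a placeholder;
this is one possible reading of the proposal, not a specification.

```python
from html.parser import HTMLParser

# Profile URI taken from the example above.
HCARD_PROFILE = "http://microformats.org/wiki/hcard-profile"


def handle_hcard(attrs):
    # Placeholder handler (assumption): a real one would go on to read
    # the element's children for fn, adr, tel and so on.
    return ("hcard", attrs.get("class"))


# Hypothetical dispatch table: profile URI -> handler function.
HANDLERS = {HCARD_PROFILE: handle_hcard}


class ProfileDispatcher(HTMLParser):
    """Routes each element with a "microformat" attribute to a handler."""

    def __init__(self):
        super().__init__()
        self.results = []   # output of handlers for recognised profiles
        self.unknown = []   # profile URIs with no registered handler

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        profile = attrs.get("microformat")
        if profile is None:
            return  # no declaration, nothing to disambiguate
        handler = HANDLERS.get(profile)
        if handler is None:
            self.unknown.append(profile)
        else:
            self.results.append(handler(attrs))


def dispatch(html):
    """Return (results, unknown_profiles) for an HTML string."""
    parser = ProfileDispatcher()
    parser.feed(html)
    parser.close()
    return parser.results, parser.unknown
```

A variant published under another URI (say "http://mydomain.com/vcard")
would land in the unknown list rather than being mistaken for the
microformats.org hCard, which is the kind of disambiguation I have in mind.
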
Thoughts?

-j

--
Joe Andrieu
joe at andrieu.net
+1 (805) 705-8651