Disambiguation [was RE: "aid" microformats? (was Re: [uf-discuss]ISBN mark-up)]

Mon May 1 05:47:49 PDT 2006

On May 1, 2006, at 2:29 AM, Joe Andrieu wrote:

>>> You have to at least start parsing the html document in order to  
>>> know which profiles are used.
>>
>> Agreed.
>
> The presumption here is that processing is cheap and undirected.

There's no way to download only the DOCTYPE or the <head> of a  
document, and processing is cheaper than bandwidth.  Once you've  
already downloaded a whole document, you might as well parse it all  
because the <head> might be wrong about what's in the <body>.

> See Kaboodle[1] or Backpack[2] or Scrapbook[3] for examples where
> realtime, directed parsing is useful.
>
> [1] http://www.kaboodle.com
> [2] http://www.backpackit.com
> [3] http://amb.vis.ne.jp/mozilla/scrapbook/
>
> Basically, all of these could be seen as variants on Live Clipboard.

Right, but these all work client-side, where the document is already  
completely loaded.  Many microformat parsers work server-side.

> If there are only a handful of Microformats and they are all well-known
> (and we have effectively hijacked the "class" default namespace), then the
> processing should be manageable.

It is manageable.  It's just not worth doing because:

1) the whole document is already downloaded, which is the largest burden
2) <head>s lie.
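
A minimal sketch of point 2, using only Python's standard html.parser.  It
collects the profile URIs declared on <head> and the class names actually
used in the markup, then reports microformat roots that appear without a
matching declaration.  KNOWN_ROOTS, ProfileAudit, audit, and the example.org
profile URIs are all hypothetical names made up for illustration, not an
agreed mapping.

```python
from html.parser import HTMLParser

# Hypothetical lexicon: root class name -> profile URI assumed to declare it.
# The example.org URIs are placeholders, not real profile locations.
KNOWN_ROOTS = {
    "vcard": "http://example.org/profiles/hcard",
    "vevent": "http://example.org/profiles/hcalendar",
}


class ProfileAudit(HTMLParser):
    """Collect declared profile URIs and every class name seen in the markup."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # The profile attribute holds a space-separated list of URIs.
            self.declared.update((attrs.get("profile") or "").split())
        self.classes.update((attrs.get("class") or "").split())


def audit(html):
    """Return (roots found in the markup, roots found but not declared)."""
    parser = ProfileAudit()
    parser.feed(html)
    found = {c for c in parser.classes if c in KNOWN_ROOTS}
    undeclared = {c for c in found if KNOWN_ROOTS[c] not in parser.declared}
    return found, undeclared


# The head claims hCard, but the body actually contains an hCalendar event.
page = """<html><head profile="http://example.org/profiles/hcard"></head>
<body><div class="vevent"><span class="summary">Meeting</span></div></body></html>"""
found, undeclared = audit(page)
```

Note that the whole document still has to be fed to the parser before the
mismatch shows up, which is why checking the <head> alone saves little.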

> But if there are thousands or tens of thousands of Microformats--and yes, I
> know this presumption is at odds with some of the expectations behind a
> socially moderated namespace--in that scenario, it is easy to calculate the
> difference of running a single attribute check for "microformat" instead of
> checking against the entire Microformats space.
>
> This was what I meant when I asked "How do Microformats scale?"

Microformats scale by re-use.  Thousands or tens of thousands of  
microformats is an anti-goal.

> I don't believe we are in the latter situation where we need tight
> coordination as in a protocol.

We need tight coordination as in a dictionary.  A formal definition  
of a shared lexicon is what allows us to communicate with new  
symbols.  You can use whatever class names you want, but if I don't
know what they mean, I can't parse them.  A profile doesn't tell me
what they mean; it just tells me whether they follow a certain
syntactic structure.  Profiles are like a grammar check, but we still
need a dictionary.
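
As a rough sketch of the dictionary half of that analogy: a parser can only
interpret the class names it has a shared definition for.  LEXICON,
interpret, and the sample class names below are hypothetical, chosen only to
illustrate the lookup.

```python
# Hypothetical shared lexicon: class name -> agreed human-language meaning.
LEXICON = {
    "fn": "formatted name of a person or organization",
    "summary": "short title of an event",
}


def interpret(class_names):
    """Split class names into those with an agreed meaning and those without."""
    known = {c: LEXICON[c] for c in class_names if c in LEXICON}
    unknown = [c for c in class_names if c not in LEXICON]
    return known, unknown


# "my-widget" may be perfectly well-formed markup, but nothing here says
# what it means, so it lands in the unknown list.
known, unknown = interpret(["fn", "my-widget"])
```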

> Instead, what we need is a simple way for
> human authors to say "This is what I mean".

Profiles don't do that.  No technology does that.  That's an  
incredibly complex problem that no one has yet solved.  When someone
does, we'll have usable machine translation and artificial intelligence,
and microformats will be the least of our worries.

> There is value in forging a tight class of well understood, easily human
> authored, semantic tags. However, allowing rich variation on the existing
> classes doesn't "split" the community--the community is the social network,
> not the semantic space.

In practice, social networks require shared understanding of what  
things mean.  The lack of this shared understanding leads to civil  
war in the real world, and unused specs in the tech world.

> Instead, it allows exploration and differentiation,
> which ultimately can be incorporated back into the foundation classes. More
> importantly, it allows user-driven innovation.

You can already explore and develop your own specs.  But if you want  
someone else to understand them, you have to explain to them what the  
spec means in clear human language.  Machines don't understand.   
People understand.  Clear human language is easier to accomplish in  
community than in isolation.

> I think it is hubris to expect that the first adopted version of a
> microformat is the orthodox way to do it and that variations are heresy.

It would be if microformats, or dictionaries, did much more than  
document and formalize existing use.  Do you find hubris in  
dictionaries as well?  Who is this Webster to tell us what "dog"  
means?  He's someone who documented how a lot of people use the word  
"dog" and wrote it down in a dictionary, just like we're documenting  
how a bunch of people mark up citations, and writing it down in a wiki.

> If our mantra includes basing our developments on real-world
> examples, then how does the spec evolve if we don't have real-world examples
> of derivative implementations?

We have a web full of real-world data publishing examples.

> Without variations, we risk stagnation.

No one is preventing anyone from using whatever class names they  
want.  No one is preventing anyone from telling others what their  
class names mean.  But no one has invented a technology to automate  
shared meaning.  We can't use it because it doesn't exist.

> I think the type of disambiguation I am talking about can be addressed with
> a simple microformat="profile" attribute.

Have you looked at profile URIs?

http://microformats.org/wiki/profile-uris

That accomplishes exactly what your microformat="profile" would,
except it's valid XHTML.  But neither accomplishes shared meaning,
which, despite great effort, is still a human problem that requires
human solutions.

Peace,
Scott