validating microformats (was Re: [uf-discuss] Google Gdata new syndication protocol!)

Fri Apr 21 08:03:09 PDT 2006

On Thu, 2006-04-20 at 23:38 +0100, Nick Swan wrote:
> I'm working on a tool for discovering and validating microformats.
...
> I could really do with a flow diagram or something like that of how to
> parse/validate microformats.
...
> On 4/20/06, Breton Slivka <zen at zenpsycho.com> wrote: 
>         Norman Walsh recently posted in his blog about this very
>         issue
>         http://norman.walsh.name/2006/04/13/validatingMicroformats

Microformat validation seems like a hard problem to me, or at least a
low-value one. Here are the problems I see:

1) Microformats permit any underlying html structure to be used, so
there is nothing to validate there that the w3c validator doesn't
already do.
2) Microformats allow arbitrary extension through the use of custom html
classes provided by the document author. Unknown classes are still
valid, so they can't be declared as errors.
3) The only validation that is possible is to ensure that all data that
must be present in a particular microformat is present. That also seems
a little lightweight to me, because most microformats are fairly
minimalist in their approach to what information must be provided. It's
humans first and machines second, so whatever you happen to publish is
probably enough to be marked up as a microformat.
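
To show how little there is to check in (3), here is a rough Python
sketch of a "required data is present" test. The REQUIRED table is an
assumption based on my reading of the hCard and hCalendar drafts, not an
authoritative list, and the class and function names are made up for
illustration. It also assumes reasonably well-balanced markup.

```python
from html.parser import HTMLParser

# Assumption: required properties per root class (my reading of the drafts).
REQUIRED = {"vcard": {"fn"}, "vevent": {"summary", "dtstart"}}
# Elements with no end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class RequiredPropertyChecker(HTMLParser):
    """Hypothetical checker: records root elements missing required classes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.open_roots = []  # each entry: [root class, depth, classes seen inside]
        self.problems = []    # each entry: (root class, sorted missing properties)

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        # Every class seen counts towards all microformat roots still open.
        for root in self.open_roots:
            root[2] |= classes
        if tag in VOID:
            return
        self.depth += 1
        for name in sorted(REQUIRED.keys() & classes):
            self.open_roots.append([name, self.depth, set()])

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Close any roots that were opened by the element now ending.
        while self.open_roots and self.open_roots[-1][1] == self.depth:
            name, _, seen = self.open_roots.pop()
            missing = REQUIRED[name] - seen
            if missing:
                self.problems.append((name, sorted(missing)))
        self.depth -= 1


def missing_properties(html):
    """Return a list of (root class, missing properties) for the document."""
    checker = RequiredPropertyChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Feeding it a vevent with a summary but no dtstart reports dtstart as
missing; a vcard with an fn passes. Unknown classes are ignored, in line
with (2).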

So what does validation mean for a microformat? I think the only
criterion for success that we can meaningfully apply is that the data we
put into the document came back out again through a machine-operated
process. We already have the machine-operated processes for various
microformats (x2v, hAtom2Atom.xsl, etc), but a human must still be in
the loop to determine whether all of their data got through or not.
Unfortunately, that's another "by definition" problem. If the data isn't
machine-readable in the first place, a machine won't know it's missing.
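
As a sketch of where the human fits in, the comparison itself could be
mechanical once the author states what they meant to publish. The
function below is hypothetical, and it assumes the output of x2v or
hAtom2Atom.xsl has already been flattened into a plain dict by some
other step that I'm not showing.

```python
def round_trip_report(intended, extracted):
    """Compare what the author meant to publish with what a converter got out.

    Both arguments are plain dicts of property name -> value. "intended" is
    supplied by the human; "extracted" comes from the conversion tool.
    """
    # Properties the author intended that never came out the other side.
    lost = {k: v for k, v in intended.items() if k not in extracted}
    # Properties that came out, but with a different value.
    changed = {k: (v, extracted[k]) for k, v in intended.items()
               if k in extracted and extracted[k] != v}
    return {"lost": lost, "changed": changed}
```

The machine only does the diff here. It still cannot know about the
email address unless a person tells it one was supposed to be there.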

So, what do we mean by microformat validation? I think x2v+human and
hAtom2Atom.xsl+human is the best we can hope for. We can try to do
heuristic validation ("this class name you used looks like one that
could mean something if it were written in a different way"), but the
heuristics would have to be born out of implementation experience with
"common errors" for particular microformats.

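A minimal sketch of that kind of heuristic, using Python's difflib to
spot class names that are nearly, but not exactly, a known property
name. The KNOWN list is an illustrative subset I picked, the function
name is made up, and the 0.8 cutoff is a guess rather than something
tuned against real-world errors.

```python
import difflib

# Assumption: a small illustrative subset of hCard/hCalendar class names.
KNOWN = ["vcard", "fn", "adr", "tel", "email", "url", "org",
         "street-address", "locality", "region", "postal-code",
         "country-name", "vevent", "summary", "dtstart", "dtend",
         "location"]


def suspicious_classes(class_names, cutoff=0.8):
    """Map each near-miss class name to the known name it resembles."""
    hints = {}
    for name in class_names:
        if name in KNOWN:
            continue  # spelled exactly right, nothing to say
        # Lower-case first so that case slips such as "dtStart" are caught.
        close = difflib.get_close_matches(name.lower(), KNOWN, n=1,
                                          cutoff=cutoff)
        if close:
            hints[name] = close[0]
    return hints
```

Given ["fn", "sumary", "dtStart", "sidebar"], this suggests "summary"
for "sumary" and "dtstart" for "dtStart", and stays quiet about
"sidebar". These can only ever be hints, not errors, since unknown
classes remain valid.
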
Comments?

Benjamin.