validating microformats (was Re: [uf-discuss] Google Gdata new syndication protocol!)

Fri Apr 21 08:34:11 PDT 2006

On Apr 21, 2006, at 10:03 AM, Benjamin Carlyle wrote:

> So what does validation mean for a microformat? I think the only
> criterion for success that we can meaningfully apply is that the data
> we put into the document came back out again through a
> machine-operated process. We already have the machine-operated
> processes for various microformats (x2v, hAtom2Atom.xsl, etc), but a
> human must still be in the loop to determine whether all of their data
> got through or not. Unfortunately, that's another "by definition"
> problem. If the data isn't machine-readable in the first place, a
> machine won't know it's missing.

I imagine a microformat validator would be relatively short on errors
and long on warnings or "tips".  Each class could have a list of
potential sub-classes, and when those don't turn up, I think a
message like this would help: "Tip: vcards can have telephone numbers.
Did you mean to include a telephone number? If so, you need to use the
following syntax:"  In addition to catching actual oversights, such
messages would encourage more complete descriptions, putting more
microformatted data on the web.
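
To make the idea concrete, here is a rough Python sketch of how such
tips might be generated. Everything in it is an assumption for
illustration: the OPTIONAL_PROPERTIES table is a small hand-picked
subset of hCard class names, tips_for is a made-up function name, and
the check looks at the whole document rather than at each vcard
separately.

```python
from html.parser import HTMLParser

# Hypothetical, incomplete table: root class -> optional sub-classes
# and a human-readable label for each.
OPTIONAL_PROPERTIES = {
    "vcard": {
        "tel": "a telephone number",
        "email": "an email address",
        "adr": "an address",
    },
}


class ClassCollector(HTMLParser):
    """Collects every class name used anywhere in the document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def tips_for(html):
    """Return a tip for each optional sub-class that never turns up."""
    collector = ClassCollector()
    collector.feed(html)
    tips = []
    for root, props in OPTIONAL_PROPERTIES.items():
        if root not in collector.classes:
            continue  # no such microformat on the page, nothing to suggest
        for cls, label in props.items():
            if cls not in collector.classes:
                tips.append(
                    f"Tip: a {root} can include {label}. Did you mean to "
                    f'include one? If so, mark it up with class="{cls}".'
                )
    return tips
```

Note that none of these are errors; a vcard without a telephone number
is still fine, so the sketch only ever produces tips.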

On the other end, any node found with no recognizable class name  
could be checked against recognizable content patterns.  If there's  
an unmarked node within "tel" with a bunch of numbers, I'd like a  
validator to suggest that I might want to put class="value" around  
it, because it looks like it might be the value of my telephone number.
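
A minimal sketch of that content-pattern check, again in Python. The
names (ValueSuggester, suggest_values) and the "at least five digits"
rule are assumptions of mine, not anything from an existing validator,
and the void-tag list is deliberately short.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: only digits, spaces and common phone punctuation.
PHONE_CHARS = re.compile(r"^\+?[\d\s().-]+$")
# Tags that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ValueSuggester(HTMLParser):
    """Flags class-less elements inside "tel" that look like numbers."""

    def __init__(self):
        super().__init__()
        self.stack = []        # class lists of the currently open elements
        self.suggestions = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((dict(attrs).get("class") or "").split())

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or len(self.stack) < 2:
            return
        inside_tel = any("tel" in classes for classes in self.stack[:-1])
        unmarked = not self.stack[-1]
        digits = sum(ch.isdigit() for ch in text)
        if inside_tel and unmarked and digits >= 5 and PHONE_CHARS.match(text):
            self.suggestions.append(
                f'Tip: "{text}" inside "tel" looks like a phone number. '
                'Did you mean to put class="value" around it?'
            )


def suggest_values(html):
    parser = ValueSuggester()
    parser.feed(html)
    return parser.suggestions
```

Text sitting directly inside the "tel" element is left alone, since
the sketch only looks at nested elements that carry no class at all.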

> We can try and do heuristic validation ("this class name you used
> looks like one that could mean something if it were written in a
> different way"), but the heuristics would have to be borne out of
> implementation experience with "common errors" for particular
> microformats.

I can't think of a better way to discover those common errors than a
validator.  I think most of the formatting errors we see on this list
could be recognized by a machine, which would save everyone time and
make authors feel more sure about whether or not they are doing
hWhatever correctly.
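
As one example of a machine-recognizable error, a class name that is
almost but not quite a known one could be caught with fuzzy string
matching. This sketch uses Python's standard difflib module; the
KNOWN_CLASSES list is an illustrative subset I picked, and near_misses
is a hypothetical name.

```python
import difflib

# Illustrative subset of class names a validator might know about.
KNOWN_CLASSES = ["vcard", "fn", "tel", "email", "adr", "org", "url",
                 "value", "type"]


def near_misses(class_names):
    """Map each unknown class name to the closest known one, if any."""
    guesses = {}
    for name in class_names:
        if name in KNOWN_CLASSES:
            continue  # spelled correctly, nothing to report
        close = difflib.get_close_matches(name.lower(), KNOWN_CLASSES, n=1)
        if close:
            guesses[name] = close[0]
    return guesses


for wrong, right in near_misses(["vcards", "tel", "sidebar"]).items():
    print(f'Tip: class="{wrong}" looks a lot like "{right}". '
          "Did you mean that instead?")
```

Unrelated class names such as "sidebar" fall below difflib's default
similarity cutoff and are ignored, which keeps the noise down.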

Peace,
Scott