validating microformats (was Re: [uf-discuss] Google Gdata new syndication protocol!)

Mark Pilgrim pilgrim at
Fri Apr 21 13:45:51 PDT 2006

On 4/21/06, Ben Ward <lists at> wrote:
> I have a small additional suggestion for 'validator' feedback,
> concerning common errors in naming conventions: Such as the use of a
> 'middle-name' classname when 'additional-names' was intended. Also
> 'locality', 'region', 'postal-code', 'country-name' can be misentered
> as 'city', 'county', 'zip' or 'zip-code' and 'country' respectively.

Yeah, if "zip-code" is present in an "adr" but "postal-code" is
missing, that should definitely generate a warning.  The range of
warnings is potentially infinite, and like microformats themselves,
should evolve from documenting current practice.  Once we have a
validator, it can be used to collect real-world examples that can in
turn be analyzed to make the validator more useful (with appropriate
privacy policy notification, of course).
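A minimal sketch of that kind of warning, in Python. It assumes the class names inside one "adr" have already been extracted from the page. COMMON_MISNAMINGS and adr_naming_warnings are hypothetical names, and the table is seeded only with the misnamings from Ben's message, not taken from any real validator:

```python
# Hypothetical table of common misnamings, seeded from the examples in the
# quoted message: misused class name -> intended hCard "adr" class name.
COMMON_MISNAMINGS = {
    "city": "locality",
    "county": "region",
    "zip": "postal-code",
    "zip-code": "postal-code",
    "country": "country-name",
}


def adr_naming_warnings(class_names):
    """Return warnings for one "adr", given the class names found inside it.

    Only warn when a likely misnaming is present AND the property it was
    probably meant to be is missing, e.g. "zip-code" without "postal-code".
    """
    present = set(class_names)
    warnings = []
    for wrong, intended in COMMON_MISNAMINGS.items():
        if wrong in present and intended not in present:
            warnings.append(
                f'"{wrong}" found in "adr" but "{intended}" is missing; '
                f'did you mean "{intended}"?'
            )
    return warnings
```

For example, adr_naming_warnings(["street-address", "city", "zip-code"]) produces two warnings, one suggesting "locality" and one suggesting "postal-code". Requiring the intended name to be absent keeps quiet on pages that already use the right class alongside a stray one.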

The process of developing a validator may also shake out ambiguities
in the spec itself.  For example, I didn't see anything in the spec
that said that "sort-string" can only appear 0 or 1 times per "n".  It
just occurred to me while looking through some examples that it
doesn't make any sense any other way.  That kind of thing doesn't
occur to most people unless they're really good at writing specs, or
they have previous experience writing validators, or both.  But it
turns out to be really important, because without any guidance,
consumers will end up accidentally making different choices and we'll
lose interop.  Some will use the first sort-string, some will use the
last sort-string, some converters will include all of them and end up
producing an invalid vCard (at least I hope that's an invalid vCard; I
haven't read RFC 2426 closely enough to find out)... which in turn may
trigger inconsistent behavior among vCard-enabled applications.  Lots
of luck debugging that one.
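Here's a rough sketch of the corresponding cardinality check, using Python's standard html.parser module. It follows the framing above (at most one "sort-string" per "n"); SortStringCounter and sort_string_warnings are hypothetical names, and the parser assumes reasonably well-formed markup where every non-void element is closed:

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SortStringCounter(HTMLParser):
    """Count "sort-string" elements inside each "n" element (rough sketch)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one bool per open element: does it open an "n"?
        self.counts = []  # number of "sort-string" elements per "n", in order

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Count a "sort-string" only if we are inside some open "n".
        if "sort-string" in classes and any(self.stack):
            self.counts[-1] += 1
        if tag in VOID_TAGS:
            return
        is_n = "n" in classes
        if is_n:
            self.counts.append(0)
        self.stack.append(is_n)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def sort_string_warnings(markup):
    """Return one warning per "n" that contains more than one "sort-string"."""
    parser = SortStringCounter()
    parser.feed(markup)
    parser.close()
    return [
        f'"n" #{i + 1} has {count} "sort-string" values; expected 0 or 1'
        for i, count in enumerate(parser.counts)
        if count > 1
    ]
```

The point is that the validator flags the duplicate instead of silently picking the first or last value, which is exactly the choice different consumers would otherwise make differently.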

I can point to numerous examples of this sort of thing happening in
RSS.  There are *still* lingering questions about whether an RSS item
can have multiple enclosures, and different podcasting clients handle
the presence of multiple enclosures in different ways.  (iTunes only
downloads the first one listed; others only download the last one
listed; others download all of them.)  Why do you think Atom took so


More information about the microformats-discuss mailing list