[microformats-discuss] funness -> validator

Bud Gibson bud at thecommunityengine.com
Tue Aug 16 13:40:15 PDT 2005

On Aug 16, 2005, at 15:12, Tantek Çelik wrote:

> The longer answer is yes, that we have XMDP which at least defines the
> vocabulary of a microformat, and the remaining constraints are defined
> in the specifications.  As Brian noted, he's working on a generic XMDP
> validator, but any validator for a particular format will need to have
> hand coded rules for the specific format (just like *every* other
> format validator out there, e.g. the HTML, CSS, RSS, Atom validators
> etc.).

I've actually created a protean validator for xFolk using JavaScript
and find it quite useful.  As Tantek observes, you have to hand-code
the rules, as there is no schema language (not a knock, just an
observation).  I think the need to hand-code may actually be a
benefit:  it forces you into the realm of real-world coding that
implementers will face.

My "validator" goes through and colors patches identified as xFolk
entries and then their component parts, a different color for each
part.  What I have found useful while using the validator is not so
much that it identifies "valid" xFolk as that it shows me how my
particular rendition of xFolk on a page will be perceived by
parsers.  To say the least, that is eye-opening, and I would suggest
developing such a validator as a general strategy for people writing
microformats.  It's not hard, and it is a check on how coherent your
specification really is.
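
A rough sketch of what such a coloring pass could look like follows.
It is not the actual xFolk validator, which is not public.  The class
names (xfolkentry, taggedlink, description) and the rel="tag" check are
assumed here from the xFolk drafts and may not match the version under
discussion; the color table and the function names are made up for
illustration.

```javascript
// Hypothetical color scheme, one color per part.
const XFOLK_COLORS = {
  entry: "#ffffcc",
  taggedlink: "#ccffcc",
  tag: "#ccccff",
  description: "#ffcccc",
};

// True if `value` contains `token` as a whole space-separated word.
function hasToken(value, token) {
  return new RegExp("(^|\\s)" + token + "(\\s|$)").test(value || "");
}

// Color every xFolk entry under `root`, then each recognised part
// inside it.  Returns how many of each kind were found, so the result
// can be checked as well as looked at.
function highlightXFolk(root) {
  const counts = { entry: 0, taggedlink: 0, tag: 0, description: 0 };
  const all = root.getElementsByTagName("*");
  for (let i = 0; i < all.length; i++) {
    const entry = all[i];
    if (!hasToken(entry.className, "xfolkentry")) continue;
    entry.style.backgroundColor = XFOLK_COLORS.entry;
    counts.entry++;

    // Parts only count when they sit inside an entry.
    const parts = entry.getElementsByTagName("*");
    for (let j = 0; j < parts.length; j++) {
      const part = parts[j];
      let kind = null;
      if (hasToken(part.className, "taggedlink")) kind = "taggedlink";
      else if (hasToken(part.getAttribute("rel"), "tag")) kind = "tag";
      else if (hasToken(part.className, "description")) kind = "description";
      if (kind !== null) {
        part.style.backgroundColor = XFOLK_COLORS[kind];
        counts[kind]++;
      }
    }
  }
  return counts;
}
```

In a browser this would be called as highlightXFolk(document).  Anything
that looks like a part but sits outside an entry stays uncolored, which
is exactly the kind of surprise the visualization is meant to expose.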

It is impossible to overstate how useful a simple visualization of  
the microformat and its component parts in the wild can be,  
particularly when you have user-generated data swimming into the mix.

One of the things I have found in developing things for xFolk (most
of which is not currently public) is that DOM-based methods work
well.  There are three, in increasing order of support:

1.  CSS selectors:  not well supported in *programming* tools, with
the exception of the behavior.js library.

2.  XPath:  Good server-side support, but problematic with pages that
are not well-formed XML.  Good support in Firefox for HTML even when
it is not well-formed.

3.  DOM level 1:  Great support in browsers.  Also nice because
JavaScript allows you to mix regular expressions into your
selectors.  Right now, I am focusing here.
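
To illustrate the third option, here is a minimal sketch of a
regex-driven "selector" that uses only DOM level 1 properties
(childNodes, nodeType, className).  The function name selectByClass is
hypothetical and not taken from any library.

```javascript
// Collect the descendant elements of `root`, in document order, whose
// class attribute matches `pattern`.  Pass a RegExp without the "g"
// flag, since a global regex keeps state between test() calls.
function selectByClass(root, pattern) {
  const matches = [];
  (function walk(node) {
    const kids = node.childNodes || [];
    for (let i = 0; i < kids.length; i++) {
      const child = kids[i];
      if (child.nodeType !== 1) continue; // 1 = element; skip text, comments
      if (pattern.test(child.className || "")) matches.push(child);
      walk(child);
    }
  })(root);
  return matches;
}
```

In a browser, selectByClass(document.body, /(^|\s)xfolk/) would return
every element with a class token starting with "xfolk", something a
plain class lookup cannot express.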

My final observation in this observation omnibus is that I have come
to the conclusion that, as much as possible, you should attempt to
preserve the tree-structured nature of microformats in harvesting,
storing, and republishing them.  Some programmers may find value in
deserializing microformatted content into some sort of data structure
and then reserializing it on output, but to me that just seems to add
complexity, and it may run counter to the extremely flexible nature
of these things.  This last is just an off-the-cuff observation.  YMMV.

