[microformats-discuss] Late to the party!

brian suda brian.suda at gmail.com
Thu Jul 14 16:47:27 PDT 2005


Lucas Gonze wrote:

>Question -- how hard was it to write X2V?  
>
It is just several template calls, so each is pretty independent
(similar to function calls). It has evolved over time, so loads of hours
have been put into it, but it is never difficult (with the exception
of the maths in handling dates).

>You mentioned collisions
>between names used in multiple profiles, but only in the context of a
>generic XMDP->parser-generator program.  Does that mean they were
>manageable otherwise?  Does your code handle any valid XHTML for a
>given h* microformat, or did you have to limit it to specific XHTML
>serializations?
>  
>
This is not really a problem, it is just the simplicity of XMDP. XMDP
is NOT a schema, so it does not define certain things like references
(e.g. transitive in XFN) or other things like 'X' is a child of 'Y'.  So
with hCard/hCal, when things must be children of others, it is difficult
to represent this in XMDP in a machine-readable way.

This is a hack of an XMDP parser/validator (it is not pretty) but it
will fetch a URL, attempt to apply another XSLT file, and produce some
very simple output describing what is found.

http://suda.co.uk/projects/XMDP/

The potential problems with a generic validator for XMDP files would be
something like the following:
- XMDP defines a class value called "work". The English prose tells you
that this must be a child of an element with class="tel". None of this
is machine readable, so all I can determine from the XMDP is that
class="work" is a valid property in the hCard XMDP profile. Now, on a
given page, you could have class="work" twice: once as a subproperty of
class="tel" (which is the one you actually mean as the work phone) and a
second time on a footer div or some other element, as a CSS style rather
than a microformat. The generic XMDP validator will find BOTH and say
BOTH are valid (it is not likely that people would use microformat
terms as CSS styles, but the microformat terms are NOT reserved words,
so there is potential for this).
- A similar situation occurs with class="url". This is defined in both
hCard and hCal, so if you pass a page to the validator and say 'validate
this page against XFN, hCard, hCal' it will pull out class="url", but it
cannot determine (because XMDP cannot enforce a structure, only named
values) whether the URL is a sub-element of class="vevent" or a
sub-element of class="vcard". So when validating with the hCal profile,
it will also find the class="url" that sits inside the hCard, because all
the validator can do is look for class="url"; it can't easily determine
(without advance knowledge) that this class="url" is meant for hCal and
NOT meant for hCard. (url is semantically the same in both profiles, but
the validator would be finding it in different places, outside of the
properties it belongs to.)
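
To make the two cases above concrete, here is a minimal Python sketch. It
is not taken from X2V or from the XMDP validator; the names ClassFinder,
flat_matches and nested_matches and the sample page are made up for
illustration, and the markup follows the description in this mail rather
than any particular version of the hCard spec. A flat lookup by class
name is all that XMDP supports; the nested lookup needs a parent/child
rule that has to be hand-coded from the English prose.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class ClassFinder(HTMLParser):
    """Record every element carrying `target`, with its ancestors' classes."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []  # one set of class names per open element
        self.hits = []   # one set of ancestor class names per match

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self.target in classes:
            self.hits.append(set().union(*self.stack))
        if tag not in VOID_ELEMENTS:
            self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.stack:
            self.stack.pop()


def flat_matches(html, name):
    """What a generic XMDP-driven check can do: count the class name."""
    finder = ClassFinder(name)
    finder.feed(html)
    return len(finder.hits)


def nested_matches(html, name, required_ancestor):
    """Structure-aware check; the ancestor rule is NOT available from XMDP."""
    finder = ClassFinder(name)
    finder.feed(html)
    return sum(1 for ancestors in finder.hits if required_ancestor in ancestors)


# Hypothetical page: one hCard, one hCal event, and a footer whose CSS
# class happens to be called "work".
PAGE = """
<div class="vcard">
  <a class="url fn" href="http://example.org/">Example Person</a>
  <span class="tel"><span class="work">555-0100</span></span>
</div>
<div class="vevent">
  <a class="url" href="http://example.org/event">Example Event</a>
</div>
<div class="work">Footer text</div>
"""

print(flat_matches(PAGE, "work"))           # phone and footer both match
print(nested_matches(PAGE, "work", "tel"))  # only the phone matches
print(flat_matches(PAGE, "url"))            # hCard and hCal url both match
print(nested_matches(PAGE, "url", "vevent"))
```

The flat count treats the footer and the hCard url as hits, which is
exactly the false positive described above; the nested count only avoids
it because the "tel" and "vevent" rules were typed in by hand.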

The idea of a universal validator is that it would parse the <head
profile="..."> and fetch those URLs, use an XSLT file to convert each
XMDP to another XSLT file, and then check your HTML against the new
XSLT file that was generated from the XMDP (that's what it currently
does, except the XSLT files are cached). That way, when new XMDP files
are created for specific purposes, they don't need to be registered with
the validator. Instead the validator could fetch, parse, and validate
against all of that dynamically on the fly. This is where you would NOT
have any advance knowledge of the English prose in the descriptions and
would not be able to correct the problems described above.
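
As a rough illustration of the first step of that pipeline, the Python
sketch below pulls the profile URLs out of the head element and keeps one
generated validator per profile in a cache. This is an assumption about
how it could be done, not the validator's actual code (which is XSLT);
ProfileFinder, profile_urls, cached_validator and the build callback are
hypothetical names, and the fetch and XMDP-to-XSLT steps are left out.

```python
from html.parser import HTMLParser


class ProfileFinder(HTMLParser):
    """Collect the URLs listed in the profile attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            # HTML 4 allows several profile URIs separated by white space.
            self.profiles.extend(value.split())


def profile_urls(html):
    """Return the profile URLs of a page in document order."""
    finder = ProfileFinder()
    finder.feed(html)
    return finder.profiles


# Cache of generated validators, keyed by profile URL (hypothetical).
_cache = {}


def cached_validator(profile_url, build):
    """Build the validator for a profile once, then reuse it.

    `build` stands in for "fetch the XMDP and convert it to XSLT".
    """
    if profile_url not in _cache:
        _cache[profile_url] = build(profile_url)
    return _cache[profile_url]


# Hypothetical page that declares two profiles.
DOC = (
    '<html><head profile="http://example.org/profile-a '
    'http://example.org/profile-b"><title>t</title></head>'
    "<body></body></html>"
)

for url in profile_urls(DOC):
    print(url)
```

Because the profiles are discovered from the page itself, nothing has to
be registered in advance, but everything built this way still only knows
the names in the XMDP, not the structure described in its prose.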

>What I'm interested in is the question from another thread of how
>plausible parsers are in real life.  My impression right now is that
>takes a really good developer to pull it off, but it is possible to
>do.
>  
>
It is more a case of having a solid foundation for getting
machine-readable data. XMDP is meant first for human readers, not
machines. Other things, like XML Schema (which is HUGE), are very well
defined, so machines know what to do. CSS parsers are the same: they
follow a defined standard, not an ad hoc design like microformats can be.

I hope this answers some of your questions. If you need things explained
further, feel free to ask and I'll see what I can do.

-brian

