[microformats-discuss] URIs please!
Danny Ayers
danny.ayers at gmail.com
Thu Jul 14 03:14:18 PDT 2005
A little plea.
I just noticed that in the example for Bud's XFolk [1], the blocks
containing microformats are demarcated using class attributes, e.g.
<div class="xfolkentry">
As it stands, the only way of discovering whether such a document
contains microformat data is to scrape it and look for that value.
Consider the following scenario:
You have a subscription to certain del.icio.us RSS feeds. A program
checks the linked pages to see if they contain microcontent markup and,
if they do, extracts the data and puts it into a queryable store. All
automatic.
Now you are likely to get a lot more docs which don't contain
microformat data than those which do. Ok, so "xfolkentry" is unlikely
to be misinterpreted. But what if the documents contain e.g.
<div class="name">
Is this microformat data? Which microformat?
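To make the problem concrete, here is a minimal sketch of the class-scraping
approach, using only Python's standard html.parser. The names
ClassTokenScanner and class_tokens are made up for illustration. It has to
walk every element in the doc, and all it can report is a bag of strings:
nothing in the result says which vocabulary "name" belongs to.

```python
from html.parser import HTMLParser


class ClassTokenScanner(HTMLParser):
    """Collects every class token found anywhere in the document."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # class may hold several space-separated tokens
            if name == "class" and value:
                self.tokens.update(value.split())


def class_tokens(html_text):
    """Return the set of class tokens used in html_text (whole-doc scan)."""
    scanner = ClassTokenScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.tokens


if __name__ == "__main__":
    doc = '<div class="xfolkentry"><div class="name">x</div></div>'
    found = class_tokens(doc)
    # "name" is found, but there is no way to tell which microformat
    # (if any) it is meant to belong to.
    print("xfolkentry" in found, "name" in found)
```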
There is a mechanism for recognising microformats in the docs - use a
profile, e.g.
<head profile="http://example.org/some/microformat/schema">
The microformats docs do cover a simple schema language, XMDP, but how
the schema is done in this context is less significant than it having
a URI. Having it in the <head> is good too: it isn't necessary to
parse the whole doc looking for any "known" attributes. It also makes
it possible to offer support for microformats unknown to the system at
design time (for RDF-based apps this is straightforward using GRDDL).
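A rough sketch of the profile check, again with Python's standard library
only. The function names and the handlers mapping are hypothetical, and the
profile URI is the example.org placeholder used above. It assumes the profile
attribute may carry a space-separated list of URIs, as HTML 4 allows, and it
never looks past </head>.

```python
import re
from html.parser import HTMLParser


class HeadProfileParser(HTMLParser):
    """Collects the profile URIs declared on the <head> element."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            # HTML 4 permits a space-separated list of profile URIs
            self.profiles.extend(value.split())


def declared_profiles(html_text):
    """Return the profile URIs in <head>, without parsing the body."""
    match = re.search(r"</head\s*>", html_text, re.IGNORECASE)
    head_part = html_text[:match.start()] if match else html_text
    parser = HeadProfileParser()
    parser.feed(head_part)
    return parser.profiles


def pick_handler(html_text, handlers):
    """Return the handler registered for the first declared profile
    that the system knows about, or None if there is no match.

    handlers is a hypothetical dict mapping profile URI -> callable.
    """
    for uri in declared_profiles(html_text):
        if uri in handlers:
            return handlers[uri]
    return None
```

A doc with no matching profile can be dropped after reading its <head>
alone, which is the cheap path for the majority of pages in the scenario
above.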
Sure, there's an advantage in having well-known semantic markup terms,
the vocabularies defined in microformats. But for automatic discovery
and processing it's also hugely beneficial to be able to recognise
microformat data unambiguously. The doc can be processed in a way
appropriate for the microformat. The profile URI provides this
disambiguation and allows deterministic processing. This doesn't in
any way compromise the "simplicity" aim of microformats; in fact the
net effect is overall simplification. Hunting for arbitrary strings in
attributes is hard work!
I understand there's ongoing discussion about declaring that a
microformat is in use in doc fragments (where the <head> is
unavailable). I don't know whether the use of an <a> hyperlink is the
best mechanism or not (a possible alternative might be to use a URI
for the outermost microformat term, e.g. <div
class="http://example.org/some/microformat/schema/xfolkentry">).
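If the URI-as-class-term idea were adopted, a consumer could split the
class token into a schema URI and a local term. This is only a sketch of
that possible alternative, not an agreed convention; split_uri_term is a
made-up name, and treating the last path segment as the term is an
assumption.

```python
from urllib.parse import urlsplit


def split_uri_term(class_token):
    """Split a URI-valued class token into (schema_uri, term).

    Returns None for ordinary class names such as "name", or for
    URIs that have no final path segment to use as the term.
    """
    parts = urlsplit(class_token)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    if not parts.path.strip("/"):
        return None
    # Assumption: the term is whatever follows the last "/"
    schema_uri, _, term = class_token.rpartition("/")
    if not term:
        return None
    return (schema_uri, term)
```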
But however it's done, identifying the microformat used within the doc
by means of a URI (the GUID of the Web) is essential IMHO. It makes the
difference between quality, globally unambiguous data and something
barely less fragile than screenscraping, which is where things stand
now.
Cheers,
Danny.
[1] http://microformats.org/wiki/xfolk
--
http://dannyayers.com