[microformats-discuss] URIs please!

Thu Jul 14 10:14:53 PDT 2005

On Jul 14, 2005, at 3:14 AM, Danny Ayers wrote:
> A little plea.
>
> I just noticed that in the example for Bud's XFolk [1] that the blocks
> containing microformats are demarcated using class attributes, i.e.
>
> <div class="xfolkentry">
>
> As it stands, the only way of discovering whether such a document
> contains microformat data is to scrape it and look for that value.
> Consider the following scenario:
>
> You have a subscription to certain del.icio.us RSS feeds; checking the
> linked pages to see if they contain microcontent markup, if they do
> extracting the data and putting it into a queryable store. All
> automatic.
>
> Now you are likely to get a lot more docs which don't contain
> microformat data than those which do. Ok, so "xfolkentry" is unlikely
> to be misinterpreted. But what if the documents contain e.g.
>
> <div class="name">
>
> Is this microformat data? Which microformat?

That depends on the context. For example, "description" appears in
several microformats, and each use can be disambiguated by the
microformat that encloses it. This:

<div class="hreview">
<p class="description">
..
</p>
</div>

and this:

<div class="vcalendar">
<span class="description">...</span>
</div>

are not ambiguous.
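
As a rough illustration of that disambiguation (a minimal sketch, not
how any existing parser does it), the Python below keeps a stack of
open elements and reports the nearest enclosing root class for each
"description". ROOTS, ContextFinder and description_contexts() are
names made up for this example, and only the two root classes shown
above are assumed.

```python
from html.parser import HTMLParser

# Hypothetical set of root class names; a real parser would know more.
ROOTS = {"hreview", "vcalendar"}


class ContextFinder(HTMLParser):
    """Records which root microformat encloses each 'description'."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (tag, class set) entry per open element
        self.found = []  # enclosing root (or None) per 'description'

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if "description" in classes:
            # Walk outward from the innermost ancestor to the nearest root.
            root = next(
                (r for _, cs in reversed(self.stack)
                 for r in sorted(cs & ROOTS)),
                None,
            )
            self.found.append(root)
        self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def description_contexts(html):
    """Return the enclosing root class for every 'description' in html."""
    finder = ContextFinder()
    finder.feed(html)
    return finder.found
```

Fed well-formed versions of the two snippets above, this should report
"hreview" for the first and "vcalendar" for the second; a "description"
outside any known root comes back as None.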

> There is a mechanism for recognising microformats in the docs - use a
> profile, e.g.
>
> <head profile="http://example.org/some/microformat/schema">

Yes, and it's likely that this will be expanded to at least:

<link rel="profile" href="..." />

if not also:

<a rel="profile" href="...">...</a>
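
For what it's worth, a consumer could collect profile URIs from all
three places in one pass. This is only a sketch: the <link> and <a>
forms are the possible extensions mentioned here, not something
already specified, and ProfileFinder and find_profiles() are
hypothetical names.

```python
from html.parser import HTMLParser


class ProfileFinder(HTMLParser):
    """Collects profile URIs declared on <head>, <link> and <a>."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head" and a.get("profile"):
            # HTML 4 allows a space-separated list of URIs here.
            self.profiles.extend(a["profile"].split())
        elif tag in ("link", "a") and a.get("href"):
            # rel is a space-separated list of link types.
            rels = (a.get("rel") or "").lower().split()
            if "profile" in rels:
                self.profiles.append(a["href"])


def find_profiles(html):
    """Return every profile URI found in html, in document order."""
    finder = ProfileFinder()
    finder.feed(html)
    return finder.profiles
```

A document with no profile declared anywhere simply yields an empty
list, so the caller can skip it without scanning for class names.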

> The microformats docs do cover a simple schema language XMDP, but how
> the schema is done in this context is less significant than it having
> a URI. Having it in the <head> is good too, it isn't necessary to
> parse the whole doc looking for any "known" attributes. It also makes
> it possible to offer support for microformats unknown to the system at
> design time (for RDF-based apps this is straightforward using GRDDL).
>
> Sure, there's an advantage in having well-known semantic markup terms,
> the vocabularies defined in microformats. But for automatic discovery
> and processing it's also hugely beneficial to be able to recognise
> microformat data unambiguously. The doc can be processed in a way
> appropriate for the microformat. The profile URI provides this
> disambiguation and allows deterministic processing. This doesn't in
> any way compromise the "simplicity" aim of microformats, in fact the
> net effect is overall simplification. Hunting for arbitrary strings in
> attributes is hard work!

I'm not sure what point you're trying to make here. I don't think
anyone's arguing against profile URLs.

> I understand there's ongoing discussion about declaring that a
> microformat is in use in doc fragments (where the <head> is
> unavailable). I don't know whether the use of an <a> hyperlink is the
> best mechanism or not (a possible alternative might be to use a URI
> for the outermost microformat term, e.g.  <div
> class="http://example.org/some/microformat/schema/xfolkentry">).

How is this different from

<div class="xfolkentry">

?

> But however it's done, identification of the microformat used within
> the doc by means of a URI (the GUID of the Web) is essential IMHO to
> make the difference between making quality, globally unambiguous data
> available and something barely less fragile than screenscraping as it
> stands.

I don't think anyone is disagreeing with you.

-ryan