[uf-discuss] Human and machine readable data format
Jeremy Keith
jeremy at adactio.com
Mon Jun 30 15:29:10 PDT 2008
Scott wrote:
> I think the problem may be clarified by actually writing those out
> in a sentence:
>
> I arrived at work 5 minutes ago.
> I arrived at work 14:00.
>
> The latter doesn't seem human-readable to me.
But it does to me. And that's kind of the crux of the issue. Defining
"human readable" is a lot harder than defining "machine
readable" (which is measurable). You're a human and I'm a human, but we
disagree about what's readable.
It's a semantic debate. I don't mean that in a bad way. HTML is all
about semantics and nobody likes a semantic debate more than me (ah,
how I miss Dan's SimpleQuiz).
> There are a few cases where we are specifying content syntax for
> publishers, e.g. phone type in hCard. And these are all similarly
> problematic. I think we might get closer to solving these problems
> by considering them not in terms of whether or not humans could
> theoretically read them, but rather in terms of whether or not
> microformats should be requiring publishers to publish specific
> content.
I agree with you in theory, but in practice the logical conclusion is
to attempt natural language parsing, which seems like a
boiling-the-ocean approach to me (given the sheer number of languages).
So we compromise. Providing an alternative to the abbr/datetime
combo is bound to involve a compromise somehow. I would
rather that the compromise follow existing publishing behaviour.
No solution is going to be perfect. But separating dates and times
while reusing the value excerpting pattern seems like the least
problematic to me. It involves the least change to existing
conventions and I *think* it will involve minimal changes for parsers
(though that needs to be tested).
It certainly offers authors one more option when they want to publish
a machine-readable datetime without potentially alienating humans.
The list of options would be:
1. publish the full datetime in running text,
2. publish the full datetime in the title attribute of the abbr element,
3. publish date and time separately in the title attributes of
different abbr elements with class names of "value",
4. don't publish a machine-readable date at all (i.e. don't use
hCalendar).
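To make option 3 a little more concrete, here is a rough sketch of how a parser might stitch a separately published date and time back together. It assumes a hypothetical helper called `combine_datetime`, and it assumes dates appear as YYYY-MM-DD and times as HH:MM or HH:MM:SS in the title attributes of abbr elements with a class of "value". It uses only Python's standard library (`html.parser`), does not scope the lookup to a particular property such as dtstart, and is not taken from any existing microformats parser.

```python
import re
from html.parser import HTMLParser

# Assumed formats for the separately published pieces (hypothetical choice).
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
TIME_RE = re.compile(r"^\d{2}:\d{2}(:\d{2})?$")


class ValueTitleCollector(HTMLParser):
    """Collect title attributes of <abbr class="value"> elements, in document order."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        attr_map = dict(attrs)
        # class may be absent or valueless, so fall back to an empty string
        classes = (attr_map.get("class") or "").split()
        if "value" in classes and attr_map.get("title"):
            self.titles.append(attr_map["title"])


def combine_datetime(html_fragment):
    """Join a separately published date and time into one ISO 8601 string.

    Returns None if no date is found; the time part is optional.
    """
    collector = ValueTitleCollector()
    collector.feed(html_fragment)
    collector.close()
    date = next((t for t in collector.titles if DATE_RE.match(t)), None)
    time = next((t for t in collector.titles if TIME_RE.match(t)), None)
    if date is None:
        return None
    return f"{date}T{time}" if time else date


# Human-friendly text, machine-readable values in separate title attributes.
fragment = (
    '<span class="dtstart">'
    '<abbr class="value" title="2008-06-30">June 30th</abbr>, '
    '<abbr class="value" title="14:00">2pm</abbr>'
    '</span>'
)
print(combine_datetime(fragment))  # 2008-06-30T14:00
```

The point of the sketch is that the reader sees "June 30th, 2pm" while the parser still recovers a single datetime, which is the trade-off option 3 is trying to make.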
Currently, the options jump straight from 2 to 4 (and 4 is a perfectly
viable option). Option 3 is a compromise that depends — like so many
markup patterns — on your interpretation of the semantics of the
proposal.
But your point is well taken: in an ideal world, no microformat would
mandate that authors publish any specific content. In practice, and
this is especially apparent given hCalendar's roots in the iCalendar
format, some kind of compromise is necessary. I'm aware that this
attitude could be a slippery slope to all sorts of machine-centric
formats, which is why I think it's so important that whatever
compromise is chosen involves the least amount of change to existing
conventions and, most importantly of all, is based on existing
publishing behaviour.
Bye,
Jeremy
--
Jeremy Keith
a d a c t i o
http://adactio.com/