[uf-discuss] Human and machine readable data format

Jeremy Keith jeremy at adactio.com
Mon Jun 30 15:29:10 PDT 2008


Scott wrote:
> I think the problem may be clarified by actually writing those out  
> in a sentence:
>
> I arrived at work 5 minutes ago.
> I arrived at work 14:00.
>
> The latter doesn't seem human-readable to me.

But it does to me. And that's kind of the crux of the issue. Defining  
"human readable" is a lot harder than defining "machine  
readable" (which is measurable). You're a human and I'm a human but we  
disagree about what's readable.

It's a semantic debate. I don't mean that in a bad way. HTML is all  
about semantics and nobody likes a semantic debate more than me (ah,  
how I miss Dan's SimpleQuiz).

> There are a few cases where we are specifying content syntax for  
> publishers, e.g. phone type in hCard.  And these are all similarly  
> problematic.  I think we might get closer to solving these problems  
> by considering them not in terms of whether or not humans could  
> theoretically read them, but rather in terms of whether or not  
> microformats should be requiring publishers to publish specific  
> content.

I agree with you in theory, but in practice the logical conclusion is
to attempt natural language parsing, which seems like a
boiling-the-ocean approach to me (given the sheer number of languages).

So we compromise. The issue of providing an alternative to the
abbr/datetime combo is bound to involve a compromise somehow. I would
rather that the compromise follow existing publishing behaviour.

No solution is going to be perfect. But separating dates and times  
while reusing the value excerpting pattern seems like the least  
problematic to me. It involves the least change to existing  
conventions and I *think* it will involve minimal changes for parsers  
(though that needs to be tested).

It certainly offers authors one more option when they want to publish  
a machine-readable datetime without potentially alienating humans.

The list of options would be:

1. publish the full datetime in running text,
2. publish the full datetime in the title attribute of the abbr element,
3. publish date and time separately in the title attributes of  
different abbr elements with class names of "value",
4. don't publish a machine-readable date at all (i.e. don't use  
hCalendar).
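
To make options 2 and 3 concrete, here is a rough sketch of how the two
patterns might look in markup and how a parser could read them. The
sample markup, the names DtstartParser and extract_dtstart, and the
choice to join the date and time with "T" are my own assumptions for
illustration, not settled parsing rules; only the "value" class name and
the use of the abbr title attribute come from the options above.

```python
from html.parser import HTMLParser

# Option 2: the full datetime in the title attribute of one abbr element.
OPTION_2 = (
    '<abbr class="dtstart" title="2008-06-30T14:00">'
    '2pm on June 30th</abbr>'
)

# Option 3: date and time published separately, each in the title
# attribute of its own abbr element with a class name of "value".
OPTION_3 = (
    '<span class="dtstart">'
    '<abbr class="value" title="2008-06-30">June 30th</abbr> at '
    '<abbr class="value" title="14:00">2pm</abbr>'
    '</span>'
)


class DtstartParser(HTMLParser):
    """Hypothetical helper: collects the machine-readable dtstart parts.

    Simplification: assumes every start tag has a matching end tag, so
    unclosed void elements such as <br> inside dtstart are not handled.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside the dtstart element (0 = outside)
        self.parts = []  # title values in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        title = attrs.get("title")
        if self.depth:
            # Already inside dtstart: look for abbr.value children.
            self.depth += 1
            if tag == "abbr" and "value" in classes and title:
                self.parts.append(title)
        elif "dtstart" in classes:
            self.depth = 1
            # Option 2: the dtstart element is itself an abbr with a title.
            if tag == "abbr" and title:
                self.parts.append(title)

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1


def extract_dtstart(html):
    """Return the datetime string found in the first-level dtstart markup."""
    parser = DtstartParser()
    parser.feed(html)
    parser.close()
    # Assumption: separate date and time values are joined with "T".
    return "T".join(parser.parts)


print(extract_dtstart(OPTION_2))
print(extract_dtstart(OPTION_3))
```

Under those assumptions both snippets yield the same machine-readable
value, while option 3 keeps each title attribute short enough to be
read aloud without sounding like a serial number.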

Currently, the options jump straight from 2 to 4 (and 4 is a perfectly  
viable option). Option 3 is a compromise that depends — like so many  
markup patterns — on your interpretation of the semantics of the  
proposal.

But your point is well taken: in an ideal world, no microformat would  
mandate that authors publish any specific content. In practice, and  
this is especially apparent given hCalendar's roots in the iCalendar  
format, some kind of compromise is necessary. I'm aware that that  
attitude could be a slippery slope to all sorts of machine-centric
formats, which is why I think it's so important that whatever
compromise is chosen involves the least amount of change to existing  
conventions and, most importantly of all, is based on existing  
publishing behaviour.

Bye,

Jeremy

-- 
Jeremy Keith

a d a c t i o

http://adactio.com/




