[uf-discuss] Human and machine readable data format

Mon Jun 30 08:16:48 PDT 2008

Martin McEvoy wrote:
> My thought for some time now is that the problem should be simplified a
> little, maybe also the problem could be looked at a little differently
> by trying to mark up datetime as all one thing which is great for a
> machine, when really you should be trying to mark it up in a way humans
> understand, date and time.

I agree completely. This is something that Tantek and I were
discussing recently in person. It would also more closely match how
people are publishing event data: usually the date and time are
separated (sometimes in separate sentences).

Given that 2008-06-30 is human-readable and
given that 18:00 is human-readable and
given that 2008-06-30T18:00:00 is not quite as human-readable (or at  
least not as human-friendly),
it makes sense to offer authors an alternative to the abbr/datetime  
combination.

> <span class="dtstart">
> On <abbr class="date" title="2008-06-30">June 30th</abbr>
> at <abbr class="time" title="09:00+0100">9.00am</abbr>
> </span>

I was originally thinking along these lines: some kind of optimization
pattern that would involve adding class names (like "date" and
"time"). There is a precedent for this in hCard. The values
"given-name" and "family-name" don't derive directly from the vCard
spec; they were derived from the associated documentation in order to
give authors a way of separating the components of the n value.

But...

Following a brainstorming session in IRC (that I'm kicking myself I  
missed), there may be an even simpler solution that doesn't call for  
the creation of new class names but instead reuses the existing  
"value" property.

The notes from that brainstorming have been documented on the wiki:

http://microformats.org/wiki/abbr-datetime-pattern#date_and_time_separation_using_value_excerption

In a nutshell, the proposal would allow authors to write:

<span class="dtstart">
On <abbr class="value" title="2008-06-30">June 30th</abbr>
at <abbr class="value" title="09:00">9.00am</abbr>
</span>

With the provisos that authors:
*must* use hyphens to separate the date value (2008-06-30, not  
20080630),
*must* use colons to separate the time value (09:00, not 0900),
*must not* use colons to separate the timezone value (-0500, not  
-05:00).
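
To make those provisos concrete, here is a rough sketch of how an
authoring tool might check a single value. The patterns and the
kind_of name are my own illustration, not anything from the wiki, and
they assume two-digit hours and minutes with optional seconds:

```python
import re

# Hypothetical patterns, one per proviso (assumptions, not spec text).
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")     # hyphens required: 2008-06-30
TIME = re.compile(r"\d{2}:\d{2}(:\d{2})?")  # colons required: 09:00
ZONE = re.compile(r"[+-]\d{4}")             # no colons allowed: -0500


def kind_of(value):
    """Return "date", "time" or "timezone", or None if a proviso is broken."""
    for kind, pattern in (("date", DATE), ("time", TIME), ("timezone", ZONE)):
        if pattern.fullmatch(value):
            return kind
    return None
```

With this, kind_of("20080630"), kind_of("0900") and kind_of("-05:00")
all come back as None, so each of the "not" forms above is rejected.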

Parsers would then be able to apply the following rules when parsing  
dtstart, dtend and other datetime fields:
data contained in a "value" class that has hyphens is the date  
component,
data contained in a "value" class that has colons is the time  
component (if no seconds are provided, default to 00),
data contained in a "value" class that has no hyphens or colons is the  
timezone component.
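
As a very rough sketch of those rules (not taken from any existing
parser), something like the following might work. The ValueTitles and
combine_values names are made up for illustration. It only reads title
attributes and does not check that the values sit inside a dtstart
element. It also makes one assumption the rules don't spell out: a
value with a leading + or - is treated as a timezone, since the minus
sign in -0500 would otherwise look like a date hyphen. A time with a
trailing offset (09:00+0100) is tolerated as well.

```python
import re
from html.parser import HTMLParser

# Time with optional seconds and an optional colon-free offset (assumption).
TIME_RE = re.compile(r"^(\d{2}:\d{2})(:\d{2})?([+-]\d{4})?$")


class ValueTitles(HTMLParser):
    """Collect the title attribute of every element with class "value"."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "value" in classes and attributes.get("title"):
            self.titles.append(attributes["title"])


def combine_values(values):
    """Assemble one datetime string from separately marked-up values."""
    date = time = zone = ""
    for raw in values:
        value = raw.strip()
        if not value:
            continue
        if ":" in value:
            # Colons mean time; default the seconds to 00 if missing.
            match = TIME_RE.match(value)
            if match is None:
                raise ValueError(f"unrecognised time value: {value!r}")
            hhmm, seconds, offset = match.groups()
            time = hhmm + (seconds or ":00")
            zone = offset or zone
        elif value.startswith(("+", "-")) or "-" not in value:
            # Leading sign (assumption) or no hyphens and no colons: timezone.
            zone = value
        else:
            # Hyphens and no colons: date.
            date = value
    if not date:
        raise ValueError("no date component found")
    return date + (f"T{time}{zone}" if time else "")


markup = """<span class="dtstart">
On <abbr class="value" title="2008-06-30">June 30th</abbr>
at <abbr class="value" title="09:00">9.00am</abbr>
</span>"""
collector = ValueTitles()
collector.feed(markup)
print(combine_values(collector.titles))  # 2008-06-30T09:00:00
```

Checking for colons before hyphens is a deliberate choice here, so
that a negative offset never gets mistaken for part of a date.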

Now I'm not saying that this solution is perfect but it's by far the  
best I've seen so far. It doesn't involve hiding data and it doesn't  
involve stuffing data values in the class attribute. It *does* still  
use the abbr element for a usage that is arguably semantically dodgy.  
But any solution is going to involve some kind of compromise to a  
greater or lesser degree and this is a level of compromise that I  
personally find acceptable (it also maintains backwards compatibility  
with existing publishing behaviour).

In essence, it's an optimization rule, somewhat similar to the fn
optimization rule in hCard. It will mean more work for parsers (but
not nearly as much work as natural language parsing) while freeing up
authors to separate date and time values.

At this stage, I think it would be worth creating some test cases
(based on real-world publishing examples) and letting the authors of
parsers (Mike, Glenn, Brian, Toby, etc.) kick the tyres and surface
any issues.

Bye,

Jeremy

-- 
Jeremy Keith

a d a c t i o

http://adactio.com/