[uf-discuss] Human and machine readable data format

Scott Reynen scott at randomchaos.com
Mon Jun 30 20:50:59 PDT 2008


On Jun 30, at 4:29, Jeremy Keith wrote:

>> There are a few cases where we are specifying content syntax for  
>> publishers, e.g. phone type in hCard.  And these are all similarly  
>> problematic.  I think we might get closer to solving these problems  
>> by considering them not in terms of whether or not humans could  
>> theoretically read them, but rather in terms of whether or not  
>> microformats should be requiring publishers to publish specific  
>> content.
>
> I agree with you in theory, but in practice the logical conclusion
> is to attempt natural language parsing, which seems like a
> boiling-the-ocean approach to me (given the sheer number of languages).

I don't agree that's the logical conclusion.  If we agree that we
shouldn't be specifying content syntax for publishers, then the
logical conclusion in my view is that anything whose syntax we need to
specify shouldn't be published as content.  If it's not content, it
doesn't need to be in natural language (only content needs to be
human-readable), so there's no need for natural language parsing.

If HTML offered us a @metadata attribute, I think we'd do something  
like this:

<abbr title="June 30th, 2008" metadata="2008-06-30">6/30/08</abbr>
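
As a rough illustration, here is a minimal Python sketch of how a consumer might pull each piece of information out of markup like the line above.  It assumes the hypothetical `metadata` attribute (which is not part of HTML), uses only the standard library's `html.parser`, and the class name `AbbrDateParser` is made up for this example rather than taken from any existing microformats parser.

```python
from html.parser import HTMLParser


class AbbrDateParser(HTMLParser):
    """Collect the pieces of information carried by an <abbr> element.

    'metadata' is a hypothetical attribute used for illustration only;
    HTML does not define it.  Only the most recent <abbr> is recorded.
    """

    def __init__(self):
        super().__init__()
        self.in_abbr = False
        self.result = {}

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            attributes = dict(attrs)
            self.in_abbr = True
            self.result = {
                "abbreviated": "",                       # visible text content
                "expanded": attributes.get("title"),     # human-readable expansion
                "metadata": attributes.get("metadata"),  # machine-readable ISO value
            }

    def handle_endtag(self, tag):
        if tag == "abbr":
            self.in_abbr = False

    def handle_data(self, data):
        # Only text inside the <abbr> counts as the abbreviated content.
        if self.in_abbr:
            self.result["abbreviated"] += data


parser = AbbrDateParser()
parser.feed('<abbr title="June 30th, 2008" metadata="2008-06-30">6/30/08</abbr>')
parser.close()
print(parser.result)
# {'abbreviated': '6/30/08', 'expanded': 'June 30th, 2008', 'metadata': '2008-06-30'}
```

Keeping the machine value in its own slot means a parser never has to interpret "6/30/08" or "June 30th, 2008" at all.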

I think we actually have 3 distinct types of information here: 1)
abbreviated content, 2) expanded content, and 3) ISO 8601 metadata
that translates the content into a machine-readable value.  I think
merging 2 and 3 was a mistake.  It happened because we all agree
content should be readable, #3 looks readable enough, and in
iCalendar #3 is formatted the same as everything else we're treating
as content.  When iCalendar is displayed to people, though, pretty
much everything else we're considering content is simply displayed
as-is, whereas the dates are used to generate a localized date string,
and only that localized string is displayed as content.  Regardless of
how readable we each may find ISO dates, there's an established
practice of treating them as metadata instructions for producing
content, not content itself.  I think we should follow this practice.
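
The iCalendar-style practice described above can be sketched in a few lines of Python: the ISO date is kept as the single source of truth, and each human-readable form is generated from it.  The renderer names are hypothetical, and the English month names are hard-coded rather than drawn from real locale data, so this only illustrates the idea and is not a localization library.

```python
from datetime import date

# Hard-coded English month names (illustrative; real apps use locale data).
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n: int) -> str:
    """Return n with an English ordinal suffix, e.g. 1 -> '1st', 30 -> '30th'."""
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def render_us_long(d: date) -> str:
    """Hypothetical US long form, e.g. 'June 30th, 2008'."""
    return f"{MONTHS[d.month - 1]} {ordinal(d.day)}, {d.year}"


def render_us_short(d: date) -> str:
    """Hypothetical US short form, month first, e.g. '6/30/08'."""
    return f"{d.month}/{d.day}/{d.year % 100:02d}"


def render_uk_short(d: date) -> str:
    """Hypothetical UK short form, day first, e.g. '30/6/08'."""
    return f"{d.day}/{d.month}/{d.year % 100:02d}"


# The metadata is parsed once; every displayed form is derived from it.
value = date.fromisoformat("2008-06-30")
print(render_us_long(value))   # June 30th, 2008
print(render_us_short(value))  # 6/30/08
print(render_uk_short(value))  # 30/6/08
```

Because the displayed strings are outputs rather than inputs, nothing here ever has to parse "6/30/08" back into a date.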

There is a danger in this practice in HTML that does not exist in
iCalendar applications: the possibility of a discrepancy between the
content and the metadata.  But while we've attempted in the past to
minimize this danger with semi-visible ISO dates, I think visible
metadata is in some ways *increasing* the danger by encouraging
publishers to treat the metadata like content and edit it to look more
like the content they were already publishing, e.g. by leaving off
time zones.

> So we compromise. The issue of providing an alternative to the
> abbr/datetime combo is bound to involve a compromise somehow.

I think approaching ISO dates as metadata rather than content will  
remove the need to compromise on core principles.

> But your point is well taken: in an ideal world, no microformat  
> would mandate that authors publish any specific content. In  
> practice, and this is especially apparent given hCalendar's roots in  
> the iCalendar format, some kind of compromise is necessary.

As I suggested above, I believe iCalendar actually treats dates as
metadata rather than content, which differs from how we've been
treating them in hCalendar.  Bringing our practice closer in line with
iCalendar would, I think, allow us to solve this problem more easily.

Peace,
Scott


