[uf-discuss] Human and machine readable data format
Scott Reynen
scott at randomchaos.com
Mon Jun 30 20:50:59 PDT 2008
On Jun 30, at 4:29, Jeremy Keith wrote:
>> There are a few cases where we are specifying content syntax for
>> publishers, e.g. phone type in hCard. And these are all similarly
>> problematic. I think we might get closer to solving these problems
>> by considering them not in terms of whether or not humans could
>> theoretically read them, but rather in terms of whether or not
>> microformats should be requiring publishers to publish specific
>> content.
>
> I agree with you in theory but in practice, the logical conclusion
> is to attempt natural language parsing which seems like a boiling-
> the-ocean approach to me (given the sheer number of languages).
I don't agree that's the logical conclusion. If we agree that we
shouldn't be specifying content syntax for publishers, then the
logical conclusion in my view is that anything we need to specify
shouldn't be published as content. If it's not content, we don't need
to use natural language (because only content needs to be human-
readable), so there's no need for natural language parsing.
If HTML offered us a @metadata attribute, I think we'd do something
like this:
<abbr title="June 30th, 2008" metadata="2008-06-30">6/30/08</abbr>
I think we actually have 3 distinct types of information here: 1)
abbreviated content, 2) expanded content, and 3) ISO translation
metadata. And I think merging #2 and #3 was a mistake, made because we
all agree content should be readable, #3 looks readable enough, and in
iCalendar #3 is formatted the same as everything else we're treating
as content. When iCalendar is displayed to people, though, pretty
much everything else we're considering content is simply displayed as-
is, whereas the dates are used to generate a localized date, and only
that localization is displayed as content. Regardless of how readable
we each may find ISO dates, there's an established practice of
treating them as metadata instructions for producing content, not
content itself. I think we should follow this practice.
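
To illustrate the distinction, here is a minimal Python sketch that pulls the three pieces of information out of the example element and generates display content from the ISO value, roughly as a calendar application would. It is only a sketch: the metadata attribute does not exist in HTML, and the AbbrDateParser class, the localize function, and the PATTERNS table are hypothetical names made up for this illustration.

```python
from datetime import date
from html.parser import HTMLParser

# Hypothetical display patterns per locale. A real application would use
# proper localization data rather than this hand-written table.
PATTERNS = {
    "en-US": "{m}/{d}/{y}",
    "en-GB": "{d}/{m}/{y}",
    "de-DE": "{d}.{m}.{y}",
}


class AbbrDateParser(HTMLParser):
    """Collects the three pieces of information from an <abbr> element."""

    def __init__(self):
        super().__init__()
        self.in_abbr = False
        self.abbreviated = ""  # 1) abbreviated content (element text)
        self.expanded = None   # 2) expanded content (title attribute)
        self.metadata = None   # 3) ISO metadata (hypothetical attribute)

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            attributes = dict(attrs)
            self.in_abbr = True
            self.expanded = attributes.get("title")
            self.metadata = attributes.get("metadata")

    def handle_endtag(self, tag):
        if tag == "abbr":
            self.in_abbr = False

    def handle_data(self, data):
        if self.in_abbr:
            self.abbreviated += data


def localize(iso_date, locale):
    """Generate display content from ISO metadata for one locale."""
    d = date.fromisoformat(iso_date)
    return PATTERNS[locale].format(d=d.day, m=d.month, y=d.year)


parser = AbbrDateParser()
parser.feed('<abbr title="June 30th, 2008" metadata="2008-06-30">6/30/08</abbr>')

# The ISO value is never shown as-is; it only drives the generated text.
for locale in PATTERNS:
    print(locale, localize(parser.metadata, locale))
```

The publisher's own content (items 1 and 2) passes through untouched, while the ISO value is used only as an instruction for producing a localized date.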
There is a danger in this practice in HTML that does not exist in
iCalendar applications: the possibility of discrepancy between the
content and the metadata. But while we've in the past attempted to
minimize this danger with semi-visible ISO dates, I think visible
metadata is in some ways *increasing* the danger by encouraging
publishers to treat the metadata like content, editing it to make it more
like the content they were already publishing, e.g. by leaving off
time zones.
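
As a rough illustration of the time zone example, a consumer could check whether a published ISO date-time still carries its UTC offset. This is a minimal Python sketch, not part of any microformats specification; has_time_zone is a hypothetical helper name, and the sample values are made up.

```python
from datetime import datetime


def has_time_zone(iso_value):
    """Return True if an ISO 8601 date-time carries a UTC offset.

    Date-only values (no "T") return None, since there is no time
    for a zone to qualify.
    """
    if "T" not in iso_value:
        return None
    return datetime.fromisoformat(iso_value).tzinfo is not None


# Made-up sample values: complete, edited to look friendlier, date-only.
for value in ("2008-06-30T20:50:59-07:00", "2008-06-30T20:50:59", "2008-06-30"):
    print(value, has_time_zone(value))
```

A value that has lost its offset still parses, which is why this kind of edit is easy to make and hard to notice.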
> So we compromise. The issue of providing an alternative to the abbr/
> datetime combo is bound to involve a compromise somehow.
I think approaching ISO dates as metadata rather than content will
remove the need to compromise on core principles.
> But your point is well taken: in an ideal world, no microformat
> would mandate that authors publish any specific content. In
> practice, and this is especially apparent given hCalendar's roots in
> the iCalendar format, some kind of compromise is necessary.
As I suggested above, I believe iCalendar actually treats dates
differently than we've been treating them in hCalendar, as metadata
rather than content. Bringing our practice closer in line with
iCalendar would, I think, allow us to solve this problem more easily.
Peace,
Scott