[uf-discuss] Human and machine readable data format
zen at zenpsycho.com
Mon Jul 14 05:39:05 PDT 2008
> Not sure if this thread is only covering datetimes in abbreviations. The
> title seems to suggest that it's more general so thought I'd chip in with a
> thought on geo as an example. How would a parser deal with natural
> (non-English) language here? Would it be expected to be able to parse
> Manchester or Salford or London or Londres or Londinium?
> Whilst it's just about possible to imagine NLP of dates and trickier to
> imagine NLP of multi-language date formats it's just beyond the realms of
> feasibility to consider NLP of place names
I'm confused. I'm afraid I don't understand the point of this thought exercise.
> I thought the problem was any non human readable data where humans can 'see'
> it - not confined to datetimes
One step at a time.
>> I find it terribly frustrating how many people cannot see that this
>> set of constraints yields NO solution. At least, when the constraints
>> are held to the level of strictness that this community is holding
>> them to.
> Seems to me there are 2 solutions:
> 1. relax the data hiding constraint (tricky because it's fundamental to the
> uf design philosophy and its relaxation has been rejected many times)
> 2. maintain the status quo. Keep the abbreviation design pattern for machine
> friendly data and leave it up to publishers to decide if this is an issue
> for them - or not. It would probably need the microformats community to
> promote the design philosophy and potential issues a little higher than at
> present. But the wiki already documents much of this - just a bit more
> prominent linking and some padding out of /about to be a little more
There is another solution that I have been trying to advocate, which
is not metadata and is not natural language parsing. It is, quite
simply, to define a strict date format that IS human readable, which
can optionally be used in place of ISO 8601 in the title attribute of
an ABBR tag. You can keep the perceived benefits of ISO 8601 for
international users, because the current pattern will continue to
work. However, for users in languages with a well-defined date format,
a screen reader will not trip up on the date.
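To make the distinction from natural language processing concrete, here
is a minimal sketch of what a parser for such a title attribute might
look like. Everything specific in it is an assumption for illustration:
the function name parse_title_date, the choice of "July 1st, 2007" as
the one strict English form, and the restriction to date-only ISO 8601
values. No such format has been agreed on anywhere. The point is only
that a strict format needs a fixed pattern and a lookup table, not NLP.

```python
import re
from datetime import date

# Hypothetical strict English format: "<Month> <day><suffix>, <year>".
# Only this exact shape is accepted; anything else is rejected.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}

ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")
STRICT_EN = re.compile(r"^([A-Z][a-z]+) (\d{1,2})(?:st|nd|rd|th), (\d{4})$")


def parse_title_date(title):
    """Return a date for an ISO 8601 or strict-English title, else None."""
    iso = ISO_DATE.match(title)
    strict = STRICT_EN.match(title)
    try:
        if iso:
            # The existing pattern keeps working unchanged.
            year, month, day = (int(part) for part in iso.groups())
            return date(year, month, day)
        if strict and strict.group(1) in MONTHS:
            month_name, day, year = strict.groups()
            return date(int(year), MONTHS[month_name], int(day))
    except ValueError:
        # Well-formed but impossible dates, e.g. February 30th.
        return None
    return None
```

With this sketch, both title="2007-07-01" and title="July 1st, 2007"
yield the same date, while a free-form value such as "1 July 2007" is
simply rejected rather than guessed at. The sketch does not check that
the ordinal suffix matches the day; a real definition would have to
decide that.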
Whenever I mention this though, everyone seems to think I'm advocating
natural language processing. Let me just say again that this is not
what I am advocating.
I'm highly suspicious of the counterargument that such a solution
would need to support every language that ISO 8601 supports. This
argument does not make sense to me for two reasons. First, ISO 8601
doesn't support ANY language. It is only one date format among
many, based on an anglicised calendar, and its only multilingual
benefit is that it happens to be an international
standard. To someone with a different calendar, ISO 8601 may make just
as much sense as "July 1st, 2007", that is: very little.
I like ISO 8601, but placing it in the title attribute of the ABBR has
clearly been a failure. If not a practical failure, it has been a
failure for the public image of microformats, and it has ultimately
shown that the microformats community structure is unable
to deal with an issue such as this effectively.
The other reason I'm suspicious of this argument is that such a format
would practically only need to support as many languages as there are
screen readers. Unless a screen reader supports ISO 8601 in a title
attribute specifically, it's going to read out gibberish, and if it
encounters a date written in the wrong language it will also read out
gibberish. No difference. However, if, in what I believe is the 80% case,
it reads out a date written in the correct language, then we've just
improved the experience for more people than we were able to
publish to satisfactorily before. What's the counterargument to that?
Another solution is to lobby the screen reader vendors to add explicit
support for ISO 8601 dates. It's a popular pattern for markup, and
adding support for reading them more humanely would provide a clear
benefit for their customers. I personally feel that this solution
would see more success than trying to wrangle the whole of the
microformats community into agreement on this issue.