[uf-discuss] Human and machine readable data format
danbri at danbri.org
Fri Jul 11 01:47:35 PDT 2008
Toby A Inkster wrote:
> Paul Wilkins wrote:
>> We should leverage the computer's ability to do the hard work for us.
>> <p>Date <span class="date">Friday, July the 11th 2008</span></p>
> As I've said before, although my parser does support dates in this
> format, I strongly recommend *not* allowing these per spec, as it will
> lead to unpredictable and inconsistent results.
> Yes, many programming languages do have libraries to do natural language
> parsing of dates, but these all differ subtly in what formats they
> support, how they interpret certain ambiguous dates, and how well they
> internationalise. e.g. I know that Perl's DateTime::Format::Natural,
> while it can perform very sophisticated parsing ("Saturday evening 3
> months ago" => 2008-05-12T19:00:00, "thursday morning last week" =>
> 2008-07-03T09:00:00) only includes English in the distributed module
> (though it has hooks allowing support for other languages). PHP's
> strtotime function is English only too, and there are differences in how
> it interprets some natural language dates, not just with Perl, but
> between different versions of PHP.
> Natural language parsing is really not the way to go, nor is a limited
> range of date formats that *look* like NLP, because publishers will
> believe them to *be* NLP and start publishing in any old date format.
> ISO8601 is what we must stick with - we just must agree a better way of
> embedding it than <abbr>.
Thank you for spelling this out so clearly. Please let's not slip into
treating the non-English-speaking Web as a corner case. ISO8601's the
thing. And it won't always be what the party reading the page expects
(either in terms of language, script or even calendar).
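
To illustrate the ambiguity point, here is a minimal sketch using only Python's standard library. The sample string "11/07/2008" and the variable names are made up for illustration and do not come from any parser mentioned in this thread. An ISO 8601 date has a single reading, while a numeric date written for human readers yields two different days depending on which convention the parser assumes.

```python
from datetime import date, datetime

# An ISO 8601 calendar date has exactly one reading.
iso_value = date.fromisoformat("2008-07-11")

# A numeric date written for humans depends on the assumed convention.
# (Illustrative input; not taken from the thread.)
text = "11/07/2008"
day_first = datetime.strptime(text, "%d/%m/%Y").date()    # read as 11 July 2008
month_first = datetime.strptime(text, "%m/%d/%Y").date()  # read as 7 November 2008

print(iso_value.isoformat())    # 2008-07-11
print(day_first.isoformat())    # 2008-07-11
print(month_first.isoformat())  # 2008-11-07
```

Only the day-first reading agrees with the ISO value, and nothing in the string itself says which reading the publisher intended.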