[uf-discuss] Human and machine readable data format

Dan Brickley danbri at danbri.org
Fri Jul 11 01:47:35 PDT 2008


Toby A Inkster wrote:
> Paul Wilkins wrote:
> 
>> We should leverage the computer's ability to do the hard work for us.
>> <p>Date <span class="date">Friday, July the 11th 2008</span></p>
> 
> As I've said before, although my parser does support dates in this 
> format, I strongly recommend *not* allowing these per spec, as it will 
> lead to unpredictable and inconsistent results.
> 
> Yes, many programming languages do have libraries to do natural language 
> parsing of dates, but these all differ subtly in what formats they 
> support, how they interpret certain ambiguous dates, and how well they 
> internationalise. e.g. I know that Perl's DateTime::Format::Natural, 
> while it can perform very sophisticated parsing ("Saturday evening 3 
> months ago" => 2008-05-12T19:00:00, "thursday morning last week" => 
> 2008-07-03T09:00:00) only includes English in the distributed module 
> (though it has hooks allowing support for other languages). PHP's 
> strtotime function is English only too, and there are differences in how 
> it interprets some natural language dates, not just with Perl, but 
> between different versions of PHP.
> 
> Natural language parsing is really not the way to go, nor is a limited 
> range of date formats that *look* like NLP, because publishers will 
> believe them to *be* NLP and start publishing in any old date format. 
> ISO8601 is what we must stick with - we just must agree a better way of 
> embedding it than <abbr>.

Thank you for spelling this out so clearly. Please let's not slip into 
treating the non-English-speaking Web as a corner case. ISO8601's the 
thing. And it won't always be what the party reading the page expects 
(either in terms of language, script or even calendar).
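
To illustrate the ambiguity point with a minimal sketch (Python here; the
sample strings and the helper names parse_iso and parse_slashed are made up
for illustration and are not taken from any microformats parser): an
ISO 8601 date has exactly one reading, while a slashed date yields two
different days depending on which convention the consumer assumes.

```python
from datetime import date, datetime

# Hypothetical sample values: one day written in ISO 8601, and a slashed
# form whose meaning depends on the reader's locale.
ISO_TEXT = "2008-07-11"
SLASHED_TEXT = "11/07/2008"


def parse_iso(text: str) -> date:
    """Parse an ISO 8601 calendar date (YYYY-MM-DD)."""
    return date.fromisoformat(text)


def parse_slashed(text: str, day_first: bool) -> date:
    """Parse a slashed date under an explicitly chosen field order."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


print(parse_iso(ISO_TEXT))                        # 2008-07-11
print(parse_slashed(SLASHED_TEXT, day_first=True))   # 2008-07-11
print(parse_slashed(SLASHED_TEXT, day_first=False))  # 2008-11-07
```

The parser has to be told the field order for the slashed form, and a wrong
guess silently produces a valid but different date. The ISO 8601 form needs
no such guess.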

cheers,

Dan

--
http://danbri.org/

