[uf-discuss] Human and machine readable data format

Toby A Inkster mail at tobyinkster.co.uk
Fri Jul 11 01:38:13 PDT 2008


Paul Wilkins wrote:

> We should leverage the computer's ability to do the hard work for us.
> <p>Date <span class="date">Friday, July the 11th 2008</span></p>

As I've said before, although my parser does support dates in this
format, I strongly recommend *not* allowing them in the spec, as doing
so will lead to unpredictable and inconsistent results.

Yes, many programming languages have libraries for natural
language parsing of dates, but they all differ subtly in which
formats they support, how they interpret certain ambiguous dates, and
how well they internationalise. For example, Perl's
DateTime::Format::Natural can perform very sophisticated
parsing ("Saturday evening 3 months ago" => 2008-05-12T19:00:00,
"thursday morning last week" => 2008-07-03T09:00:00), but it only
includes English in the distributed module (though it has hooks
allowing support for other languages). PHP's strtotime function is
English only too, and it interprets some natural language dates
differently, not just from Perl, but also between different versions
of PHP.
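
To illustrate the ambiguity problem, here is a minimal Python sketch.
It does not reproduce the behaviour of DateTime::Format::Natural or
strtotime; it uses a numeric date as a stand-in, and the helper name
parse_both_ways is hypothetical. The same string yields two different
dates depending on which convention the parser assumes.

```python
from datetime import datetime


def parse_both_ways(text):
    """Parse a slash-separated date under two common conventions.

    Returns (day_first, month_first) as datetime.date objects.
    """
    day_first = datetime.strptime(text, "%d/%m/%Y").date()
    month_first = datetime.strptime(text, "%m/%d/%Y").date()
    return day_first, month_first


day_first, month_first = parse_both_ways("07/11/2008")
print(day_first.isoformat())    # 2008-11-07 (read as 7 November)
print(month_first.isoformat())  # 2008-07-11 (read as 11 July)
```

A consumer has no way of knowing which reading the publisher
intended, and free-form natural language dates only widen that gap.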

Natural language parsing is really not the way to go, nor is a
limited range of date formats that *look* like NLP, because
publishers will believe them to *be* NLP and start publishing in any
old date format. ISO 8601 is what we must stick with; we just need to
agree on a better way of embedding it than <abbr>.
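
By contrast, a strict ISO 8601 calendar date either parses or it does
not. The following Python sketch shows the idea under some
assumptions: it accepts only the extended YYYY-MM-DD form (ISO 8601
allows other representations too), and the function name
parse_iso_date is hypothetical rather than part of any microformats
parser.

```python
import re
from datetime import date

# Assumption: only the extended calendar form YYYY-MM-DD is accepted.
ISO_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_iso_date(text):
    """Return a datetime.date for a strict YYYY-MM-DD string, else None."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Well-formed but impossible date, e.g. 30 February.
        return None


print(parse_iso_date("2008-07-11"))                  # 2008-07-11
print(parse_iso_date("Friday, July the 11th 2008"))  # None
print(parse_iso_date("2008-02-30"))                  # None
```

There is no guessing involved: every consumer that follows the same
rule gets the same answer, regardless of language or locale.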

-- 
Toby A Inkster
<mailto:mail at tobyinkster.co.uk>
<http://tobyinkster.co.uk>



