[uf-discuss] Human and machine readable data format

Breton Slivka zen at zenpsycho.com
Sat Jul 12 06:50:31 PDT 2008


On Fri, Jul 11, 2008 at 6:47 PM, Dan Brickley <danbri at danbri.org> wrote:
> Toby A Inkster wrote:
>>
>> Paul Wilkins wrote:
>>
>>> We should leverage the computers ability to do the hard work for us.
>>> <p>Date <span class="date">Friday, July the 11th 2008</span></p>
>>
>> As I've said before, although my parser does support dates in this format,
>> I strongly recommend *not* allowing these per spec, as it will lead to
>> unpredictable and inconsistent results.
>>
>> Yes, many programming languages do have libraries to do natural language
>> parsing of dates, but these all differ subtly in what formats they support,
>> how they interpret certain ambiguous dates, and how well they
>> internationalise. e.g. I know that Perl's DateTime::Format::Natural, while
>> it can perform very sophisticated parsing ("Saturday evening 3 months ago"
>> => 2008-05-12T19:00:00, "thursday morning last week" => 2008-07-03T09:00:00)
>> only includes English in the distributed module (though it has hooks
>> allowing support for other languages). PHP's strtotime function is English
>> only too, and there are differences in how it interprets some natural
>> language dates, not just with Perl, but between different versions of PHP.
>>
>> Natural language parsing is really not the way to go, nor is a limited
>> range of date formats that *look* like NLP, because publishers will believe
>> them to *be* NLP and start publishing in any old date format. ISO8601 is
>> what we must stick with - we just must agree a better way of embedding it
>> than <abbr>.
>
> Thank you for spelling this out so clearly. Please let's not slip into
> treating the non-English-speaking Web as a corner case. ISO8601's the thing.
> And it won't always be what the party reading the page expects (either in
> terms of language, script or even calendar).
>
> cheers,
>
> Dan
>


In what way is ISO 8601 more friendly to non-English speakers than any
other date format?
Please realise that by insisting that no natural-language style can
be a solution, you are essentially saying that there is no solution to
this problem.

1. Metadata and information hiding are out of the question.
2. Putting ISO 8601 style dates ("machine dates") in any place where a
human can see them or have them read aloud is "the problem" that we are
trying to solve, so we can't do that.
3. The date cannot resemble anything a human might want to read.

I find it terribly frustrating how many people cannot see that this
set of constraints yields NO solution. At least, not when the constraints
are held to the level of strictness that this community is holding
them to.


>> Natural language parsing is really not the way to go, nor is a limited
>> range of date formats that *look* like NLP, because publishers will believe
>> them to *be* NLP and start publishing in any old date format. ISO8601 is
>> what we must stick with - we just must agree a better way of embedding it
>> than <abbr>.
>
The premise that publishers will pick any old format is merely an
assertion with no evidence. Please show us an example somewhere else
where this has happened, or perhaps a better argument than merely
insisting on the "obvious" truth of it.

The way I see it, if they publish in the wrong format, then the
parsers won't pick up the date. This is what happens with microformats
already. I don't know about anyone else, but when I publish a
microformat, I test whether parsers can read it correctly. I do the
same thing with any HTML. If a publisher can't take the time to test
and publish in the correct format, then they take the consequences.
It's exactly the same with any other technology. Why should
microformats be any different? Why do you think making a microformat
resemble natural language drastically changes this set of rules?
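
To make that concrete, here is a rough sketch in Python of a parser
that accepts exactly one natural-language-style layout and ignores
everything else. The function name and the single accepted layout
(modelled on the "Friday, July the 11th 2008" example quoted above) are
assumptions for illustration only, not anything a microformats spec
defines.

```python
import re
from datetime import date

# Hypothetical fixed vocabulary: English month names only.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November",
         "December"],
        start=1,
    )
}

# One layout only: optional weekday, month name, optional "the",
# ordinal day, four-digit year. Anything else does not match.
PATTERN = re.compile(
    r"^(?:(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day, )?"
    r"(" + "|".join(MONTHS) + r") (?:the )?"
    r"(\d{1,2})(?:st|nd|rd|th) (\d{4})$"
)


def parse_strict_date(text):
    """Return an ISO 8601 date string, or None if the text is not in
    the single accepted layout (hypothetical helper)."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None  # wrong format: the parser simply finds no date
    month = MONTHS[match.group(1)]
    day = int(match.group(2))
    year = int(match.group(3))
    try:
        # date() rejects impossible days such as February the 30th.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


print(parse_strict_date("Friday, July the 11th 2008"))
print(parse_strict_date("11/07/2008"))
```

The sketch does not check that the weekday agrees with the date, and
it knows nothing about other languages. The only point is that a
publisher who uses any other layout gets no date back, and finds that
out by testing.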

As to the person who was concerned about forcing a particular format
in a place where a human can read it, I have not seen a single
proposed solution which avoids doing this without violating the "no
information hiding" principle.

You may not like it, but too bad. Making a date resemble natural
language is the only way to go. I don't say this because it's my
opinion. This is merely a fact, due to the nature of the problem and
the constraints that the community has enforced on possible solutions.
Accept it, or doom yourselves to reasoning around in circles some
more, as you have already done.

