[uf-discuss] Human and machine readable data format

Michael Smethurst Michael.Smethurst at bbc.co.uk
Mon Jul 14 04:19:27 PDT 2008


Hello


On 12/7/08 14:50, "Breton Slivka" <zen at zenpsycho.com> wrote:

> On Fri, Jul 11, 2008 at 6:47 PM, Dan Brickley <danbri at danbri.org> wrote:
>> Toby A Inkster wrote:
>>> 
>>> Paul Wilkins wrote:
>>> 
>>>> We should leverage the computer's ability to do the hard work for us.
>>>> <p>Date <span class="date">Friday, July the 11th 2008</span></p>
>>> 
>>> As I've said before, although my parser does support dates in this format,
>>> I strongly recommend *not* allowing these per spec, as it will lead to
>>> unpredictable and inconsistent results.
>>> 
>>> Natural language parsing is really not the way to go, nor is a limited
>>> range of date formats that *look* like NLP, because publishers will believe
>>> them to *be* NLP and start publishing in any old date format. ISO8601 is
>>> what we must stick with - we just must agree a better way of embedding it
>>> than <abbr>.
>> 
>> Thank you for spelling this out so clearly. Please let's not slip into
>> treating the non-English-speaking Web as a corner case. ISO8601's the thing.
>> And it won't always be what the party reading the page expects (either in
>> terms of language, script or even calendar).
>> 


I'm not sure whether this thread only covers datetimes in abbreviations. The
title suggests it's more general, so I thought I'd chip in with geo as an
example. How would a parser deal with natural (non-English) language here?
Would it be expected to parse Manchester, Salford, London, Londres or
Londinium?

Whilst it's just about possible to imagine NLP of dates, and trickier to
imagine NLP of multi-language date formats, NLP of place names is beyond the
realms of feasibility.


> 
> 
> In what way is ISO 8601 more friendly to non english speakers than any
> other date format?
> Please realise that by insisting that no natural language style will
> be a solution, you are essentially saying that there is no solution to
> this problem.
> 
> 1. metadata and information hiding is out of the question
> 2. putting ISO 8601 style dates ("machine dates") in any place where a
> human can see it or have it read to them  is "the problem" that we are
> trying to solve, so we can't do that.

I thought the problem was any non-human-readable data where humans can 'see'
it, not just datetimes.

> 3. The date cannot resemble anything a human might want to read.
> 
> I find it terribly frustrating how many people cannot see that this
> set of constraints yields NO solution. At least, when the constraints
> are held to the level of strictness that this community is holding
> them to.

Seems to me there are two solutions:

1. Relax the data hiding constraint (tricky because it's fundamental to the
uf design philosophy and its relaxation has been rejected many times).

2. Maintain the status quo. Keep the abbreviation design pattern for machine
friendly data and leave it up to publishers to decide whether this is an
issue for them. The microformats community would probably need to promote
the design philosophy and its potential issues a little more prominently
than at present. The wiki already documents much of this; it just needs more
prominent linking and some padding out of /about to make it a little more
neutral.
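
To make option 2 concrete, here is a rough sketch of what a consumer does
with the abbreviation design pattern: it ignores the visible text entirely
and reads the machine value from the title attribute. The markup, the
coordinates and the helper names (AbbrTitleParser, extract_abbr) are made up
for illustration; this is not how any particular uf parser is written.

```python
from datetime import date
from html.parser import HTMLParser


class AbbrTitleParser(HTMLParser):
    """Collect (class, title, visible text) for each <abbr> element."""

    def __init__(self):
        super().__init__()
        self.found = []    # completed (class, title, text) tuples
        self._open = None  # [class, title, text so far] while inside <abbr>

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            a = dict(attrs)
            self._open = [a.get("class", ""), a.get("title", ""), ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        if tag == "abbr" and self._open is not None:
            self.found.append(tuple(self._open))
            self._open = None


def extract_abbr(html):
    """Hypothetical helper: return every <abbr> as (class, title, text)."""
    parser = AbbrTitleParser()
    parser.feed(html)
    parser.close()
    return parser.found


# Illustrative markup only. The visible text is Spanish; the title values
# are the machine-friendly data (ISO 8601 date, "lat;lon" pair).
SAMPLE = (
    '<p>Fecha: <abbr class="dtstart" title="2008-07-11">'
    'viernes 11 de julio de 2008</abbr></p>'
    '<p><abbr class="geo" title="51.5074;-0.1278">Londres</abbr></p>'
)

for cls, title, text in extract_abbr(SAMPLE):
    if cls == "dtstart":
        # No natural-language parsing: the title is already ISO 8601.
        print(text, "->", date.fromisoformat(title))
    elif cls == "geo":
        lat, lon = (float(part) for part in title.split(";"))
        print(text, "->", (lat, lon))
```

The parser never needs to know that "Londres" is London or that "julio" is
July, which is the point of keeping the pattern. The cost is the one already
discussed: the title value is exposed to anyone whose browser or screen
reader surfaces it.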
> 
> 
>>> Natural language parsing is really not the way to go, nor is a limited
>>> range of date formats that *look* like NLP, because publishers will believe
>>> them to *be* NLP and start publishing in any old date format. ISO8601 is
>>> what we must stick with - we just must agree a better way of embedding it
>>> than <abbr>.
>> 
> 
> You may not like it, but too bad. Making a date resemble natural
> language is the only way to go. I don't say this because it's my
> opinion. This is merely a fact, due to the nature of the problem, and
> the constraints that the community has enforced on possible solutions.
> Accept it, or doom yourselves to reasoning around in circles some
> more, as you have already done.

