[uf-dev] Human and machine readable data format

Glenn Jones glenn.jones at madgex.com
Sun Jun 29 07:17:17 PDT 2008


As we go round in circles on the machine data issue, the question of
Natural Language Processing (NLP) has come up again. The main problem
with any form of NLP is that there are too many ambiguities in reading
dates or any other form of freeform human-written text. I don't want us
to go down this path; it is unworkable with currently available
technologies.

Against this we have statements like Tantek's: "I'm vehemently opposed
to putting data in the class attribute. We must find better
alternatives. We must not go down the path of invisible (dark)
(meta)data - IMHO that principle is inviolable for microformats."

So I have tried to look at this again and reconcile the two opposing
drivers above. Each time, it makes me think of a mixed-mode human and
machine readable format: a date format which is human readable but
strict enough to be parsed reliably. So rather than talk about it, I
have built a little prototype which demos the idea.

http://ufxtract.com/experimental/hm-readable-date.htm
    
This approach is not without its own problems, but it would provide a
semantic use of the abbr pattern which does not raise any accessibility
concerns. 

<abbr class="dtstart" title="Date: 25 January 2008 at 15:30, Time zone
+1:00">Jan 25 08</abbr>
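
To give a feel for how a parser might read the strict title value above,
here is a minimal Python sketch. The function name parse_hm_date, the
regular expression, and the choice of ISO 8601 output are assumptions
made for illustration; this is not the code behind the prototype. It
accepts only the date, date with time, and date with time and time zone
shapes, and it collapses runs of whitespace first, since a title may
wrap across lines as it does in this message.

```python
import re
from datetime import date, datetime, timedelta, timezone

# Full English month names only; abbreviations are rejected on purpose.
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical grammar: "Date: D Month YYYY[ at HH:MM[, Time zone +H:MM]]"
DATE_RE = re.compile(
    r"^Date: (?P<day>\d{1,2}) (?P<month>[A-Za-z]+) (?P<year>\d{4})"
    r"(?: at (?P<hour>\d{1,2}):(?P<minute>\d{2})"
    r"(?:, Time zone (?P<sign>[+-])(?P<tzh>\d{1,2}):(?P<tzm>\d{2}))?)?$"
)


def parse_hm_date(title):
    """Return an ISO 8601 string for a strict title value, or raise ValueError."""
    text = " ".join(title.split())  # collapse line wraps and repeated spaces
    match = DATE_RE.match(text)
    if match is None or match["month"] not in MONTHS:
        raise ValueError("not in the strict date format: %r" % title)

    year = int(match["year"])
    month = MONTHS[match["month"]]
    day = int(match["day"])
    if match["hour"] is None:
        # Date only; date() itself rejects impossible days such as 31 June.
        return date(year, month, day).isoformat()

    tz = None
    if match["sign"] is not None:
        offset = timedelta(hours=int(match["tzh"]), minutes=int(match["tzm"]))
        tz = timezone(offset if match["sign"] == "+" else -offset)
    moment = datetime(year, month, day,
                      int(match["hour"]), int(match["minute"]), tzinfo=tz)
    return moment.isoformat()


if __name__ == "__main__":
    print(parse_hm_date("Date: 25 January 2008 at 15:30, Time zone +1:00"))
```

Anything outside the grammar, such as the visible text "Jan 25 08",
raises ValueError rather than being guessed at, which is the whole point
of avoiding NLP.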

On the downside, we would have to re-invent the wheel with yet another
date format. This approach would make parsers a lot heavier. Authors
would also have to understand the strict nature of the extended format
used in the abbr title, etc.

I thought I would put this forward - to get shot down ;-)



This concept could be extended to the other data formats:

Date: 25 January 2008
Date: 25 January 2008 at 15:30
Date: 25 January 2008 at 15:30, Time zone +1:30
Duration: 3 minutes, 47 seconds
Location: latitude 37.77, longitude -122.41
Time zone: +1:30
Rated 1 out of 5
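
The non-date formats in the list above could be handled the same way,
with one strict pattern per format. The Python sketch below is only an
illustration: the function name parse_value, the patterns, and the
returned shapes are assumptions, and each pattern covers just the exact
wording shown in the list (for example, a duration must give both
minutes and seconds).

```python
import re
from datetime import timedelta

# One strict, hypothetical pattern per format; nothing else is accepted.
DURATION_RE = re.compile(r"^Duration: (\d+) minutes?, (\d+) seconds?$")
LOCATION_RE = re.compile(
    r"^Location: latitude (-?\d+(?:\.\d+)?), longitude (-?\d+(?:\.\d+)?)$")
TIMEZONE_RE = re.compile(r"^Time zone: ([+-])(\d{1,2}):(\d{2})$")
RATING_RE = re.compile(r"^Rated (\d+) out of (\d+)$")


def parse_value(title):
    """Return a (kind, value) pair for a strict title value, or raise ValueError."""
    text = " ".join(title.split())  # collapse line wraps and repeated spaces

    match = DURATION_RE.match(text)
    if match:
        minutes, seconds = int(match.group(1)), int(match.group(2))
        return ("duration", timedelta(minutes=minutes, seconds=seconds))

    match = LOCATION_RE.match(text)
    if match:
        latitude, longitude = float(match.group(1)), float(match.group(2))
        if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
            raise ValueError("coordinates out of range: %r" % title)
        return ("location", (latitude, longitude))

    match = TIMEZONE_RE.match(text)
    if match:
        offset = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
        return ("timezone", offset if match.group(1) == "+" else -offset)

    match = RATING_RE.match(text)
    if match:
        rating, best = int(match.group(1)), int(match.group(2))
        if rating > best:
            raise ValueError("rating exceeds maximum: %r" % title)
        return ("rating", (rating, best))

    raise ValueError("not in any known strict format: %r" % title)


if __name__ == "__main__":
    print(parse_value("Duration: 3 minutes, 47 seconds"))
```

Each extra format adds another pattern and another branch, which is the
"heavier parsers" cost in practice.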


Glenn Jones 








