[uf-discuss] hoard.it

Thu Jul 3 15:03:35 PDT 2008

Hello,

This might be of interest to members of this group, as it deals with  
extracting data from semantic HTML. Prior to this year's Mashed  
Museum event at the University of Leicester, Dan Zambonini put  
together a prototype which aggregates data by spidering online museum  
catalogues:
http://hoardit.pbwiki.com/
It's a pretty fantastic demo of how information can be extracted from  
well-structured HTML, even before you think of putting microformats  
etc. on top.

In particular, it does a pretty good job of figuring out when an  
object was made:
http://feeds.boxuk.com/museums/object_100yrs.php
The date parser is based on some code Dan & I knocked together at  
Mashed Museum 2007, which looks at strings like 'late Victorian',
'early 20th Century', '4th January 1853' and so on, and converts them  
to machine-readable ISO dates.
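
To give a flavour of the idea, here is a minimal Python sketch. It is not the hoard.it code, just an illustration under some assumptions of mine: the function names, the PERIODS table, the choice to return a (start, end) pair of ISO dates, the rule that 'early', 'mid' and 'late' each mean roughly a third of a period, and counting the 20th century as 1901 to 2000 are all hypothetical.

```python
import re
from datetime import date

# Hypothetical lookup table of named periods (start year, end year).
# Only Queen Victoria's reign is filled in for this sketch.
PERIODS = {"victorian": (1837, 1901)}

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def _narrow(start, end, qualifier):
    """Assumption: early/mid/late each cover about a third of the span."""
    third = (end - start + 1) // 3
    if qualifier == "early":
        return start, start + third - 1
    if qualifier == "mid":
        return start + third, end - third
    if qualifier == "late":
        return end - third + 1, end
    return start, end


def parse_fuzzy_date(text):
    """Return (start, end) as ISO date strings, or None if unrecognised."""
    s = text.strip().lower()

    # Exact day, e.g. '4th January 1853'
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]+)\s+(\d{4})", s)
    if m and m.group(2) in MONTHS:
        try:
            d = date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))
        except ValueError:  # e.g. '31st February 1900'
            return None
        return d.isoformat(), d.isoformat()

    # Optional qualifier followed by a century or a named period
    m = re.fullmatch(r"(?:(early|mid|late)[\s-]+)?(.+)", s)
    if not m:
        return None
    qualifier, rest = m.group(1), m.group(2)

    c = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)\s+century", rest)
    if c:
        n = int(c.group(1))
        if n < 1:
            return None
        span = ((n - 1) * 100 + 1, n * 100)  # assumption: 20th = 1901..2000
    elif rest in PERIODS:
        span = PERIODS[rest]
    else:
        return None

    first, last = _narrow(*span, qualifier)
    return date(first, 1, 1).isoformat(), date(last, 12, 31).isoformat()


print(parse_fuzzy_date("late Victorian"))
print(parse_fuzzy_date("early 20th Century"))
print(parse_fuzzy_date("4th January 1853"))
```

Returning a range rather than a single date keeps the vagueness of the original string visible to whoever consumes the result, so a caller can decide for itself how to treat 'late Victorian'.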

Our original idea, which we never got round to actually implementing,  
was that this would be useful as a web service - you give it a  
string, it gives you a machine-parsable representation of that  
string. The recent discussion here about dates has made me wonder if  
such a web service would be useful for microformats parsers. What do
others think?

Cheers
Jim

Jim O'Donnell
jim at eatyourgreens.org.uk
http://eatyourgreens.org.uk
http://flickr.com/photos/eatyourgreens