[uf-discuss] hoard.it
Jim O'Donnell
jim at eatyourgreens.org.uk
Thu Jul 3 15:03:35 PDT 2008
Hello,
This might be of interest to members of this group, as it deals with
extracting data from semantic HTML. Prior to this year's Mashed
Museum event at the University of Leicester, Dan Zambonini put
together a prototype which aggregates data by spidering online museum
catalogues:
http://hoardit.pbwiki.com/
It's a pretty fantastic demo of how information can be extracted from
well-structured HTML, even before you think of putting microformats
etc. on top.
In particular, it does a pretty good job of figuring out when an
object was made:
http://feeds.boxuk.com/museums/object_100yrs.php
The date parser is based on some code Dan & I knocked together at
Mashed Museum 2007, which looks at strings like 'late Victorian',
'early 20th Century', '4th January 1853' and so on, and converts them
to machine-readable ISO dates.
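To give a feel for the kind of conversion involved, here is a minimal Python sketch. It is not the code Dan and I wrote, only an illustration. The names parse_date, PERIODS and _third are made up for the example. The Victorian reign years, the convention that the Nth century runs from year (N-1)*100+1 to N*100, and the reading of 'early', 'mid' and 'late' as thirds of a range are all assumptions. Each input maps to a (start, end) pair of ISO 8601 dates, since most of these strings describe a range rather than a single day.

```python
import datetime
import re

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Assumption: named periods map to inclusive year ranges (reign of Victoria).
PERIODS = {"victorian": (1837, 1901)}


def _third(start, end, qualifier):
    """Narrow an inclusive year range to its early, mid or late third."""
    step = (end - start + 1) // 3
    if qualifier == "early":
        return start, start + step - 1
    if qualifier == "mid":
        return start + step, end - step
    if qualifier == "late":
        return end - step + 1, end
    return start, end


def parse_date(text):
    """Return (start, end) ISO 8601 date strings, or None if unrecognised."""
    s = text.strip().lower()

    # Exact day, e.g. '4th January 1853'
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]+)\s+(\d{4})", s)
    if m and m.group(2) in MONTHS:
        try:
            day = datetime.date(int(m.group(3)), MONTHS[m.group(2)],
                                int(m.group(1)))
        except ValueError:  # e.g. '31st February 1853'
            return None
        return day.isoformat(), day.isoformat()

    # Optional qualifier followed by a century or a named period
    m = re.fullmatch(r"(?:(early|mid|late)[\s-]+)?(.+)", s)
    if not m:
        return None
    qualifier, rest = m.group(1), m.group(2)

    century = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)\s+century", rest)
    if century:
        n = int(century.group(1))
        start, end = (n - 1) * 100 + 1, n * 100  # assumed convention
    elif rest in PERIODS:
        start, end = PERIODS[rest]
    else:
        return None

    start, end = _third(start, end, qualifier)
    return f"{start:04d}-01-01", f"{end:04d}-12-31"
```

Anything the sketch does not recognise comes back as None rather than a guess, which seems the safer default when the output feeds an aggregator.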
Our original idea, which we never got round to actually implementing,
was that this would be useful as a web service - you give it a
string, it gives you a machine-parsable representation of that
string. The recent discussion here about dates has made me wonder if
such a web service would be useful for microformats parsers. What do
others think?
Cheers
Jim
Jim O'Donnell
jim at eatyourgreens.org.uk
http://eatyourgreens.org.uk
http://flickr.com/photos/eatyourgreens