jim at eatyourgreens.org.uk
Wed Jul 9 14:30:26 PDT 2008
On 8 Jul 2008, at 06:45, Guillaume Lebleu wrote:
> Jim O'Donnell wrote:
>> The recent discussion here about dates has made me wonder if such
>> a web service would be useful for microformats parsers. What do
>> others think?
> It seems to me that this type of date extraction might present
> risks if used by uf parsers to extract date/time from published
> content (and lead to the "people showing up on the wrong date"
> error mentioned in earlier posts).
I don't think it's so risky. The inspiration for this particular work
was Dan's experience on the 20th century London site,
http://www.20thcenturylondon.org.uk/, which involved parsing and
normalising text dates across four different collections. Granted, it's
tedious to analyse all the different patterns that have been used, but
it isn't impossible to extract accurate ISO dates. The fact that the
archive was created from those four collections is a testament to that.
Museum catalogue records always have some sort of absolute date,
though, which makes things easier for me. If people are marking up
phrases like 'this Saturday' or '25th June' then I can see that
extracting a date would be tricky - the parser would need the context
within which to place the date, in order to get the year or month.
That said, I don't know how often people use hCalendar to mark up
phrases like 'next weekend' vs, say, 'Saturday 19th July 2008'. If we
had some idea of how microformats are being used to mark up dates in
real, online text, then we could make some meaningful statements
about how risky, or even impossible, it might be to extract ISO dates.
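To illustrate the difference between absolute and context-dependent dates, here is a minimal Python sketch. It is not taken from any existing parser: the names to_iso and context are made up for this example, it only handles the 'day month [year]' pattern, and borrowing the year from a context date (say, the page's publication date) is an assumption that can give the wrong answer near a year boundary.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches e.g. "Saturday 19th July 2008", "19 July 2008" or "25th June".
# The leading weekday is accepted but not checked against the date.
DATE_RE = re.compile(
    r"(?:[A-Za-z]+\s+)?(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)(?:\s+(\d{4}))?"
)

def to_iso(text, context=None):
    """Return an ISO 8601 date string, or None if the text is ambiguous.

    context is a hypothetical reference date (for example, when the page
    was published); it is used only to supply a missing year.
    """
    match = DATE_RE.fullmatch(text.strip())
    if match is None:
        return None  # e.g. 'next weekend': no pattern we recognise
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    if year is None:
        if context is None:
            return None  # '25th June' alone does not identify a year
        year = context.year  # assumption: same year as the context date
    try:
        return date(int(year), month, int(day)).isoformat()
    except ValueError:
        return None  # impossible date such as '31 June 2008'

print(to_iso("Saturday 19th July 2008"))
print(to_iso("25th June"))
print(to_iso("25th June", context=date(2008, 7, 9)))
```

A full date resolves on its own, while '25th June' only resolves once the caller supplies a context date, and 'next weekend' is rejected outright rather than guessed at.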
> On the other hand, it might be great at the time content is
> authored, to convert ambiguous natural language dates into
> unambiguous microformats, as a way to reduce the pain of
> microformatting content (especially if it can detect dates in plain
> text rather than parsing something it knows is a date). Authors could
> confirm the generated microformats before publishing, in a way
> similar to how the Yahoo! Shortcuts WordPress plugin works.
Decent authoring tools would be brilliant. Not just for dates but
locations and possibly other types of microformatted text. For
instance, I can link a UK street address to Google Maps and get back
a precise point on a map of the UK. So do I really need to manually
write a lat/long into the HTML to tell a microformats tool how to
place the address on a map? The text contains all the necessary
information to perform this operation already.
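As a rough sketch of what such an authoring tool might do, the Python below takes an address, asks a geocoder for coordinates, and writes the geo microformat markup so the author doesn't have to. The function names and the geocode callable are hypothetical; a real tool would plug in whatever geocoding service it has access to, and the coordinates in the usage lines are placeholders, not a real lookup.

```python
from html import escape

def geo_markup(latitude, longitude):
    """Return geo microformat markup for a pair of coordinates."""
    return (
        '<span class="geo">'
        f'<span class="latitude">{latitude:.6f}</span>, '
        f'<span class="longitude">{longitude:.6f}</span>'
        '</span>'
    )

def annotate_address(address, geocode):
    """Return the address text followed by generated geo markup.

    geocode is a hypothetical callable supplied by the caller that maps
    an address string to a (latitude, longitude) tuple.
    """
    latitude, longitude = geocode(address)
    return f"{escape(address)} {geo_markup(latitude, longitude)}"

# Stand-in geocoder with placeholder coordinates, for illustration only.
def fake_geocode(address):
    return (51.5, -0.12)

print(annotate_address("1 Example Street, London", fake_geocode))
```

The author only types the address; the machine-readable part is generated and could be shown for confirmation before publishing.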
I think microformats should be relatively easy for a non-technical
author to add to their text. Decent tools that generate the machine-
readable data would be an enormous aid here.