[uf-discuss] hoard.it

Jim O'Donnell jim at eatyourgreens.org.uk
Wed Jul 9 14:30:26 PDT 2008


On 8 Jul 2008, at 06:45, Guillaume Lebleu wrote:

> Jim O'Donnell wrote:
>> The recent discussion here about dates has made me wonder if such  
>> a web service would be useful for microformats parsers. What do
>> others think?
> It seems to me that this type of date extraction might present  
> risks if used by uf parsers to extract date/time from published  
> content (and lead to the "people showing up on the wrong date"  
> error mentioned in earlier posts).
>
I don't think it's so risky. The inspiration for this particular work
was Dan's experience on the 20th century London site,
http://www.20thcenturylondon.org.uk/, which involved parsing and
normalising text dates across four different collections. Granted,
it's tedious to analyse all the different patterns that have been
used, but it isn't impossible to extract accurate ISO dates. The fact
that the archive was created from those four collections is a
testament to that.

Museum catalogue records always have some sort of absolute date,
though, which makes things easier for me. If people are marking up
phrases like 'this Saturday' or '25th June', then I can see that
extracting a date would be tricky: the parser would need the
surrounding context to fill in the missing parts of the date, such as
the year or month.
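As a rough illustration of that difference, here is a minimal Python sketch. It is not the code used on the 20th century London site; the function name to_iso, the context_year parameter and the English-only patterns are all assumptions. An absolute date normalises cleanly, while a phrase with no year only resolves when the caller supplies the missing context, and a relative phrase never resolves.

```python
import re
from datetime import date

# English month names mapped to month numbers (an assumption of this sketch).
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Optional weekday, day with optional ordinal suffix, month name, optional
# four-digit year, e.g. "Saturday 19th July 2008" or "25th June".
# The weekday is accepted but not cross-checked against the date.
DATE_RE = re.compile(
    r"^(?:[a-z]+day\s+)?(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]+)(?:\s+(\d{4}))?$",
    re.IGNORECASE,
)


def to_iso(text, context_year=None):
    """Return an ISO 8601 date string, or None if the text can't be resolved.

    A phrase with no year (e.g. "25th June") only resolves when the caller
    supplies context_year; relative phrases such as "next weekend" never match.
    """
    m = DATE_RE.match(text.strip())
    if m is None:
        return None
    day, month_name, year = m.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    if year is None:
        if context_year is None:
            return None  # the year is missing and there is no context to supply it
        year = context_year
    try:
        return date(int(year), month, int(day)).isoformat()
    except ValueError:
        return None  # not a real calendar date, e.g. "31st June 2008"
```

For example, to_iso("Saturday 19th July 2008") gives "2008-07-19", to_iso("25th June") gives None, and to_iso("25th June", context_year=2008) gives "2008-06-25".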

That said, I don't know how often people use hCalendar to mark up
phrases like 'next weekend' vs., say, 'Saturday 19th July 2008'. If we
had some idea of how microformats are being used to mark up dates in
real online text, then we could make some meaningful statements about
how risky, or even impossible, it might be to extract ISO dates
automatically.


> On the other hand, it might be great, at the time content is
> authored, to convert ambiguous natural-language dates into
> unambiguous microformats, as a way to reduce the pain of
> micro-formatting content (especially as it can detect dates in plain
> text rather than parsing something it knows is a date). Authors could
> confirm the generated microformats before publishing, in a way
> similar to how the Yahoo! Shortcuts WordPress plugin works [1].
>
Decent authoring tools would be brilliant, not just for dates but for
locations and possibly other types of microformatted text. For
instance, I can link a UK street address to Google Maps and get back
a precise point on a map of the UK. So do I really need to manually
write a lat/long into the HTML to tell a microformats tool how to
place the address on a map? The text already contains all the
information needed to perform this operation.

I think microformats should be relatively easy for a non-technical
author to add to their text. Decent tools that generate the
machine-readable data would be an enormous aid here.
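Along those lines, here is a minimal Python sketch of the confirm-before-publishing idea. The helper suggest_markup is hypothetical: it finds absolute dates in plain text and wraps each one in an abbr element with the hCalendar dtstart class and an ISO date in the title attribute, for the author to check. It only proposes the date property; a complete event would still need a surrounding vevent and a summary, and dates without a year are deliberately left alone.

```python
import html
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Candidate absolute dates in running text, e.g. "19th July 2008" or "19 July 2008".
# Dates without a year are not matched: they need human judgement.
CANDIDATE_RE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def suggest_markup(text):
    """Return HTML with each recognised date wrapped as an hCalendar dtstart.

    Hypothetical authoring-tool helper: the author reviews the suggested
    markup before publishing rather than trusting it blindly.
    """
    out, pos = [], 0
    for m in CANDIDATE_RE.finditer(text):
        day, month, year = int(m.group(1)), m.group(2).lower(), int(m.group(3))
        try:
            iso = date(year, MONTHS.index(month) + 1, day).isoformat()
        except ValueError:
            continue  # not a real calendar date; leave the text as it is
        out.append(html.escape(text[pos:m.start()]))
        out.append('<abbr class="dtstart" title="%s">%s</abbr>'
                   % (iso, html.escape(m.group(0))))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

For example, "Open day on Saturday 19th July 2008." becomes 'Open day on Saturday <abbr class="dtstart" title="2008-07-19">19th July 2008</abbr>.', while "Talk on 25th June" comes back unchanged for the author to mark up by hand.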

Jim

Jim O'Donnell
jim at eatyourgreens.org.uk
http://eatyourgreens.org.uk
http://flickr.com/photos/eatyourgreens




