[uf-discuss] Re: Apple Data Detectors

Alex Faaborg faaborg at mozilla.com
Fri Feb 8 17:47:27 PST 2008

> On the other end, if, as I type this, I get an intellisense-like  
> list of my contacts that I can select from, then I can just select  
> Joe from the list and have the microformat markup added for me

I've been thinking a lot about how a Web browser could help end users  
author microformatted content in blogs and wikis, and I think we need  
to consider the user's goals and motivations.  I can't imagine people  
associating a contact in their address book with Joe as they casually  
mention him in a blog post just because they have an appreciation for  
the beauty of structured data.  However, if their goal is closely  
aligned with the goal of their readers, then I can see users going to  
the extra effort.  For instance, let's say you want to review
something.  Because you want your vote to count, and want other people
to be able to take advantage of your review once it gets aggregated,
I can see you going to the extra effort of filling out a form like the
hReview creator (http://microformats.org/code/hreview/creator) to get
the information into the structure of an hReview.  The same goes for
people who want to promote an event: since their motivation is for  
people to attend, they make it easy for users to add the event to  
their calendar.  We already see this type of behavior in applications  
like Outlook or Zimbra, where people create events for other people,  
so they are easy to accept.  Microformats allow us to take that
interaction out of closed systems, and apply it to HTML emails, blog  
posts, wikis, etc.
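
To make the event case concrete, here is a rough sketch in Python of
what a form-based creator could emit once the promoter has filled in a
few fields.  The class names (vevent, summary, dtstart, location) come
from hCalendar; the function name hcalendar_event and its parameters
are made up for illustration, and this is not how any existing creator
tool is implemented.

```python
from datetime import date
from html import escape


def hcalendar_event(summary: str, start: date, location: str) -> str:
    """Render three form fields as one hCalendar event.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch. Only the class names are taken from hCalendar.
    """
    # Human-readable date, e.g. "Tuesday, February 5, 2008".
    # Day and year are formatted by hand to avoid a zero-padded day.
    readable = f'{start.strftime("%A, %B")} {start.day}, {start.year}'
    return (
        '<div class="vevent">'
        f'<span class="summary">{escape(summary)}</span> on '
        # abbr pattern: readable text for people, ISO date for parsers
        f'<abbr class="dtstart" title="{start.isoformat()}">{readable}</abbr>'
        f' at <span class="location">{escape(location)}</span>'
        '</div>'
    )
```

Calling hcalendar_event("Launch party", date(2008, 2, 5), "Mountain
View") produces a single vevent block whose dtstart carries the
machine-readable date 2008-02-05 in its title attribute, so the author
never has to type the markup by hand.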

I'm all for building systems that attempt to infer structure from  
natural language, because, as we see in Apple's 1998 article and now
in Mail.app, these types of systems can be really useful when they  
work.  But I also don't think we should discount situations where the  
user may actually have a clear motivation for creating structured data  
by filling out a form.
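
For the inference side, here is a minimal sketch of a detector for the
long date format Guillaume mentions in the quoted thread below.  It is
only an illustration under stated assumptions: the function name
mark_up_dates, the default class name, and the weekday consistency
check are my own choices, not anything Apple's Data Detectors or an
existing parser is known to do.

```python
import re
from datetime import datetime

# Matches dates written like "Tuesday, February 5, 2008".
DATE_RE = re.compile(
    r"\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday), "
    r"(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def mark_up_dates(text: str, css_class: str = "dtstart") -> str:
    """Wrap recognized dates in <abbr> with an ISO 8601 title.

    Hypothetical helper for this sketch; css_class is an assumption.
    """
    def wrap(match: re.Match) -> str:
        raw = match.group(0)
        try:
            parsed = datetime.strptime(raw, "%A, %B %d, %Y")
        except ValueError:
            return raw  # impossible date such as February 30
        # strptime does not check that the weekday fits the date, so
        # compare explicitly and leave inconsistent text untouched.
        if parsed.strftime("%A") != raw.split(",", 1)[0]:
            return raw
        iso = parsed.date().isoformat()
        return f'<abbr class="{css_class}" title="{iso}">{raw}</abbr>'

    return DATE_RE.sub(wrap, text)
```

The weekday check is a cheap guard against some false matches, but it
does nothing about the harder ambiguity Toby raises, which is why
asking the author to confirm at writing time still seems worthwhile.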

In case anyone is interested in reading more about Data Detectors,
this paper may be useful.  It catalogs all of the research
done throughout the late 90s, and discusses a prototype system that  
leverages large knowledge bases like Stanford's TAP and MIT's  
ConceptNet to disambiguate natural language and provide structure to  
unstructured text:



On Feb 8, 2008, at 8:40 AM, Guillaume Lebleu wrote:

> Toby A Inkster wrote:
>> Guillaume Lebleu wrote:
>>> What I have been thinking more and more, and what this tells me
>>> again, is that the same way we talk of POSH and microformats, we
>>> could talk of plain text or plain old English formats, essentially
>>> standardizing how people write dates, addresses, etc. on the Web or
>>> in their emails. Asking people to write "Tuesday, February 5, 2008"
>>> in this order, with the commas, etc. is very likely even simpler
>>> for normal people than writing
>>> <abbr class="foo" title="2008-02-05">Tuesday, February 5, 2008</abbr>.
>> One problem with that is that it will find matches on people who  
>> aren't even intending to use your plain-old-english format. They  
>> may happen to be including "Tuesday, February 5, 2008" on their  
>> pages with a different intended meaning. 2008 could refer to eight  
>> minutes past eight PM in military time -- unlikely, but possible.  
>> And as you move away from dates, phone numbers and postcodes which  
>> have relatively parseable formats, towards locations, people's  
>> names and job titles and so on, the likelihood of false matches  
>> increases.
>> The use of explicit tags to mark up information does make
>> microformats slightly harder to use, yes. But the key is that they  
>> also make microformats much easier to explicitly not use.
> Toby,
> I understand the challenge of disambiguation and the value  
> microformats bring in terms of easier parser implementation and more  
> reliable information consumption experience. The challenge for  
> average people writing microformats shouldn't be underestimated though.
> I strongly believe that the time when disambiguation costs are the
> lowest is at publishing time, but this is also the time when you
> are focused on the English content, not the microformats. This is
> why in the second part of the post you cited, I suggested the use of
> Apple Data Detectors-like functionality, not to detect objects in
> plain old English (POE) in published content, but to detect objects
> in POE at the time they are written and ask the user for
> disambiguation at the same time, in a way that the underlying
> microformat markup is generated, but without the user having to know
> the syntax. I'm thinking of this particularly in the context of
> writing a blog post: writing an hCard just to say "My friend Joe" is
> way too much for normal people. On the other end, if, as I type
> this, I get an intellisense-like list of my contacts that I can  
> select from, then I can just select Joe from the list and have the  
> microformat markup added for me (just like WordPress adds a lot of
> markup that isn't in the visual editor or like a wiki converts
> simplified markup into HTML markup).
> Guillaume
> _______________________________________________
> microformats-discuss mailing list
> microformats-discuss at microformats.org
> http://microformats.org/mailman/listinfo/microformats-discuss
