[uf-discuss] generic microformat parsing heuristics?

Phil Dawes phil at phildawes.net
Wed Nov 9 06:07:47 PST 2005

Hi David,

David Janes -- BlogMatrix wrote:
> Phil (or Danny),
> If you have the time, what would a triple store for, say Neil Dunn's [1] 
> and Ryan's [2] hCards (together, perhaps) look like?
> Regards, etc...
> David
> [1] http://www.ndunn.com/2005/10/7/hCard
> [2] http://theryanking.com/blog/contact/

I've just imported them into my JAM*VAT store. The microformat parser is 
a bit crappy and misses the address information in Neil's page 
(amongst other things). I'm in the process of rewriting it to follow 
the hcard-parsing notes[1] that Tantek pointed me to.

Anyway - it should give you an idea. The data is crunched into 
statements, e.g.

"Neil Dunn" tag vcard
"Neil Dunn" url http://www.ndunn.com
"Neil Dunn" fn "Neil Dunn"

and then indexed. (In an RDF parser the symbols would be converted into 
URIs somehow.)
You can see the interpreted statements by clicking the links on the 
graphs page:

You can then search, browse and query the aggregated data.
E.g. try a search for "neil vcard"

(or a search for 'vcard' to get all the aggregated vcards)
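
For anyone who wants to see the statements-plus-index idea in code, here is a rough sketch. It is not how JAM*VAT does it; the build_index and search helpers are made-up names, and the index is just a plain word-to-subject map built from the three statements above.

```python
from collections import defaultdict

# Statements as (subject, property, value) triples, copied from the example above.
statements = [
    ("Neil Dunn", "tag", "vcard"),
    ("Neil Dunn", "url", "http://www.ndunn.com"),
    ("Neil Dunn", "fn", "Neil Dunn"),
]

def build_index(triples):
    """Map each lowercased word in a subject or value to the subjects it describes."""
    index = defaultdict(set)
    for subject, _prop, value in triples:
        for word in (subject + " " + value).lower().split():
            index[word].add(subject)
    return index

def search(index, query):
    """Return the subjects that match every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # .get() avoids creating empty entries for unknown words.
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)

index = build_index(statements)
print(search(index, "neil vcard"))
```

Searching for "neil vcard" intersects the subjects found under each word, so only cards matching both words come back.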

To pull out all the vcard info, a structured query gives more power:
E.g. try pasting the following into the query window:
select ?fn, ?url
where (?card fn ?fn)
       [(?card url ?url)]
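
If it helps to see what a query like that does, here is a small sketch of pattern matching over statements. It assumes the square brackets mark an optional pattern (my reading for this illustration, so treat it as an assumption), and match, query and the "Example Person" card are invented for the example rather than taken from the store.

```python
# Neil's statements from earlier, plus a hypothetical card that has no url.
statements = [
    ("Neil Dunn", "tag", "vcard"),
    ("Neil Dunn", "url", "http://www.ndunn.com"),
    ("Neil Dunn", "fn", "Neil Dunn"),
    ("Example Person", "fn", "Example Person"),  # hypothetical
]

def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def query(triples, required, optional=()):
    """Return one bindings dict per solution of the required patterns."""
    solutions = [{}]
    for pattern in required:
        extended = []
        for bindings in solutions:
            for triple in triples:
                result = match(pattern, triple, bindings)
                if result is not None:
                    extended.append(result)
        solutions = extended
    for pattern in optional:
        extended = []
        for bindings in solutions:
            hits = []
            for triple in triples:
                result = match(pattern, triple, bindings)
                if result is not None:
                    hits.append(result)
            # Keep the solution unchanged when the optional pattern finds nothing.
            extended.extend(hits or [bindings])
        solutions = extended
    return solutions

solutions = query(statements,
                  required=[("?card", "fn", "?fn")],
                  optional=[("?card", "url", "?url")])
rows = [(b.get("?fn"), b.get("?url")) for b in solutions]
print(rows)
```

The card without a url still shows up, just with an empty ?url, which is the point of making that pattern optional.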

Of course the data is a bit limited here - none of the vcards refer to 
other people, so it doesn't really 'connect'. FOAF[2] data is a lot 
more fun in this regard because it links people. XFN would need some 
special parsing to relate hCard information, but you can see how it 
might work.

Vevent data is also fun because you can do range queries on it.
E.g. the vevents imported from Tantek's page allow range searches, such as
searching for 'event >2006-01-01':

Or try pasting the following structured query on the query page (it gets 
events in the month of October 2005):
select ?summary, ?location, ?start, ?end
where (?event summary ?summary)
       (?event dtstart ?start)
       (?event dtend ?end)
       (?event location ?location)
       (?start > 2005-10-1)
       (?start < 2005-11-1)
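
As a rough illustration of the range part, the sketch below filters events by start date. The event data is hypothetical (the real events from Tantek's page are not reproduced here), events_between is a made-up helper, and dates are assumed to be stored as YYYY-MM-DD strings.

```python
from datetime import date

# Hypothetical vevent statements, for illustration only.
statements = [
    ("event1", "summary", "Example meetup"),
    ("event1", "location", "Example City"),
    ("event1", "dtstart", "2005-10-15"),
    ("event1", "dtend", "2005-10-16"),
    ("event2", "summary", "Example conference"),
    ("event2", "location", "Elsewhere"),
    ("event2", "dtstart", "2006-02-01"),
    ("event2", "dtend", "2006-02-03"),
]

REQUIRED = {"summary", "location", "dtstart", "dtend"}

def events_between(triples, after, before):
    """Return (summary, location, start, end) for events starting strictly
    after `after` and strictly before `before`."""
    # Group the statements by event so each event's properties sit together.
    events = {}
    for subject, prop, value in triples:
        events.setdefault(subject, {})[prop] = value
    rows = []
    for props in events.values():
        # Like the query, skip events missing any of the four properties.
        if not REQUIRED.issubset(props):
            continue
        start = date.fromisoformat(props["dtstart"])
        if after < start < before:
            rows.append((props["summary"], props["location"],
                         props["dtstart"], props["dtend"]))
    return rows

october = events_between(statements, date(2005, 10, 1), date(2005, 11, 1))
print(october)
```

Note that with strict comparisons, as in the query, an event starting exactly on the 1st of October would be left out.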

Hope this all makes sense,



[1] http://microformats.org/wiki/hcard-parsing
[2] http://www.foaf-project.org/
