[uf-discuss] generic microformat parsing heuristics?
phil at phildawes.net
Wed Nov 9 06:07:47 PST 2005
David Janes -- BlogMatrix wrote:
> Phil (or Danny),
> If you have the time, what would a triple store for, say Neil Dunn's 
> and Ryan's  hCards (together, perhaps) look like?
> Regards, etc...
>  http://www.ndunn.com/2005/10/7/hCard
>  http://theryanking.com/blog/contact/
I've just imported them into my JAM*VAT store. The microformat parser is
a bit crappy and misses out the address information in Neil's page
(amongst other things). I'm in the process of rewriting it, following
the hCard parsing material that Tantek pointed me to.
Anyway - it should give you an idea. The data is crunched into triples such as:
"Neil Dunn" tag vcard
"Neil Dunn" url http://www.ndunn.com
"Neil Dunn" fn "Neil Dunn"
and then indexed. (In an RDF parser the symbols would be converted into
You can see the interpreted statements by clicking the links on the
You can then search, browse and query the aggregated data.
E.g. try a search for "neil vcard"
(or a search for 'vcard' to get all the aggregated vcards)
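
To make the "crunched into triples and then indexed" step concrete, here is a minimal sketch in Python. It is not the JAM*VAT code: the function names build_index and search are made up for illustration, and the word-level index is only an assumption about how a keyword search over triples could work.

```python
from collections import defaultdict

# The three statements quoted above, as (subject, predicate, object) tuples.
triples = [
    ("Neil Dunn", "tag", "vcard"),
    ("Neil Dunn", "url", "http://www.ndunn.com"),
    ("Neil Dunn", "fn", "Neil Dunn"),
]


def build_index(triples):
    """Map each lowercase word found in a subject or object to its subjects."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        for text in (subject, obj):
            for word in text.lower().split():
                index[word].add(subject)
    return index


def search(index, query):
    """Return the subjects that match every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(triples)
print(search(index, "neil vcard"))
```

A search for "vcard" alone would return every subject tagged as a vcard, which is roughly the behaviour described above.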
To pull out all the vcard info, a structured query gives more power:
E.g. try pasting the following into the query window:
select ?fn, ?url
where (?card fn ?fn)
[(?card url ?url)]
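
A rough sketch of how a query like this could be evaluated, assuming the square brackets mark an optional pattern (that reading is an assumption, as are the helper names match and run_query). The "Example Person" triple is invented purely to show a card with no url; it is not from the imported data.

```python
def is_var(term):
    """Variables are written with a leading question mark, e.g. ?card."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to something else
        elif p != t:
            return None  # constant term does not match
    return new


def run_query(triples, required, optional=()):
    """Join the required patterns, then extend with optional ones if possible."""
    solutions = [{}]
    for pattern in required:
        joined = []
        for b in solutions:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    joined.append(extended)
        solutions = joined
    for pattern in optional:
        kept = []
        for b in solutions:
            hits = [m for m in (match(pattern, t, b) for t in triples) if m is not None]
            kept.extend(hits or [b])  # keep the row even without a match
        solutions = kept
    return solutions


triples = [
    ("Neil Dunn", "tag", "vcard"),
    ("Neil Dunn", "url", "http://www.ndunn.com"),
    ("Neil Dunn", "fn", "Neil Dunn"),
    ("Example Person", "fn", "Example Person"),  # hypothetical, no url
]

rows = [
    (b["?fn"], b.get("?url"))
    for b in run_query(triples, [("?card", "fn", "?fn")], [("?card", "url", "?url")])
]
print(rows)
```

With the optional pattern, a card that has an fn but no url still shows up in the results, just with an empty url column.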
Of course the data is a bit limited here - none of the vcards refer to
other people and so it doesn't really 'connect'. FOAF data is a lot
more fun in this regard because it links people - XFN would need some
special parsing to relate hcard information, but you could see how it
Vevent data is also fun because you can do range queries on it.
E.g. the vevents imported from Tantek's page allow range searches:
searching for 'event >2006-01-01':
or try pasting the following structured query on the query page (gets
events in the month of October 2005):
select ?summary, ?location, ?start, ?end
where (?event summary ?summary)
(?event dtstart ?start)
(?event dtend ?end)
(?event location ?location)
(?start > 2005-10-1)
(?start < 2005-11-1)
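
The range filter in that query can be sketched in Python as below. The events are invented placeholders (not the data imported from the page), the names parse_date and events_between are hypothetical, and the sketch assumes dtstart values are plain dates with no time component.

```python
from datetime import date


def parse_date(text):
    """Parse 'YYYY-MM-DD', tolerating unpadded fields such as '2005-10-1'."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


# Hypothetical events, invented for illustration only.
events = [
    {"summary": "Example workshop", "location": "Example Room",
     "dtstart": "2005-09-30", "dtend": "2005-09-30"},
    {"summary": "Example meetup", "location": "Example Hall",
     "dtstart": "2005-10-12", "dtend": "2005-10-13"},
    {"summary": "Example conference", "location": "Example Centre",
     "dtstart": "2005-11-05", "dtend": "2005-11-06"},
]


def events_between(events, after, before):
    """Keep events whose start date lies strictly between the two bounds."""
    lower, upper = parse_date(after), parse_date(before)
    return [e for e in events if lower < parse_date(e["dtstart"]) < upper]


october = events_between(events, "2005-10-1", "2005-11-1")
for event in october:
    print(event["summary"], event["location"], event["dtstart"], event["dtend"])
```

Note that with strict comparisons on plain dates, an event starting on 1 October itself would be excluded; a lower bound of 2005-09-30 or a >= comparison would include it.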
Hope this all makes sense,