[uf-discuss] generic microformat parsing heuristics?
Phil Dawes
phil at phildawes.net
Wed Nov 9 06:07:47 PST 2005
Hi David,
David Janes -- BlogMatrix wrote:
> Phil (or Danny),
>
> If you have the time, what would a triple store for, say Neil Dunn's [1]
> and Ryan's [2] hCards (together, perhaps) look like?
>
> Regards, etc...
> David
>
> [1] http://www.ndunn.com/2005/10/7/hCard
> [2] http://theryanking.com/blog/contact/
>
I've just imported them into my JAM*VAT store. The microformat parser is
a bit crappy and misses out the address information in Neil's page
(amongst other things). I'm in the process of re-writing it following
the hcard-parsing stuff[1] that Tantek pointed me to.
Anyway - it should give you an idea. The data is crunched into
statements. e.g.
"Neil Dunn" tag vcard
"Neil Dunn" url http://www.ndunn.com
"Neil Dunn" fn "Neil Dunn"
and then indexed. (In an RDF parser the symbols would be converted into
URIs somehow).
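A minimal Python sketch of the "crunch into statements, then index" idea follows. It is not JAM*VAT's actual code: the TripleStore class and its by_subject / by_predicate / by_object indexes are names assumed for illustration, and only the three statements above are loaded.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory statement store (illustrative, not JAM*VAT itself)."""

    def __init__(self):
        self.triples = set()
        # One index per position, so lookups by any term avoid a full scan.
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)
        self.by_predicate[predicate].add(triple)
        self.by_object[obj].add(triple)


store = TripleStore()
store.add("Neil Dunn", "tag", "vcard")
store.add("Neil Dunn", "url", "http://www.ndunn.com")
store.add("Neil Dunn", "fn", "Neil Dunn")

# Everything tagged as a vcard, found through the object index.
vcards = sorted(s for s, _, _ in store.by_object["vcard"])
print(vcards)
```

An RDF store would hold URIs where this sketch holds plain strings.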
You can see the interpreted statements by clicking the links on the
graphs page:
http://phildawes.net/jamvat/graphs
You can then search, browse and query the aggregated data.
E.g. try a search for "neil vcard"
http://phildawes.net/jamvat/search?str=neil+vcard
(or a search for 'vcard' to get all the aggregated vcards)
To pull out all the vcard info, a structured query gives more power:
E.g. try pasting the following into the query window:
http://phildawes.net/jamvat/queryui
----
select ?fn, ?url
where (?card fn ?fn)
[(?card url ?url)]
----
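For anyone curious how a query like this can be evaluated, here is a rough Python sketch. It assumes the square brackets mark an optional pattern (a card with no url still returns its fn), which is my reading of the syntax rather than a documented rule. The unify and extend helpers and the "Example Person" card are made up for the illustration.

```python
def unify(pattern, triple, binding):
    """Match one pattern against one triple; return the extended binding or None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def extend(bindings, pattern, triples, optional=False):
    """Extend every binding with all matches of the pattern."""
    out = []
    for binding in bindings:
        hits = [u for u in (unify(pattern, t, binding) for t in triples)
                if u is not None]
        if hits:
            out.extend(hits)
        elif optional:
            out.append(binding)  # keep the row, leave the variable unbound
    return out


triples = [
    ("Neil Dunn", "fn", "Neil Dunn"),
    ("Neil Dunn", "url", "http://www.ndunn.com"),
    ("Example Person", "fn", "Example Person"),  # hypothetical card, no url
]

bindings = extend([{}], ("?card", "fn", "?fn"), triples)
bindings = extend(bindings, ("?card", "url", "?url"), triples, optional=True)
rows = [(b["?fn"], b.get("?url")) for b in bindings]
print(rows)
```

The card without a url still shows up, with None in the ?url column.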
Of course the data is a bit limited here: none of the vcards refer to
other people, so it doesn't really 'connect'. FOAF[2] data is a lot
more fun in this regard because it links people. XFN would need some
special parsing to relate hcard information, but you can see how it
might work.
Vevent data is also fun because you can do range queries on it.
e.g. the vevents imported from Tantek's page allow range searches:
searching for 'event >2006-01-01':
http://phildawes.net/jamvat/search?str=event+%3E2006-01-01
or try pasting the following structured query on the query page (gets
events that start in October 2005):
http://phildawes.net/jamvat/queryui
-------
select ?summary, ?location, ?start, ?end
where (?event summary ?summary)
(?event dtstart ?start)
(?event dtend ?end)
(?event location ?location)
(?start > 2005-10-1)
(?start < 2005-11-1)
-------
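The two range constraints boil down to comparing parsed dates. A small Python sketch, assuming date-only dtstart values (real hCalendar data may carry times too); the parse_day and starts_between helpers and the three sample events are invented for the example and are not the imported data.

```python
from datetime import date


def parse_day(text):
    """Parse 'YYYY-M-D', accepting unpadded parts such as '2005-10-1'."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def starts_between(events, after, before):
    """Keep events whose dtstart lies strictly between the two bounds."""
    low, high = parse_day(after), parse_day(before)
    return [e for e in events if low < parse_day(e["dtstart"]) < high]


# Hypothetical sample events, for illustration only.
events = [
    {"summary": "sample event A", "dtstart": "2005-09-28"},
    {"summary": "sample event B", "dtstart": "2005-10-05"},
    {"summary": "sample event C", "dtstart": "2005-11-01"},
]

in_range = starts_between(events, "2005-10-1", "2005-11-1")
print([e["summary"] for e in in_range])
```

Comparing parsed dates rather than raw strings matters because "2005-10-1" is not zero-padded. Also note that with strict > and < bounds, an event starting exactly on 2005-10-01 would be left out of this sketch's result.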
Hope this all makes sense,
Cheers,
Phil
[1] http://microformats.org/wiki/hcard-parsing
[2] http://www.foaf-project.org/