[uf-discuss] generic microformat parsing heuristics?

Wed Nov 9 07:11:31 PST 2005

On 11/9/05, David Janes -- BlogMatrix <davidjanes at blogmatrix.com> wrote:
> Phil (or Danny),
>
> If you have the time, what would a triple store for, say Neil Dunn's [1]
> and Ryan's [2] hCards (together, perhaps) look like?

Exactly the same as they do now, if serialised to XHTML or vCard ;-)

All right, all right, I know what you mean...

Hmm, but there isn't yet a generally accepted expression of vCard in
RDF. I did find one from 2001 (via SchemaWeb), but it looks very much
in need of revisiting:
http://www.w3.org/TR/vcard-rdf

So... I'd be tempted to do things more simply than described in that
doc; there's a lot of container stuff that probably isn't needed. FOAF
suggests it makes sense to refer to a person indirectly, so I'll
follow that pattern. The Turtle (/N3) RDF syntax is the easiest for
looking at the statements; this would give the data for Ryan's card
something like:

@prefix vCard: <http://www.w3.org/2005/vcard-rdf#> .

_:ryan vCard:SOURCE <http://theryanking.com/blog/contact/#vcard> .
_:ryan vCard:FN "Ryan King" .
_:ryan vCard:EMAIL <mailto:ryan at theryanking.com> .

The @prefix line declares the namespace prefix. _:ryan is a blank node
(the 'ryan' part is an arbitrary label) used to identify an entity
that has no URI. The terms in the middle are the properties (each
corresponding to a URI) and the terms on the right are the values:
here there are two URIs and a literal.

This kind of stuff could be generated easily enough by passing hCard
microformat data through XSLT.
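
The XSLT itself isn't given here, so what follows is only a rough stand-in. It is a minimal Python sketch that walks an hCard and prints Turtle like the above. It assumes the markup is well-formed XML, handles only the fn, email and url class names, and does not escape quotes inside literals. The sample markup and the hcard_to_turtle name are hypothetical, not taken from Ryan's actual page.

```python
import xml.etree.ElementTree as ET

VCARD_NS = "http://www.w3.org/2005/vcard-rdf#"

# Hypothetical, simplified hCard markup; assumed to be well-formed XML.
SAMPLE = """
<div class="vcard">
  <span class="fn">Ryan King</span>
  <a class="email" href="mailto:ryan at theryanking.com">Email</a>
  <a class="url" href="http://theryanking.com">Blog</a>
</div>
"""

def class_names(el):
    """Return the set of space-separated class names on an element."""
    return set(el.get("class", "").split())

def hcard_to_turtle(xhtml, source, label):
    """Emit Turtle for one hCard; only fn, email and url are handled."""
    root = ET.fromstring(xhtml.strip())
    node = "_:" + label  # blank node, as in the hand-written example
    lines = [f"@prefix vCard: <{VCARD_NS}> .", "",
             f"{node} vCard:SOURCE <{source}> ."]
    for el in root.iter():
        names = class_names(el)
        href = el.get("href")
        if "fn" in names:
            # Literal: the element's text, whitespace-normalised
            # (quotes inside the name are NOT escaped in this sketch).
            text = " ".join("".join(el.itertext()).split())
            lines.append(f'{node} vCard:FN "{text}" .')
        if "email" in names and href:
            lines.append(f"{node} vCard:EMAIL <{href}> .")
        if "url" in names and href:
            lines.append(f"{node} vCard:URL <{href}> .")
    return "\n".join(lines)

print(hcard_to_turtle(SAMPLE, "http://theryanking.com/blog/contact/#vcard", "ryan"))
```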

Neil's would look something like this:

@prefix vCard: <http://www.w3.org/2005/vcard-rdf#> .

_:neil vCard:SOURCE <http://www.ndunn.com/2005/10/7/hCard> .
_:neil vCard:FN "Neil Dunn" .
_:neil vCard:EMAIL <mailto:ndunn at ndunn.com> .
_:neil vCard:URL <http://www.ndunn.com> .
_:neil vCard:PHOTO <http://www.ndunn.com/vcard/face.png> .

(I may well be simplifying too much there: the vCard PHOTO property
looks like it can carry other parameters, so it probably needs a blank
node of its own.)

What you'd get if you merged both sets of data into a triplestore
could be expressed simply as:

@prefix vCard: <http://www.w3.org/2005/vcard-rdf#> .

_:ryan vCard:SOURCE <http://theryanking.com/blog/contact/#vcard> .
_:ryan vCard:FN "Ryan King" .
_:ryan vCard:EMAIL <mailto:ryan at theryanking.com> .
_:neil vCard:SOURCE <http://www.ndunn.com/2005/10/7/hCard> .
_:neil vCard:FN "Neil Dunn" .
_:neil vCard:EMAIL <mailto:ndunn at ndunn.com> .
_:neil vCard:URL <http://www.ndunn.com> .
_:neil vCard:PHOTO <http://www.ndunn.com/vcard/face.png> .

Yep, just add 'em.
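
"Just add 'em" is close to literal. Here is a minimal Python sketch that models each triple as a tuple of strings, which is an illustrative assumption and not any particular store's data model. Merging is then a set union. One refinement is worth knowing about. A proper RDF merge keeps blank nodes from different sources apart, because labels like _:ryan only mean something within one document. So the sketch relabels each graph's blank nodes before taking the union. The labels change, but the graph doesn't.

```python
def standardize_apart(triples, prefix):
    """Rename blank nodes (terms starting with '_:') so that labels
    from different source documents cannot collide."""
    def rename(term):
        return f"_:{prefix}_{term[2:]}" if term.startswith("_:") else term
    return {(rename(s), p, rename(o)) for s, p, o in triples}

def merge(*graphs):
    """Merge graphs: relabel each graph's blank nodes, then take the union."""
    merged = set()
    for i, g in enumerate(graphs):
        merged |= standardize_apart(g, f"g{i}")
    return merged

ryan = {
    ("_:ryan", "vCard:SOURCE", "<http://theryanking.com/blog/contact/#vcard>"),
    ("_:ryan", "vCard:FN", '"Ryan King"'),
    ("_:ryan", "vCard:EMAIL", "<mailto:ryan at theryanking.com>"),
}
neil = {
    ("_:neil", "vCard:SOURCE", "<http://www.ndunn.com/2005/10/7/hCard>"),
    ("_:neil", "vCard:FN", '"Neil Dunn"'),
    ("_:neil", "vCard:EMAIL", "<mailto:ndunn at ndunn.com>"),
    ("_:neil", "vCard:URL", "<http://www.ndunn.com>"),
    ("_:neil", "vCard:PHOTO", "<http://www.ndunn.com/vcard/face.png>"),
}

graph = merge(ryan, neil)
print(len(graph))  # 8: three triples from Ryan's card, five from Neil's
```

Merging Ryan's card with itself yields six triples, not three. The two copies' blank nodes stay distinct, which is the behaviour RDF specifies for separate documents.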

This could be queried using SPARQL, with queries something like:

PREFIX vCard: <http://www.w3.org/2005/vcard-rdf#>

SELECT ?name ?email WHERE {
    ?person vCard:FN ?name .
    ?person vCard:EMAIL ?email .
}

The results of this (which you'd get either in RDF/XML or the simple
SPARQL XML results format) could be rendered as:

+-----------+--------------------------------+
| name      | email                          |
+-----------+--------------------------------+
| Ryan King | mailto:ryan at theryanking.com |
| Neil Dunn | mailto:ndunn at ndunn.com      |
+-----------+--------------------------------+
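
Under the hood, the WHERE clause is a basic graph pattern. The engine finds every binding of the variables that makes all the patterns true at once. Below is a deliberately naive Python sketch of that: a nested-loop join over tuples, with variables marked by a leading "?". Real stores use indexes. The data is a trimmed copy of the merged triples above, with URIs and literals written as bare strings for simplicity.

```python
def match(pattern, triple, binding):
    """Try to extend binding so that pattern matches triple; None if not."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):  # variable
            if p in new and new[p] != t:
                return None  # conflicts with an earlier binding
            new[p] = t
        elif p != t:  # constant term must match exactly
            return None
    return new

def query(patterns, triples):
    """Evaluate a basic graph pattern by joining patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

triples = [
    ("_:ryan", "vCard:FN", "Ryan King"),
    ("_:ryan", "vCard:EMAIL", "mailto:ryan at theryanking.com"),
    ("_:neil", "vCard:FN", "Neil Dunn"),
    ("_:neil", "vCard:EMAIL", "mailto:ndunn at ndunn.com"),
    ("_:neil", "vCard:URL", "http://www.ndunn.com"),
]

rows = query([("?person", "vCard:FN", "?name"),
              ("?person", "vCard:EMAIL", "?email")], triples)
for row in rows:
    print(row["?name"], "|", row["?email"])
```

The shared ?person variable is what ties each name to the right email.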

Now if you had a triplestore that understands RDF/OWL inference, you
could add statements like:

@prefix vCard: <http://www.w3.org/2005/vcard-rdf#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

vCard:FN owl:equivalentProperty foaf:name .
vCard:EMAIL owl:equivalentProperty foaf:mbox .

Then, after inference, you could use the FOAF terms (foaf:name,
foaf:mbox) in place of the vCard ones in the query above and get the
same results.
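
A store with OWL support does this internally. For plain data, though, the effect of owl:equivalentProperty can be sketched as a forward-chaining rule: if p is equivalent to q, every (s, p, o) also yields (s, q, o), and vice versa. The minimal Python sketch below assumes triples are tuples of prefixed names and ignores every other OWL construct. It repeats the rule until nothing new appears, so chains of equivalences are covered too.

```python
EQUIV = "owl:equivalentProperty"

def infer_equivalent_properties(triples):
    """Apply the owl:equivalentProperty rule until a fixed point is reached."""
    graph = set(triples)
    while True:
        # Equivalence is symmetric, so record both directions.
        pairs = {(s, o) for s, p, o in graph if p == EQUIV}
        pairs |= {(o, s) for s, o in pairs}
        derived = {(s, q, o) for s, p, o in graph
                   for p1, q in pairs if p == p1}
        if derived <= graph:  # nothing new: done
            return graph
        graph |= derived

data = {
    ("vCard:FN", EQUIV, "foaf:name"),
    ("vCard:EMAIL", EQUIV, "foaf:mbox"),
    ("_:ryan", "vCard:FN", '"Ryan King"'),
    ("_:ryan", "vCard:EMAIL", "<mailto:ryan at theryanking.com>"),
}

inferred = infer_equivalent_properties(data)
for t in sorted(t for t in inferred if t[1].startswith("foaf:")):
    print(t)
```
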
The basic merging of RDF data really is trivial. You might also want
to include:

@prefix vCard: <http://www.w3.org/2005/vcard-rdf#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:ryan vCard:URL <http://theryanking.com> .
_:neil vCard:URL <http://www.ndunn.com> .
_:ryan foaf:knows _:neil .

This particular bit could have been obtained by passing Ryan's XFN
data through XSLT. If the triplestore doesn't have inference available,
you can just have the XSLT generate the derived statements up front,
i.e. you'd generate both:

_:neil vCard:EMAIL <mailto:ndunn at ndunn.com> .
_:neil foaf:mbox <mailto:ndunn at ndunn.com> .
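
For the XFN side, here is a rough Python sketch of what such a transformation might do. It scans links for XFN rel values and emits one foaf:knows statement per link. The other person is identified only by their vCard:URL, which is why the vCard:URL lines above matter. Several parts are assumptions: treating these XFN values as foaf:knows, the shortened list of values, the sample markup and the xfn_to_turtle name. Prefix declarations are left out, as in the snippet just above.

```python
import xml.etree.ElementTree as ET

# A subset of XFN relationship values (assumed to imply foaf:knows).
# "me" is deliberately excluded: it marks the author's own pages.
XFN_PEOPLE = {"friend", "acquaintance", "contact", "met",
              "co-worker", "colleague", "neighbor"}

# Hypothetical blogroll markup, assumed to be well-formed XML.
SAMPLE = """
<ul>
  <li><a href="http://www.ndunn.com" rel="friend met">Neil Dunn</a></li>
  <li><a href="http://theryanking.com/about" rel="me">About me</a></li>
</ul>
"""

def xfn_to_turtle(xhtml, me="_:ryan"):
    """Emit foaf:knows for each link whose rel carries an XFN person value."""
    root = ET.fromstring(xhtml.strip())
    lines = []
    for n, a in enumerate(root.iter("a")):
        rels = set(a.get("rel", "").split())
        if rels & XFN_PEOPLE and a.get("href"):
            other = f"_:p{n}"  # fresh blank node for the linked person
            lines.append(f"{other} vCard:URL <{a.get('href')}> .")
            lines.append(f"{me} foaf:knows {other} .")
    return "\n".join(lines)

print(xfn_to_turtle(SAMPLE))
```

Working out that _:p0 and Neil's _:neil are the same person, for example via their shared vCard:URL, is left to a later step. This sketch doesn't attempt it.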

Naturally this stuff can be serialized as RDF/XML; there's a chunk of
it below (I used this converter:
http://www.mindswap.org/2002/rdfconvert/ ).

It's also worth remembering that, as well as being a list of
statements, RDF can be seen as describing a graph structure, such as
the one here (scroll down):

http://www.w3.org/RDF/Validator/ARPServlet?URI=http://dannyayers.com/2005/11/vcard.rdf
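
To make the graph reading concrete: each triple is a labelled, directed edge from subject to object. The small Python sketch below turns triples into Graphviz DOT text. Choosing DOT as the output is an illustration of the idea. It isn't a claim about how the validator draws its graphs. Note that a literal and a URI spelled the same way would collapse into one node here.

```python
def to_dot(triples):
    """Render (subject, predicate, object) triples as a Graphviz digraph."""
    def q(term):
        # DOT node IDs are quoted strings; escape embedded double quotes.
        return '"' + term.replace('"', '\\"') + '"'
    lines = ["digraph rdf {"]
    for s, p, o in triples:
        # One edge per triple, labelled with the property.
        lines.append(f"  {q(s)} -> {q(o)} [label={q(p)}];")
    lines.append("}")
    return "\n".join(lines)

triples = [
    ("_:ryan", "vCard:FN", "Ryan King"),
    ("_:ryan", "vCard:URL", "http://theryanking.com"),
    ("_:neil", "vCard:URL", "http://www.ndunn.com"),
    ("_:ryan", "foaf:knows", "_:neil"),
]
print(to_dot(triples))
```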

Cheers,
Danny.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:vCard="http://www.w3.org/2005/vcard-rdf#">

    <rdf:Description rdf:nodeID="neil">
        <vCard:EMAIL rdf:resource="mailto:ndunn at ndunn.com"/>
        <vCard:FN>Neil Dunn</vCard:FN>
        <vCard:PHOTO rdf:resource="http://www.ndunn.com/vcard/face.png"/>
        <vCard:SOURCE rdf:resource="http://www.ndunn.com/2005/10/7/hCard"/>
        <vCard:URL rdf:resource="http://www.ndunn.com"/>
    </rdf:Description>

    <rdf:Description rdf:nodeID="ryan">
        <vCard:EMAIL rdf:resource="mailto:ryan at theryanking.com"/>
        <vCard:FN>Ryan King</vCard:FN>
        <vCard:SOURCE rdf:resource="http://theryanking.com/blog/contact/#vcard"/>
        <vCard:URL rdf:resource="http://theryanking.com"/>
        <foaf:knows rdf:nodeID="neil"/>
    </rdf:Description>
</rdf:RDF>
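
For anyone who'd rather not depend on an online converter, the grouping in that RDF/XML is simple to reproduce. There is one rdf:Description per subject. Blank nodes go in rdf:nodeID, URIs in rdf:resource, and literals as element text. Here is a minimal Python sketch using the standard library's ElementTree. The tagged-tuple input format is an assumption made for the example. It handles only the cases that appear above, with no datatypes, language tags or nested descriptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vCard": "http://www.w3.org/2005/vcard-rdf#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep readable prefixes on output

def qname(prefixed):
    """'vCard:FN' -> '{http://www.w3.org/2005/vcard-rdf#}FN' (ElementTree form)."""
    prefix, local = prefixed.split(":", 1)
    return f"{{{NS[prefix]}}}{local}"

def to_rdfxml(triples):
    """Subjects are blank-node IDs; objects are ('uri'|'bnode'|'lit', value)."""
    root = ET.Element(qname("rdf:RDF"))
    descriptions = {}
    for subject, predicate, (kind, value) in triples:
        if subject not in descriptions:  # one rdf:Description per subject
            descriptions[subject] = ET.SubElement(
                root, qname("rdf:Description"), {qname("rdf:nodeID"): subject})
        prop = ET.SubElement(descriptions[subject], qname(predicate))
        if kind == "uri":
            prop.set(qname("rdf:resource"), value)
        elif kind == "bnode":
            prop.set(qname("rdf:nodeID"), value)
        else:
            prop.text = value  # plain literal
    return ET.tostring(root, encoding="unicode")

triples = [
    ("neil", "vCard:FN", ("lit", "Neil Dunn")),
    ("neil", "vCard:URL", ("uri", "http://www.ndunn.com")),
    ("ryan", "vCard:FN", ("lit", "Ryan King")),
    ("ryan", "foaf:knows", ("bnode", "neil")),
]
print(to_rdfxml(triples))
```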

--

http://dannyayers.com