microformats2-parsing-rdf: Difference between revisions

From Microformats Wiki

Revision as of 23:48, 19 November 2012

Draft Microformats2-to-RDF mapping

Tantek asked me to draft up a Microformats2-to-RDF mapping. It's pretty straightforward to do. This is rough and will need cleaning up.

RDF model

In the document below, the RDF model is represented using the Notation3 syntax.

All RDF examples presume at least these basic external prefixes:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

In addition, we shall use the profile defined on microformats-2 as an RDF URI prefix:

@prefix uf: <http://microformats.org/profile/> .

What is the subject?

In RDF, all data is in the form of subject, predicate, object triples. These map roughly to English statements. Subjects can be either URI references or blank nodes.

Choosing an adequate subject for a microformats2 object is problematic.

If the element which has the root class name for a microformats2 object also has an about attribute, that SHOULD be used to set the subject for the microformat.

If the element does not have an about attribute, or the HTML processor does not support parsing about attributes, then the subject for the microformat SHOULD be an RDF blank node.
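
The two rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name subject_for, the dict of attributes, the supports_about flag and the bnode label format are all hypothetical, and resolving a relative about value against the document base URI is left out.

```python
from itertools import count


def subject_for(root_attrs, bnode_ids, supports_about=True):
    """Return an N3 subject term for a microformats2 root element.

    root_attrs     -- dict of the root element's HTML attributes (assumed shape)
    bnode_ids      -- iterator yielding fresh integers for blank node labels
    supports_about -- False if the processor cannot parse about attributes
    """
    about = root_attrs.get("about")
    if supports_about and about:
        # Rule 1: an about attribute sets the subject.
        return f"<{about}>"
    # Rule 2: otherwise fall back to a fresh blank node.
    return f"_:bnode{next(bnode_ids):02d}"
```

A caller would create one counter per document, for example ids = count(1), and pass it to every call so that blank node labels stay unique within that document.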

Representing root class names

Root class names map neatly to RDF's type mechanism. In Notation3, this is represented with the shorthand "a".

<div class="h-card">Tom Morris</div>
_:bnode01 a uf:h-card .

This is compatible with the RDF specification, but differs from RDF best practice. As in Java and many other programming languages, it is customary in RDF to capitalize class names.

Representative hCard parsing

(I'm not sure if Microformats2 has representative hCard as a pattern anymore. If not, disregard this section.)

If the parser is able to infer that a representative hCard is present on the page, one can represent this by explicitly linking the page to the subject using an appropriate property URI from a pre-existing ontology.

TODO: choose a property to go from the document URI to the bnode or subject URI: either foaf:primaryTopic, or something from SIOC, or perhaps Dublin Core.

Representing properties

Once we have inferred the class name, we simply need to declare the properties.

Language

RDF allows three types of literal: plain literals, language literals and typed literals.

Typed literals contain a datatype annotation; language literals contain a language annotation (the same language tags as are used in HTML's lang attribute and XML's xml:lang attribute).

Processors should work out the language tag (if any) of the elements containing microformat properties (using the latest RDFa specification) and emit language-tagged literals for p- prefixed properties. If no language tag is set in the HTML, emit plain literals for all p- prefixed properties.
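
As a rough Python sketch of the rule above, the following helper serializes the value of a p- prefixed property as an N3 literal. The function name p_literal is hypothetical, and it assumes the caller has already worked out the language tag in effect for the element (or found none).

```python
def p_literal(text, lang=None):
    """Serialize a p- property value as an N3 literal.

    text -- the parsed property value
    lang -- the language tag in effect for the element, or None if unset
    """
    # Escape the characters that would otherwise break a quoted N3 string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    literal = f'"{escaped}"'
    # A language tag yields a language literal; otherwise emit a plain literal.
    return f"{literal}@{lang}" if lang else literal
```

For an element inside lang="en", p_literal("Tom Morris", "en") gives "Tom Morris"@en, while p_literal("Tom Morris") gives the plain literal "Tom Morris".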

rdfs:label

It is generally good practice for each resource to have an rdfs:label property. This maps to p-name.

In Notation3 rules:

{ ?s uf:p-name ?o . } => { ?s rdfs:label ?o . } .

One could argue for then omitting the p-name property from RDF representations of Microformats2 objects. However, the minor cost of the duplication is outweighed by a faithful representation and by the ability to convert in both directions between RDF and JSON representations of Microformats2 objects.
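
For processors without an N3 rules engine, the same inference can be sketched procedurally in Python. This assumes triples are held as (subject, predicate, object) tuples of already-serialized N3 terms; that representation and the name add_labels are hypothetical. In line with the paragraph above, the original uf:p-name triples are kept.

```python
def add_labels(triples):
    """Add an rdfs:label triple for every uf:p-name triple.

    Mirrors the rule { ?s uf:p-name ?o } => { ?s rdfs:label ?o }
    and leaves the uf:p-name triples in place.
    """
    source = list(triples)
    result = list(source)
    for s, p, o in source:
        label = (s, "rdfs:label", o)
        # Skip labels that are already present so repeated runs add nothing.
        if p == "uf:p-name" and label not in result:
            result.append(label)
    return result
```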

Representing nested microformats

This is easy. Each new object becomes a new RDF resource, and there is a relationship going from the parent object to the child object.

<div class="h-event">
  <a class="p-name u-url" href="http://indiewebcamp.com/2012">
    IndieWebCamp 2012
  </a>
  from <time class="dt-start">2012-06-30</time> 
  to <time class="dt-end">2012-07-01</time> at 
  <span class="p-location h-card">
    <a class="p-name p-org u-url" href="http://geoloqi.com/">
      Geoloqi
    </a>, 
    <span class="p-street-address">920 SW 3rd Ave. Suite 400</span>, 
    <span class="p-locality">Portland</span>, 
    <abbr class="p-region" title="Oregon">OR</abbr>
  </span>
</div>

In Notation3, this would emit:

_:hevent1 a uf:h-event;
  rdfs:label "IndieWebCamp 2012";
  uf:p-name "IndieWebCamp 2012";
  uf:u-url <http://indiewebcamp.com/2012>;
  uf:dt-start "2012-06-30"^^xsd:date;
  uf:dt-end "2012-07-01"^^xsd:date;
  uf:p-location [
    a uf:h-card;
    rdfs:label "Geoloqi";
    uf:p-name "Geoloqi";
    uf:p-org "Geoloqi";
    uf:u-url <http://geoloqi.com/>;
    uf:p-street-address "920 SW 3rd Ave. Suite 400";
    uf:p-locality "Portland";
    uf:p-region "Oregon"
  ] .
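
The nesting rule can be sketched as a short recursive function in Python. The input shape is an assumption made for illustration: a dict with a "type" list and a "properties" dict whose keys keep their microformats2 prefixes (p-, u-, dt-) and whose values are either already-serialized N3 terms or nested objects of the same shape. The name to_triples and the blank node labels are likewise hypothetical.

```python
from itertools import count


def to_triples(obj, ids=None):
    """Flatten a nested object into (subject, predicate, object) triples.

    Returns the subject chosen for obj together with the list of triples.
    """
    ids = ids if ids is not None else count(1)
    subject = f"_:b{next(ids)}"
    # Root class names become rdf:type triples ("a" in N3).
    triples = [(subject, "a", f"uf:{t}") for t in obj["type"]]
    for prop, values in obj["properties"].items():
        for value in values:
            if isinstance(value, dict):
                # A nested microformat becomes a new resource, linked
                # from the parent by the property it was found under.
                child, child_triples = to_triples(value, ids)
                triples.append((subject, f"uf:{prop}", child))
                triples.extend(child_triples)
            else:
                triples.append((subject, f"uf:{prop}", value))
    return subject, triples
```

Passing the h-event above in that shape would yield one triple linking the event's blank node to the h-card's blank node via uf:p-location, followed by the h-card's own triples.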

Mapping

The RDF semantics of an hCard can be declared in an RDF document available from microformats.org. RDF-minded parsers can use this document to draw inferences. In the case of hCard (additionally assuming the owl: prefix):

@prefix owl: <http://www.w3.org/2002/07/owl#> .

uf:h-card a owl:Class;
  owl:equivalentClass <http://www.w3.org/2006/vcard/ns#VCard> .

Equivalent properties can also be declared:

uf:p-phone a owl:DatatypeProperty;
  owl:equivalentProperty <http://xmlns.com/foaf/0.1/phone>,
    <http://schema.org/telephone> .