Microformat FAQs relating to RDF

What are Microformats?

Microformats are a set of simple, open data formats built upon existing and widely adopted standards, in particular semantic (X)HTML. The processes, principles, and practices of the microformats open standards community are what make microformats "microformats", but they center on using HTML and XHTML as designed, as a semantic language (though they can and have been implemented in other XML formats, e.g. RSS, Atom). See what-are-microformats for more.

One of the microformats principles is humans first, machines second. The more human-friendly syntax of microformats can be converted to RDF via a number of mechanisms:

Microformats that have XMDP profiles can be used directly with the RDF-compatible URIs for microformats terminology that those profiles provide. See profile-uris.
With the help of the GRDDL mechanism, it is possible to view microformats as domain-specific RDF serializations.

Why were microformats proposed when RDF already existed to markup semantic information?

When Tantek first proposed microformats in 2004 with Kevin Marks ([1]; see history for more), it was in many ways a proof of concept that neither the RDF data model nor (and especially) its syntax(es) had significant advantages, as a way of modeling (and especially authoring) data, over simple markup that mainstream web designers were already broadly familiar with. The point was not just a lack of advantages but that, frankly, simpler and more real-world solutions were possible.

But how real are microformats?

Just five years after that initial proposal, there is already solid support for several microformats on numerous sites, from ones as large as AOL, Google, Yahoo and Yelp to a very long tail of smaller sites. At last count, Yahoo SearchMonkey reported over 1.3 billion hCards on the web. What started as a proof of concept has become the dominant form of representing semantics (over and above what's built into HTML) on the web.

But I do RDF, why should I be interested?

Microformats lower the barrier to publishing data on the Web. This is entirely in line with the high level goals of the Semantic Web.

Dan Connolly, rdf-interest, March 2000:

I believe that one of the best ways to transition into RDF, if not a long-term deployment strategy for RDF, is to manage the information in human-consumable form (XHTML) annotated with just enough info to extract the RDF statements that the human info is intended to convey. In other words: using a relational database or some sort of native RDF data store, and spitting out HTML dynamically, is a lot of infrastructure to operate and probably not worth it for lots of interesting cases. We all know that we have to produce a human-readable version of the thing… why not use that as the primary source?

I have an RDF vocabulary I would like to use as a microformat. How do I do it?

Before doing anything else, read the microformats process.

In general the microformat process is empirical-data-driven. It starts with material already being published (and implied schema therein), rather than an existing format, model or explicit schema.

Check the list of what has already been covered and the work-in-progress on the Wiki Main_Page.

It may be that there is already a microformat for the data you want to represent.

It may well be that what you have in mind isn't appropriate for use as a microformat, but it may still be a good idea to develop a (semantic) XHTML representation. Existing microformats demonstrate a standards-friendly Plain Old Semantic HTML (POSH) way of doing this.

So, this is about using CSS class values to add semantics?

No. XHTML already expresses semantics; the HTML class attribute is just one of several mechanisms for doing so. From the HTML 4 spec:

The class attribute, on the other hand, assigns one or more class names to an element; the element may be said to belong to these classes.

What about namespaces for the attributes, should I use "xxx:term"?

In general, microformats reject the use of explicit namespace prefixes in documents as unnecessary for solving the 80/20 of problems that microformats seek to solve. The general approach is not to attempt to generalize to the extent of RDF-in-HTML, but rather to define more domain-specific formats.

But won't there be naming clashes?

The social aspect of the microformats process is such that conflicts ought to be prevented. The goal is to keep things as simple as possible by only focusing on existing well-defined problems, rather than trying to "boil the ocean" (solve the hypothetical general case).

In addition, XMDP profiles can be used to explicitly define terms used in microformats.

So how do I get the data out?

See GRDDL (older version)

What is GRDDL?

See grddl.

Isn't there a clash between the semantics of XFN and FOAF?

The use of the page URI in XFN to identify a person appears to conflict with FOAF's by-reference approach, and to mask the potential for saying things about the page itself. In practice, however, this isn't a problem. It's possible to parse the document as XFN (using e.g. grokXFN.xsl) to extract the person-related statements, e.g.

_:personA foaf:homepage <http://example.org/this-page> .
_:personA foaf:knows _:personB .
_:personB foaf:homepage <http://example.org/linked-page> .

- and independently parse the document using other format mappings (e.g. dc-extract.xsl) to obtain other statements, e.g.

<http://example.org/this-page> dc:creator "The Creator" .
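The XFN half of that extraction can be sketched in Python. This is only an illustration, not a port of grokXFN.xsl: the names XFNLinkParser, xfn_to_triples and XFN_KNOWS_RELS are hypothetical, only a subset of XFN rel values is handled, blank nodes are labelled _:person0, _:person1 and so on, and relative hrefs are not resolved.

```python
from html.parser import HTMLParser

# Illustrative subset of XFN rel values treated here as implying "knows".
# "me" is deliberately left out: it links two pages of the same person.
XFN_KNOWS_RELS = {"friend", "acquaintance", "contact", "met", "co-worker", "colleague"}


class XFNLinkParser(HTMLParser):
    """Collect the href of every <a> whose rel attribute carries an XFN value."""

    def __init__(self):
        super().__init__()
        self.linked_pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel is a space-separated list; a valueless attribute arrives as None.
        rels = set((attributes.get("rel") or "").lower().split())
        href = attributes.get("href")
        if href and rels & XFN_KNOWS_RELS:
            self.linked_pages.append(href)


def xfn_to_triples(page_uri, html):
    """Return FOAF-style (subject, predicate, object) triples for one page."""
    parser = XFNLinkParser()
    parser.feed(html)
    parser.close()
    # The page identifies its author by reference: a blank node whose
    # foaf:homepage is the page, rather than the page URI itself.
    triples = [("_:person0", "foaf:homepage", f"<{page_uri}>")]
    for index, href in enumerate(parser.linked_pages, start=1):
        other = f"_:person{index}"
        triples.append(("_:person0", "foaf:knows", other))
        triples.append((other, "foaf:homepage", f"<{href}>"))
    return triples
```

Because the person is a blank node rather than the page URI, statements about the page itself (such as the dc:creator one) can be produced by a separate pass without colliding with these.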

What other work has been done with microformats and RDF?

Are there Schemas for Microformats?

Kind of. The primary specification is XHTML, but HTML4 provides a mechanism (the 'profile' attribute of the <head> element) to point to a meta data profile that defines properties and values. There is an (HTML-based) format specified for microformat profiles: XHTML Meta Data Profiles (XMDP). Note that XMDP's URLs for specifying terms are compatible with those used by RDF, with "#term" at the end.

See XMDP for more.
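As a rough sketch of the "#term" convention, the Python below builds term URIs from a profile URI and the class names found on an element. The function names, the profile URI and the set of defined terms are all assumptions made for illustration; a real consumer would read the defined terms from the XMDP profile document itself.

```python
def term_uri(profile_uri, term):
    """Join a profile URI and a term name with '#', dropping any old fragment."""
    base = profile_uri.split("#", 1)[0]
    return f"{base}#{term}"


def class_names_to_uris(profile_uri, class_attribute, defined_terms):
    """Map each class name that the profile defines to its term URI.

    class_attribute is the raw, space-separated value of an HTML class
    attribute. Class names the profile does not define are ignored.
    """
    return {
        name: term_uri(profile_uri, name)
        for name in class_attribute.split()
        if name in defined_terms
    }
```

For instance, with a hypothetical profile at http://example.org/profile/hcard that defines fn and url, the class value "fn url highlight" would yield URIs for fn and url only, and those URIs can be used directly as RDF property identifiers.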

What RDF vocabularies (and XSLT) corresponding to microformats are available?

See MicroModels (on ESW Wiki)

Isn't this just scraping?

No. Microformats specify parsing, and therefore do not require or suggest scraping.

Microformats define parsing rules for well-defined markup (e.g. hcard-parsing, which is applicable to most microformats).

"Scraping", on the other hand, is the use of site-specific regular expressions that ignore markup.

Also, because microformats can include URI(s) for every profile used, and the profiles are clearly defined, the explicit data contained in a document can be extracted deterministically by parsing.
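To make the contrast with scraping concrete, here is a minimal Python sketch of markup-driven extraction. It is a deliberate simplification and not an implementation of the hcard-parsing rules: it only collects the text content of elements whose class attribute names a requested property, and the class name ClassPropertyParser and its attributes are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassPropertyParser(HTMLParser):
    """Collect (property, text) pairs for elements carrying known class names."""

    def __init__(self, properties):
        super().__init__()
        self.properties = set(properties)
        self.open_elements = []  # (tag, matched property names, text parts)
        self.values = []         # (property, text), in order of element close

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        matched = [name for name in classes if name in self.properties]
        self.open_elements.append((tag, matched, []))

    def handle_data(self, data):
        # Text belongs to every open element that represents a property.
        for _tag, matched, parts in self.open_elements:
            if matched:
                parts.append(data)

    def handle_endtag(self, tag):
        # Ignore a stray end tag that matches no open element.
        if not any(open_tag == tag for open_tag, _, _ in self.open_elements):
            return
        # Close elements up to and including the matching one.
        while self.open_elements:
            open_tag, matched, parts = self.open_elements.pop()
            text = " ".join("".join(parts).split())
            self.values.extend((name, text) for name in matched)
            if open_tag == tag:
                break
```

The same parser works on any site's markup because it keys on class names inside the element structure, not on a site-specific text pattern, which is the distinction the answer above draws.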

Who else is looking at RDF and microformats?

Dan Connolly
Ian Davis
John Breslin
Danny Ayers
Toby Inkster
Martin McEvoy
... please add yourself if you are as well!

How do I get involved?

If you're using the Web, you already *are* involved! Next place to go is the microformats.org site, and maybe sign up to some of the mailing lists (in particular microformats-discuss). There's also an IRC channel #microformats on irc.freenode.net.