accepted-limitations-of-microformats

From Microformats Wiki

Introduction

This document outlines certain accepted limitations of Microformats. These limitations are part of the reason Microformats are so simple and easy to implement. Simplicity, however, comes with trade-offs, so you should be aware of the kinds of problems that Microformats are not intended to solve.

Some of the "trade-offs" and issues listed here are not really limitations of microformats, but rather misunderstandings of how microformats work, or a lack of sufficient scoping/parsing rules in specific formats. Regardless, since these are perceived trade-offs and/or issues, they should be addressed with an FAQ or further documentation. Tantek

Microformat Scoping Issue

The following somewhat contrived example illustrates the Microformats scoping issue. Take the following text and try to mark it up using the hCard Microformat:

Janet Seymour and Robert Tripton are available via phone.
Rob is available at his home number: 555-555-5555.
Janet can be reached at her work number: 777-777-7777.

A publisher would want to mark up Janet's and Robert's telephone contact information. Not knowing that Microformats are scope-less, they might take the following approach:

<div class="vcard">
  <span class="fn">Janet Seymour</span> and
  <div class="vcard">
    <span class="fn">Robert Tripton</span> are available via phone.
    Rob is available at his home number: <span class="tel">555-555-5555</span>.
  </div>
  Janet can be reached at her work number: <span class="tel">777-777-7777</span>.
</div>

Unfortunately, a Microformat parser would generate the following two outputs:

hCard
  fn  -> Janet Seymour
  tel -> 555-555-5555
  tel -> 777-777-7777
hCard
  fn  -> Robert Tripton
  tel -> 555-555-5555

This happens because the two hCards overlap. Microformats do not have an established mechanism for identifying which properties go with which vCard. When the parser collects the properties of Janet Seymour's hCard, the first telephone number it finds inside her element is Robert Tripton's, not hers. Since the parser doesn't know which person 555-555-5555 belongs to, it mistakenly adds it to Janet Seymour's list of phone numbers. This problem arises whenever two or more Microformats of the same type overlap one another.
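To make the failure concrete, here is a minimal Python sketch that parses the markup above in two ways, using only the standard library's xml.etree.ElementTree (which works here because the snippet happens to be well-formed XML). The unscoped mode searches every descendant of a vcard element and reproduces the output listed above. The scoped mode applies one hypothetical scoping rule: stop descending at a nested element that is itself a vcard. The function names, that rule, and the property sets (fn treated as single-valued with the first match winning, tel treated as repeatable) are assumptions made for illustration, not taken from any particular parser.

```python
import xml.etree.ElementTree as ET

# The markup from the example above (it happens to be well-formed XML).
MARKUP = """<div class="vcard">
  <span class="fn">Janet Seymour</span> and
  <div class="vcard">
    <span class="fn">Robert Tripton</span> are available via phone.
    Rob is available at his home number: <span class="tel">555-555-5555</span>.
  </div>
  Janet can be reached at her work number: <span class="tel">777-777-7777</span>.
</div>"""

SINGULAR = {"fn"}   # assumed single-valued: the first match wins
PLURAL = {"tel"}    # assumed repeatable: every match is collected


def classes(el):
    """Return the whitespace-separated class names of an element."""
    return el.get("class", "").split()


def descendants(root, scoped):
    """Yield the descendants of root in document order.

    With scoped=True, skip any nested element that is itself a vcard
    (and everything inside it), so its properties stay with that card.
    """
    for child in root:
        if scoped and "vcard" in classes(child):
            continue
        yield child
        yield from descendants(child, scoped)


def parse_hcards(markup, scoped=False):
    """Return one property dict per vcard element, in document order."""
    cards = []
    for card in ET.fromstring(markup).iter():
        if "vcard" not in classes(card):
            continue
        props = {}
        for el in descendants(card, scoped):
            text = "".join(el.itertext()).strip()
            for name in classes(el):
                if name in SINGULAR:
                    props.setdefault(name, text)
                elif name in PLURAL:
                    props.setdefault(name, []).append(text)
        cards.append(props)
    return cards


if __name__ == "__main__":
    for scoped in (False, True):
        print("scoped" if scoped else "unscoped")
        for card in parse_hcards(MARKUP, scoped):
            print("  hCard", card)
```

In unscoped mode Janet's hCard receives both numbers, as in the listing above; in scoped mode it keeps only 777-777-7777, while Robert's hCard is the same either way. The microformats2 parsing specification treats nested microformats in a broadly similar way, although this sketch does not implement it.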

Keep in mind that the Microformat authors are aware of this limitation, and that it is acceptable for what the Microformats community is trying to accomplish: a simple mechanism for semantic data markup. Simplicity has its benefits and its drawbacks. RDFa allows you to specify which property goes with which vCard and so solves this issue, at the cost of extra syntax (whose verbosity varies with the task).

A more fundamental trade-off here concerns fragility. RDFa allows great precision and the free mixing of independently developed extensions. However, it achieves this by requiring markup to obey the abstract syntax rules of RDFa, which may be hard to grasp. In some contexts, this trade-off will pay off. Microformats, by contrast, create more work for parser writers, who must, for example, update their code to track new vocabularies or evolving consensus about cross-microformat interaction patterns. This is justified on the basis that a handful of parser implementations can be used by everyone, and that it is better to have any extra work done by a handful of specialised experts than by thousands of HTML publishers.

Microformat Namespacing Issue

The following is a theoretical example, based on work performed for hAudio, that shows how poor choices of property names can lead to confusion. There was a discussion on the New Microformats mailing list regarding the re-use of the 'title' property from hCard. Re-use of property names in Microformats is heavily encouraged, but it needs to be done with care:

  1. If somebody narrowly defines the meaning of a property, re-using that property is difficult if not impossible.
  2. The more you re-use a property, the more risk there is of interpretation clashes when Microformats overlap on a web page.

The question of whether we should use title for the title of a song or the title of an album was raised several times while discussing hAudio. The dictionary definition of the word 'title' in this sense is:

5 a) the distinguishing name of a written, printed, or filmed production 
  b) a similar distinguishing name of a musical composition or a work of art.

However, the hCard authors had already defined title, re-using the vCard definition. The vCard specification defines title as:

"To specify the job title, functional position or function of the object the vCard represents".

This means that any Microformat that wants to use title must use the vCard definition, since that specification came first and changing the meaning might confuse people who have adopted vCard. Namespaces would attempt to fix this problem, but at the cost of more complex parsing and new confusion: "hCard:title" could mean something subtly different from "hAudio:title". Since there are no namespaces in Microformats, this confusion has to be avoided by defining audio-title and album-title, or htitle or name. All of this could have been avoided by properly re-using fn (formatted name), as it has been used in hCard, hReview, hListing and many proposals, and then specifying parsing/context rules appropriately.

Let us assume for a moment that "title" had been defined loosely enough for hAudio to re-use the property. There is another problem, which goes back to the scoping example given in Appendix A: how do you differentiate between two overlapping Microformats that use the same property name? Take this sentence, for example:

Freddie Mercury, known for a song called Bohemian Rhapsody, was the lead singer for Queen.

The HTML markup would look like this:

<div class="vcard">
  <span class="fn">Freddie Mercury</span>, 
  known for a song called
  <div class="haudio">
    <span class="title">Bohemian Rhapsody</span>,
  </div>
  was the <span class="title">lead singer</span>
  for <span class="org">Queen</span>.
</div>

A parser like the one in the scoping example above would generate the following two outputs:

hCard
  fn    -> Freddie Mercury
  title -> Bohemian Rhapsody
  org   -> Queen
hAudio
  title -> Bohemian Rhapsody

Even though title is the desired word, its ambiguity forced hAudio to use different property names. hAudio currently uses the property names audio-title and album-title to avoid namespace and scoping issues. The dash in "audio-title" and "album-title" is a very simple way of creating separate names while avoiding the complexity and ambiguity of XML namespaces.
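The collision, and how a dash-prefixed name sidesteps it, can be reproduced with another minimal Python sketch built on xml.etree.ElementTree. It searches without any scoping and, as an assumed simplification matching the output listing above, treats every property as single-valued with the first match winning. The VOCABULARY table and the parse function are hypothetical, trimmed to what this example needs. The sketch runs once on the original markup and once on a copy in which the song title uses audio-title.

```python
import xml.etree.ElementTree as ET

# The Freddie Mercury markup from the example above.
ORIGINAL = """<div class="vcard">
  <span class="fn">Freddie Mercury</span>,
  known for a song called
  <div class="haudio">
    <span class="title">Bohemian Rhapsody</span>,
  </div>
  was the <span class="title">lead singer</span>
  for <span class="org">Queen</span>.
</div>"""

# The same markup, with hAudio using a dash-prefixed name for the song title.
RENAMED = ORIGINAL.replace('class="title">Bohemian', 'class="audio-title">Bohemian')

# Hypothetical property vocabularies, trimmed to what this example needs.
VOCABULARY = {
    "vcard": {"fn", "title", "org"},
    "haudio": {"title", "audio-title"},
}


def parse(markup):
    """Return (root class, properties) for each microformat, in document order.

    The search is unscoped (nested microformats are not skipped), and every
    property is treated as single-valued: the first match wins.
    """
    results = []
    for root in ET.fromstring(markup).iter():
        for fmt in root.get("class", "").split():
            if fmt not in VOCABULARY:
                continue
            props = {}
            for el in root.iter():
                if el is root:
                    continue  # the root element carries the format, not a property
                for name in el.get("class", "").split():
                    if name in VOCABULARY[fmt]:
                        props.setdefault(name, "".join(el.itertext()).strip())
            results.append((fmt, props))
    return results


if __name__ == "__main__":
    for label, markup in (("title", ORIGINAL), ("audio-title", RENAMED)):
        print(label)
        for fmt, props in parse(markup):
            print(" ", fmt, props)
```

With the original markup, the hCard's title is "Bohemian Rhapsody", as in the listing above; with the renamed property it becomes "lead singer". Renaming does not make the search scoped: the hCard still visits the hAudio's elements, but it no longer treats audio-title as one of its own properties.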

This limitation is also well known to the Microformats authors. It is a deliberate choice that avoids the cost of complex namespacing in exchange for a markup mechanism that is easy for publishers to implement.