accepted-limitations-of-microformats: Difference between revisions

From Microformats Wiki
Latest revision as of 23:02, 28 August 2008

Introduction

This document attempts to outline certain accepted limitations of Microformats. These limitations are partly why Microformats are so simple and easy to implement. Simplicity, however, comes with trade-offs and thus you should be aware of the types of problems that Microformats are not intended to solve.

Some of the "trade-offs" and issues listed here are not really limitations of microformats, but rather either misunderstandings of how microformats work, or a lack of sufficient scoping/parsing rules in specific formats. Regardless, as these are perceived trade-offs and/or issues, they should be addressed with an FAQ or further documentation. Tantek

Microformat Scoping Issue

A strangely contrived example of the Microformats Scoping issue is outlined in the following section. Let's take the following text and try to mark it up using the hCard Microformat:

Janet Seymour and Robert Tripton are available via phone.
Rob is available at his home number: 555-555-5555.
Janet can be reached at her work number: 777-777-7777.

A publisher would want to mark up Janet's and Robert's telephone contact information. Not knowing that Microformats are scope-less, they might take the following approach:

<div class="vcard">
  <span class="fn">Janet Seymour</span> and
  <div class="vcard">
    <span class="fn">Robert Tripton</span> are available via phone.
    Rob is available at his home number: <span class="tel">555-555-5555</span>.
  </div>
  Janet can be reached at her work number: <span class="tel">777-777-7777</span>.
</div>

Unfortunately, the Microformat parser would generate the following two Microformat outputs:

hCard
  fn  -> Janet Seymour
  tel -> 555-555-5555
  tel -> 777-777-7777
hCard
  fn  -> Robert Tripton
  tel -> 555-555-5555

This happens because the two hCards overlap. Microformats do not have an established mechanism for identifying which properties go with which VCARD. When the parser processes Janet Seymour's hCard, the first telephone number it finds is Robert Tripton's, not Janet Seymour's. Since the parser doesn't know which person 555-555-5555 belongs to, it mistakenly adds it to Janet Seymour's list of phone numbers. This problem arises whenever two or more Microformats of the same type overlap one another.
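The behaviour described above can be reproduced with a small sketch. This is not how any particular Microformats parser is implemented; it is a minimal illustration built on Python's standard html.parser module. The names HCardSketch, parse_hcards and EXAMPLE are hypothetical, and two assumptions are made: fn is treated as single-valued (the first occurrence wins) and tel as multi-valued, which is what the output listed above implies. The scoped=True mode shows a hypothetical "nearest enclosing vcard" rule for comparison; it is not a rule defined on this page.

```python
from html.parser import HTMLParser

EXAMPLE = """
<div class="vcard">
  <span class="fn">Janet Seymour</span> and
  <div class="vcard">
    <span class="fn">Robert Tripton</span> are available via phone.
    Rob is available at his home number: <span class="tel">555-555-5555</span>.
  </div>
  Janet can be reached at her work number: <span class="tel">777-777-7777</span>.
</div>
"""


class HCardSketch(HTMLParser):
    """Toy hCard collector; handles only well-nested markup without void elements."""

    def __init__(self, scoped):
        super().__init__()
        self.scoped = scoped      # True: hypothetical nearest-enclosing-root rule
        self.cards = []           # every hCard found, in document order
        self.open_cards = []      # hCards whose root element is still open
        self.stack = []           # one (card, property, text buffer) per open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        card = None
        if "vcard" in classes:
            card = {"fn": None, "tel": []}
            self.cards.append(card)
            self.open_cards.append(card)
        prop = next((c for c in classes if c in ("fn", "tel")), None)
        self.stack.append((card, prop, []))

    def handle_data(self, data):
        # Text belongs to every open property element that contains it.
        for _, prop, buf in self.stack:
            if prop is not None:
                buf.append(data)

    def handle_endtag(self, tag):
        card, prop, buf = self.stack.pop()
        if prop is not None:
            value = "".join(buf).strip()
            # Scope-less: every enclosing hCard receives the property.
            targets = self.open_cards[-1:] if self.scoped else self.open_cards
            for target in targets:
                if prop == "tel":
                    target["tel"].append(value)
                elif target["fn"] is None:   # assumption: first fn wins
                    target["fn"] = value
        if card is not None:
            self.open_cards.pop()


def parse_hcards(html, scoped=False):
    parser = HCardSketch(scoped)
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    for card in parse_hcards(EXAMPLE):
        print(card)
```

Calling parse_hcards(EXAMPLE) gives Janet's hCard both telephone numbers, as in the listing above, while parse_hcards(EXAMPLE, scoped=True) assigns each number only to the innermost hCard that contains it.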

Keep in mind that the Microformat authors are aware of this limitation, and that the limitation is acceptable for what the Microformats community is attempting to accomplish: providing a simple mechanism for semantic data markup. Simplicity has its benefits and its drawbacks. RDFa allows you to specify which property goes with which VCARD and solves this issue, at the added cost of extra syntax (whose verbosity will vary depending on the task). A more fundamental tradeoff here is around fragility: RDFa allows great precision and the free mixing of independently developed extensions. However, this is accomplished by requiring markup to obey the abstract syntax rules of RDFa, which may be hard to grasp. In some contexts, this tradeoff will pay off. Microformats, by contrast, create more work for parser writers (who must, for example, update their code to track new vocabularies or evolving consensus about cross-microformat interaction patterns); this is justified on the basis that a handful of parser implementations can be used by everyone, and that it is better to have any extra work done by a handful of specialised experts than by thousands of HTML publishers.

Microformat Namespacing Issue

The following is a theoretical example, based on work performed for hAudio, that shows how making poor choices of element naming can lead to confusion. There was a discussion on the New Microformats mailing list regarding the re-use of the 'title' property from hCard. Re-use of property names in Microformats is heavily encouraged, but it needs to be done with care:

  1. If somebody narrowly defines the meaning of a property, re-using that property is difficult if not impossible.
  2. The more you re-use a property, the more risk there is of interpretation clashes when Microformats overlap on a web page.

The question of whether or not we should be using title when referring to the title of a song, or the title of an album, was raised several times while discussing hAudio. The definition for the word 'title' in this sense is:

5 a) the distinguishing name of a written, printed, or filmed production 
  b) a similar distinguishing name of a musical composition or a work of art.

However, the hCard authors previously defined title. They re-used the VCARD definition. The VCARD specification defines title as:

"To specify the job title, functional position or function of the object the vCard represents".

This means that any Microformat that desires to use title must use the definition used by VCARD, since that specification was created first and changing it might confuse people who have adopted VCARD. Namespaces would attempt to fix this problem by letting "hCard:title" mean something subtly different from "hAudio:title", at the cost of more complex parsing and more confusion. Since there are no namespaces in Microformats, this confusion has to be avoided by defining audio-title and album-title, or htitle or name. This could all have been avoided by simply re-using fn (formatted name) properly, as it has been used in hCard, hReview, hListing and many proposals, and then specifying parsing/context rules appropriately.

Let us assume for a moment that "title" was defined so loosely that hAudio could have re-used the property. There is another problem that goes back to the example given in Appendix A. How do you differentiate between two Microformats that overlap with the same property name? Take this sentence for example:

Freddie Mercury, known for a song called Bohemian Rhapsody, was the lead singer for Queen.

The HTML markup would look like this:

<div class="vcard">
  <span class="fn">Freddie Mercury</span>, 
  known for a song called
  <div class="haudio">
    <span class="title">Bohemian Rhapsody</span>,
  </div>
  was the <span class="title">lead singer</span>
  for <span class="org">Queen</span>.
</div>

A Microformat parser would generate the following two outputs from the markup above:

hCard
  fn    -> Freddie Mercury
  title -> Bohemian Rhapsody
  org   -> Queen
hAudio
  title -> Bohemian Rhapsody

Even though title is the desired word to use, due to its ambiguity, hAudio was forced to use a different property name. hAudio currently uses the property names audio-title and album-title to avoid namespace and scoping issues. The use of the dash character in "audio-title" and "album-title" is a very simple way of creating separate names, avoiding the complexity and ambiguity of XML namespaces.
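The effect of the rename can be illustrated with another small sketch, again built on Python's standard html.parser module and not taken from any real Microformats parser. The names ScopelessCollector, collect, SHARED_MARKUP, RENAMED_MARKUP and the two vocabulary tables are hypothetical. The sketch assumes a scope-less collector that gives a property to every enclosing root whose vocabulary contains that class name, and it keeps every matching value in document order rather than only one, so the song name shows up next to the job title in the hCard.

```python
from html.parser import HTMLParser

SHARED_MARKUP = """
<div class="vcard">
  <span class="fn">Freddie Mercury</span>,
  known for a song called
  <div class="haudio">
    <span class="title">Bohemian Rhapsody</span>,
  </div>
  was the <span class="title">lead singer</span>
  for <span class="org">Queen</span>.
</div>
"""

# Same sentence, with the song marked up using hAudio's own property name.
RENAMED_MARKUP = SHARED_MARKUP.replace(
    '<span class="title">Bohemian', '<span class="audio-title">Bohemian'
)

# Hypothetical vocabulary tables: root class name -> property class names.
SHARED_VOCAB = {"vcard": {"fn", "title", "org"}, "haudio": {"title"}}
RENAMED_VOCAB = {"vcard": {"fn", "title", "org"}, "haudio": {"audio-title"}}


class ScopelessCollector(HTMLParser):
    """Toy collector; handles only well-nested markup without void elements."""

    def __init__(self, vocab):
        super().__init__()
        self.vocab = vocab
        self.items = []        # (root name, {property: [values]}) in document order
        self.open_items = []   # items whose root element is still open
        self.stack = []        # one (item, class names, text buffer) per open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        item = None
        root = next((c for c in classes if c in self.vocab), None)
        if root is not None:
            item = (root, {})
            self.items.append(item)
            self.open_items.append(item)
        self.stack.append((item, classes, []))

    def handle_data(self, data):
        for _, _, buf in self.stack:
            buf.append(data)

    def handle_endtag(self, tag):
        item, classes, buf = self.stack.pop()
        if item is not None:
            self.open_items.pop()
            return
        text = " ".join("".join(buf).split())
        # Scope-less: any enclosing root that knows the class name takes the value.
        for root, props in self.open_items:
            for name in classes:
                if name in self.vocab[root]:
                    props.setdefault(name, []).append(text)


def collect(html, vocab):
    parser = ScopelessCollector(vocab)
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(collect(SHARED_MARKUP, SHARED_VOCAB))
    print(collect(RENAMED_MARKUP, RENAMED_VOCAB))
```

With the shared name, the hCard's title list contains "Bohemian Rhapsody"; with audio-title, the hCard keeps only "lead singer" and the song name lands in the hAudio item alone, without any change to the parsing rules.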

This limitation is also well known to the Microformats authors. It is accepted deliberately: it avoids the price of complex namespacing and keeps the markup mechanism easy for publishers to implement.