Microformat Object or Microformat Opacity or Microformat Opaque
- Tantek Çelik
Both recent discussions around hAtom and earlier discussions from June 2005 indicate that there may be a need for a generic microformat to mark a specific element as a wrapper, container, or layer of abstraction that should be opaque to anything parsing microformats further up the hierarchy.
E.g. you might put a <span class="vcard mfo"> deep inside a <span class="vevent">, and not want the categories/tags of the hCard accidentally parsed into the hCalendar event.
Note: the use of "mfo" is only for the purpose of illustration and is by no means a proposed name for this microformat. We expect research/discussion to reveal a much better name. We may even want to commit to deliberately using a class name different from "mfo" just to make this clear in the end.
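To make the idea concrete, here is a minimal sketch in Python of a container parser that honors such a marker. It assumes well-formed XHTML input (so the standard library's ElementTree can read it), uses the placeholder class "mfo" from above, and the function name collect and the sample markup are made up for illustration. It is one possible reading of "opaque", not a proposed parsing rule.

```python
import xml.etree.ElementTree as ET

OPAQUE = "mfo"  # placeholder marker from the text, not a proposed name


def collect(root, wanted):
    """Collect the text of descendants carrying a wanted class name,
    treating any element marked OPAQUE as a boundary that is not entered."""
    found = {}

    def walk(el):
        for child in el:
            cls = child.get("class", "").split()
            # Classes on the marked element itself still count, so that
            # e.g. class="location vcard mfo" would keep its "location" role.
            for name in cls:
                if name in wanted:
                    found.setdefault(name, []).append("".join(child.itertext()).strip())
            if OPAQUE not in cls:
                walk(child)

    walk(root)
    return found


# Hypothetical markup: an hCard marked opaque inside an hCalendar event.
SNIPPET = """
<div class="vevent">
  <span class="summary">Launch party</span>
  <span class="category">music</span>
  <span class="vcard mfo">
    <span class="fn">Jane Doe</span>
    <span class="category">friend</span>
  </span>
</div>
"""

props = collect(ET.fromstring(SNIPPET), {"summary", "category"})
print(props)  # {'summary': ['Launch party'], 'category': ['music']}
```

The hCard's "friend" category stays out of the event because the walk stops at the marked element. Whether classes on the marked element itself should still be read by the container is a design choice this sketch makes, not something settled here.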
Forward Compatibility for Parsers
Part of the point of this is to help with forward compatibility for parsers.
Thus an hCalendar parser need not know about hCard (even though in practice it probably will). As the number of microformats grows, the chance that a new microformat confuses an old parser in the scenario outlined above increases. Thus we are considering making it explicit when a new "root" microformat is established.
- fill out the real world examples below
- create mfo-formats page for researching/describing how other data formats indicate this kind of "abstraction", including the various terms they use like "object", "container", etc.
- create mfo-brainstorming page where we discuss how this should work, and candidate names. Some candidate names that have been offered to date: u, uf, object, container, root, mfo...
Here are some real world examples where folks have encountered the need to explicitly indicate that an embedded microformat does not introduce properties to its container.
hCard in hCalendar
[PlayingHere.com] includes thousands of hCalendar events with embedded hCard contacts. The contacts have URLs, the events do not, and these contact URLs are treated as event URLs, which is not what the author intended. An example of the markup:
<body class="vevent">
  [...]
  <h1 class="vcard organizer"><span class="summary">June 19th, 2007 at <span class="fn org">Varsity Theatre</span></span></h1>
  [...]
  <h3>
    <span class="attendee vcard"><a href="http://playinghere.com/band/armyofme/" class="fn url">ARMY OF ME</a></span>
    at <abbr class="dtstart" title="2007-06-19T17:00:00-07:00">5:00 pm</abbr>
  </h3>
  [...]
</body>
[Eventful] includes tens of thousands (hundreds of thousands?) of hCalendar events with embedded hCard contacts. The contacts and events both have URLs. Because the contact URLs come first, they take precedence over the event URLs (at least in X2V) when the event data is exported. As a result, the real event URL is replaced with a less useful URL, which is not what the author intended. An example of the markup:
<div class="vevent">
  [...]
  <h1 class="summary">Brown Bag Music Series</h1>
  [...]
  <h2>When</h2>
  <p>
    <abbr class="dtstart" title="20070619T120000">Tuesday, June 19, 2007 12:00 PM</abbr>
  </p>
  [...]
  <h2>Where</h2>
  <div class="location vcard">
    <a rel="bookmark" class="url uid fn org" href="/venues/V0-001-000395941-4">Denver Public Library - Central Branch</a>
    [...]
  </div>
  [...]
  <a class="url" href="/r/http://www.denver365.com/index.php?app=eventDetail&id=73358">Event details at www.denver365.com</a>
  [...]
</div>
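The "first URL wins" effect described for these events can be reproduced with a small Python sketch. This is not X2V's code; it is a simplified stand-in that assumes well-formed XHTML, and the markup, the paths in it, and the "mfo" marker added to the venue are all hypothetical.

```python
import xml.etree.ElementTree as ET


def first_url(el, opaque=None):
    """Return the href of the first class="url" descendant in document order.
    If `opaque` is given, subtrees whose root carries that class are skipped."""
    for child in el:
        cls = child.get("class", "").split()
        if opaque is not None and opaque in cls:
            continue  # subtree belongs to the embedded microformat
        if "url" in cls:
            return child.get("href")
        hit = first_url(child, opaque)
        if hit is not None:
            return hit
    return None


# Simplified, hypothetical event: the venue hCard's URL precedes the event URL.
SNIPPET = """
<div class="vevent">
  <h1 class="summary">Brown Bag Music Series</h1>
  <div class="location vcard mfo">
    <a class="url fn org" href="/venues/example-venue">Example Venue</a>
  </div>
  <a class="url" href="/events/example-event">Event details</a>
</div>
"""

event = ET.fromstring(SNIPPET)
unaware = first_url(event)             # '/venues/example-venue'
aware = first_url(event, opaque="mfo")  # '/events/example-event'
print(unaware, aware)
```

A parser that ignores the marker picks up the venue's URL, while one that honors it reaches the event's own URL.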
hCard in hReview
[BBC Music] includes thousands of hCard contacts embedded within hReview reviews. The contacts have URLs, but the reviews do not. These contact URLs are interpreted as review URLs, which is not what the publisher intended. [See email discussion]. An example of the markup:
<div class="hreview">
  [...]
  <dl>
    <dt>Artist:</dt>
    <dd><span class="vcard">
      <a href="/music/artist/m6qv/" class="fn url">The Beatles</a>
    </span></dd>
    [...]
  </dl>
  [...]
</div>
Container microformats use context in a way similar to conventional XML. When an Atom document includes the element <author>, it is context that determines whether the author of a feed or the author of an entry is being specified. However, unlike conventional XML, microformats support forward compatibility with must-ignore semantics for intervening elements between the context and the data. This introduces the problem of identifying contexts that may have been ignored in parsing. If an hAtom parser finds an author element belonging to a new microformat that it does not recognise, it may incorrectly surmise that the author element belongs to the hAtom entry and refers to it. In fact, it refers to the unknown microformat. Any other inference is invalid.
Elements that have different meanings in different microformats also pose a problem. hCard includes a title element meaning approximately "a person's job title". Atom and various other specifications use title to mean "the title of this document or sub-document". hReview avoided the use of title by re-using the "summary" element from hCalendar; however, this also clashes with the Atom namespace. hReview uses summary to mean "review title", while Atom uses summary to mean "abbreviated content, both longer than title and shorter than content".
hAtom currently attempts to resolve both the context problem and the nomenclature problem by explicitly naming child elements as opaque. Currently "content" and "summary" (will likely change) are considered completely opaque, while "author" and "contributor" are only scanned for hCard content. This may be an incomplete solution if hCards or other context microformats are included outside of these nodes.
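The per-property policy described above might look roughly like the following Python sketch. The property names come from the paragraph above; the root class "hentry", the function parse_entry, and the sample markup are assumptions made for illustration, and the input is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

OPAQUE_PROPS = {"content", "summary"}         # never looked inside
HCARD_ONLY_PROPS = {"author", "contributor"}  # searched for hCards only


def classes(el):
    return el.get("class", "").split()


def text(el):
    return "".join(el.itertext()).strip()


def parse_entry(entry):
    """Return (property names seen, fn values of hCards found in author/contributor)."""
    props, cards = [], []

    def walk(el):
        for child in el:
            cls = set(classes(child))
            props.extend(sorted(cls & (OPAQUE_PROPS | HCARD_ONLY_PROPS)))
            if cls & OPAQUE_PROPS:
                continue  # completely opaque: nothing inside is inspected
            if cls & HCARD_ONLY_PROPS:
                # Only hCards are extracted here (iter() includes the element itself).
                for card in child.iter():
                    if "vcard" in classes(card):
                        fn = next((d for d in card.iter() if "fn" in classes(d)), None)
                        cards.append(text(fn) if fn is not None else None)
                continue
            walk(child)

    walk(entry)
    return props, cards


# Hypothetical entry: an hCard and a stray "summary" sit inside the content.
SNIPPET = """
<div class="hentry">
  <h2>Concert review</h2>
  <span class="author vcard"><span class="fn">Jane Doe</span></span>
  <div class="content">
    <span class="vcard"><span class="fn">The Band</span></span>
    <span class="summary">Not the entry summary</span>
  </div>
</div>
"""

props, cards = parse_entry(ET.fromstring(SNIPPET))
print(props, cards)  # ['author', 'content'] ['Jane Doe']
```

The hCard and "summary" inside the content are never seen. An hCard placed outside these named nodes would still be walked by this sketch, which is the gap the paragraph above points out.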
hAtom and other microformats
One might say that if a parser understands hAtom, then there's no need for explicitly marking opaque elements as opaque.
This is true for hAtom parsers, and I (Tantek) originally made the same argument for hCard, hCalendar, and hReview, e.g. if a parser understands hCard, then there's no need for explicitly marking opaque elements as opaque.
However, what happens when an hReview parser, which was written before hAtom was conceived, encounters mixed hReview + hAtom content?
The whole need for marking opaque elements explicitly as opaque is to enable *current/old* microformat parsers to NOT be confused by new microformats which happen to reuse vocabulary.
Another way of looking at this is that by agreeing on a neutral opacity class name, we avoid the need for every microformat parser to have to know about every microformat. I'm sure you can imagine how much of a burden that might become over time.
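The difference between "know every root" and "know one neutral marker" can be sketched in Python. Everything named here is hypothetical: the list of known roots, the class "newformat" standing in for a microformat invented after the parser was written, the placeholder marker "mfo", and the sample markup, which is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

KNOWN_ROOTS = {"vcard", "vevent"}  # roots known when the parser was written
MARKER = "mfo"                     # placeholder neutral opacity class


def summaries(root, stop_classes):
    """Collect "summary" text, not entering elements that carry a stop class."""
    out = []

    def walk(el):
        for child in el:
            cls = set(child.get("class", "").split())
            if cls & stop_classes:
                continue
            if "summary" in cls:
                out.append("".join(child.itertext()).strip())
            walk(child)

    walk(root)
    return out


# An hReview containing a later microformat that reuses "summary".
SNIPPET = """
<div class="hreview">
  <span class="summary">Great album</span>
  <div class="newformat mfo">
    <span class="summary">Unrelated summary</span>
  </div>
</div>
"""

review = ET.fromstring(SNIPPET)
by_known_roots = summaries(review, KNOWN_ROOTS)  # ['Great album', 'Unrelated summary']
by_marker = summaries(review, {MARKER})          # ['Great album']
print(by_known_roots, by_marker)
```

The fixed list cannot anticipate "newformat", so its summary leaks into the review. The marker-based walk needs no update when new microformats appear.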
hAtom and hReview - an example of overlay
It could be said that certain microformats, e.g. hAtom, hReview, xFolk, can be "overlaid" (different from "composited").
In particular you can do this:
<div class="hatom hreview"> ... </div>
and have the expected right thing happen.
In fact, hReview is a perfect example of a microformat that sometimes you will want to make opaque, and sometimes you will want to overlay. Or perhaps you will want to overlay *most* of an hReview, and mark part of it (such as the "reviewer") as opaque.
Thus we cannot assume that an hReview, either whole or in part, is either opaque or transparent. We need another "bit" of information to indicate opacity on the hReview as a whole, or on parts of it.
Similar or Related Problems
Just as the class name "mfo" could be used to identify an arbitrary "microformat object", which by its very nature of being an object should be opaque to any containing microformat (as described above), it may also be worth considering a general purpose class name like "mfp" to indicate that an element represents a microformat property. This would allow auto-recognition of the properties of an arbitrary new microformat without having to read the profile for that microformat explicitly.
Since there are type-specific markup and parsing interpretations, there may need to be several such classes to indicate types of microformat properties that require special parsing (see hcard-parsing for details for hCard in particular). E.g.: datetime (must prefer 'title' attribute to element contents), url (must prefer href/data/src attributes on a/object/img elements over element contents), email (similar to URL but "mailto:" is pruned).
Some questions:
- Would there simply be separate class names like "mfdt", "mfu", "mfe"?
- How useful would this be without knowing more about the specific microformat?
- Is that set of types fixed or would we need to add more in the future?
- Do we thus reserve all class names that start with "mf"?
- Is this walking too much down the path of coming up with answers to generic / general-purpose questions which violate our microformats principles?
- Is this a solution looking for a problem?
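For reference, the three type-specific rules listed before these questions (datetime, url, email) can be sketched in Python as below. The function value_of and its "kind" argument are hypothetical names, the input is assumed to be well-formed XHTML, and only the rules stated on this page are covered; hcard-parsing has the full details for hCard.

```python
import xml.etree.ElementTree as ET

# Which attribute carries the URL on each element type, per the rule above.
URL_ATTRS = {"a": "href", "object": "data", "img": "src"}


def text(el):
    return "".join(el.itertext()).strip()


def value_of(el, kind):
    """Extract a property value according to its (hypothetical) type hint."""
    if kind == "datetime":
        # Prefer the 'title' attribute to the element contents.
        return el.get("title") or text(el)
    if kind in ("url", "email"):
        attr = URL_ATTRS.get(el.tag)
        raw = el.get(attr) if attr else None
        value = raw if raw is not None else text(el)
        if kind == "email" and value.lower().startswith("mailto:"):
            value = value[len("mailto:"):]  # prune the scheme
        return value
    return text(el)


when = ET.fromstring('<abbr class="dtstart" title="2007-06-19T17:00:00-07:00">5:00 pm</abbr>')
link = ET.fromstring('<a class="url" href="http://example.com/">Example</a>')
mail = ET.fromstring('<a class="email" href="mailto:jane@example.com">Mail Jane</a>')
bare = ET.fromstring('<span class="url">http://example.org/</span>')

print(value_of(when, "datetime"))  # 2007-06-19T17:00:00-07:00
print(value_of(link, "url"))       # http://example.com/
print(value_of(mail, "email"))     # jane@example.com
print(value_of(bare, "url"))       # http://example.org/
```

A generic parser would still need something like the "kind" hint for each property, which is what type classes such as "mfdt", "mfu", or "mfe" would supply.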