Microformat Object or Microformat Opacity or Microformat Opaque
- Tantek Çelik
Recent discussions around hAtom, and earlier discussions from June 2005, both indicate that there may be a need for a generic microformat that marks a specific element as a wrapper, container, or layer of abstraction which should be opaque to anything parsing microformats further up the hierarchy.
E.g. you might put a <span class="vcard mfo"> deep inside a <span class="vevent">, and not want the categories/tags of the hCard accidentally parsed into the hCalendar event.
Note: "mfo" is used here only as a temporary name for the sake of discussion and illustration, and is by no means a proposed name for this microformat. We expect research/discussion to reveal a much better name. We may even want to commit to deliberately using a class name different from "mfo", just to make this clear in the end.
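The scenario above can be sketched in code. The following is a minimal Python sketch, not a proposed algorithm: it assumes the placeholder class name "mfo", a hypothetical PropertyCollector helper, and made-up sample markup, and it ignores nearly all real hCalendar parsing rules. It only illustrates how a parser that honors an opacity marker would skip the category of the embedded hCard.

```python
from html.parser import HTMLParser

OPAQUE_CLASS = "mfo"  # placeholder from the text above, not a proposed name

# Elements without end tags; they never enter the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PropertyCollector(HTMLParser):
    """Hypothetical helper: collect the text of `prop` elements inside `root`."""

    def __init__(self, root, prop, honor_opaque):
        super().__init__()
        self.root = root
        self.prop = prop
        self.honor_opaque = honor_opaque
        self.stack = []   # one (in_root, opaque, capturing) tuple per open element
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Inherit state from the parent element (assumes well-formed markup).
        in_root, opaque, capturing = (
            self.stack[-1] if self.stack else (False, False, False)
        )
        # An opacity marker only counts when it sits inside the root being parsed.
        if self.honor_opaque and in_root and OPAQUE_CLASS in classes:
            opaque = True
        if opaque:
            capturing = False
        elif in_root and not capturing and self.prop in classes:
            self.values.append("")
            capturing = True
        self.stack.append((in_root or self.root in classes, opaque, capturing))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1][2]:
            self.values[-1] += data


def collect(html, root, prop, honor_opaque):
    parser = PropertyCollector(root, prop, honor_opaque)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


# Made-up sample: an hCard with its own category nested inside an event.
SAMPLE = (
    '<div class="vevent">'
    '<span class="summary">Launch</span>'
    '<span class="category">Conference</span>'
    '<span class="vcard mfo">'
    '<span class="fn">Ann</span>'
    '<span class="category">Speaker</span>'
    '</span>'
    '</div>'
)
```

With this sample, collect(SAMPLE, "vevent", "category", honor_opaque=False) picks up both "Conference" and "Speaker", while honor_opaque=True yields only "Conference".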
Forward Compatibility for Parsers
Part of the point of this is to help with forward compatibility for parsers.
Thus an hCalendar parser need not know about hCard (even though in practice it probably will). As the number of microformats grows, the chance that a new microformat confuses an old parser through the scenario outlined above increases. Thus we are considering making it explicit when a new "root" microformat is established.
- fill out the real world examples below
- create mfo-formats page for researching/describing how other data formats indicate this kind of "abstraction", including the various terms they use like "object", "container", etc.
- create mfo-brainstorming page where we discuss how this should work, and candidate names. Some candidate names that have been offered to date: u, uf, object, container, root, mfo...
Here are some real world examples where folks have encountered the need to explicitly indicate that an embedded microformat does not introduce properties to its container.
Container microformats use context in a similar way to conventional XML. When an Atom document includes the element <author>, it is context that determines whether the author of a feed or the author of an entry is being specified. However, unlike conventional XML, microformats support forwards compatibility through must-ignore semantics for elements intervening between the context and the data. This introduces the problem of identifying contexts that may have been ignored in parsing. If an hAtom parser finds an author element belonging to a new microformat that it does not recognise, it may incorrectly surmise that the author element belongs to the hAtom entry and describes it. In fact, it refers to the unknown microformat. Any other inference is invalid.
Elements that have different meanings in different microformats also pose a problem. hCard includes a title element meaning approximately "a person's job title". Atom and various other specifications use title to mean "the title of this document or sub-document". hReview avoided the use of title by reusing "summary" from hCalendar; however, this also clashes with the Atom namespace. hReview uses summary to mean "review title", while Atom uses summary to mean "abbreviated content, both longer than title and shorter than content".
hAtom currently attempts to resolve both the context problem and the nomenclature problem by explicitly naming child elements as opaque. Currently "content" and "summary" (will likely change) are considered completely opaque, while "author" and "contributor" are only scanned for hCard content. This may be an incomplete solution if hCards or other context microformats are included outside of these nodes.
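The per-element approach described above can be written down as a small policy table. This is a rough Python sketch under stated assumptions: the table reflects only the four names mentioned in the text, and the function visible_to_hatom, its arguments, and the representation of elements as sets of class names are all hypothetical, not taken from the hAtom draft.

```python
OPAQUE = "opaque"          # nothing below this element is parsed
HCARD_ONLY = "hcard-only"  # only embedded hCards (class "vcard") are parsed

# The four child elements named in the text; the hAtom draft may differ.
HATOM_CHILD_POLICY = {
    "content": OPAQUE,
    "summary": OPAQUE,
    "author": HCARD_ONLY,
    "contributor": HCARD_ONLY,
}


def visible_to_hatom(ancestors, classes):
    """Return True if this sketch parser would look at an element at all.

    ancestors: list of class-name sets, one per element between the entry
               root and the element (both exclusive), outermost first.
    classes:   class-name set of the element itself.
    """
    for depth, ancestor in enumerate(ancestors):
        policies = {HATOM_CHILD_POLICY.get(name) for name in ancestor}
        if OPAQUE in policies:
            return False
        if HCARD_ONLY in policies:
            # Visible only if it is, or sits inside, an hCard at or below this node.
            chain = ancestors[depth:] + [classes]
            return any("vcard" in c for c in chain)
    return True
```

Under these assumptions, visible_to_hatom([{"content"}], {"title"}) is False, but visible_to_hatom([{"vcard"}], {"title"}) is True: an hCard placed outside the listed nodes still leaks its title, which is the incompleteness noted above.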
hAtom and other microformats
One might say that if a parser understands hAtom, then there's no need for explicitly marking opaque elements as opaque.
This is true for hAtom parsers, and I (Tantek) originally made the same argument for hCard, hCalendar, and hReview, e.g. if a parser understands hCard, then there's no need for explicitly marking opaque elements as opaque.
However, what happens when an hReview parser, which was written before hAtom was conceived, encounters mixed hReview + hAtom content?
The whole need for marking opaque elements explicitly as opaque is to enable *current/old* microformat parsers to NOT be confused by new microformats which happen to reuse vocabulary.
Another way of looking at this is that by agreeing on a neutral opacity class name, we avoid requiring every microformat parser to know about every microformat. I'm sure you can imagine how much of a burden that might become over time.
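The difference between the two strategies can be shown in a few lines. This is only an illustrative Python sketch: "mfo" is the placeholder name used on this page, the set of roots known to the old parser is an assumption, and "hentry" stands in for the root class of a format the old parser has never heard of.

```python
OPAQUE_CLASS = "mfo"  # placeholder name, see the note near the top of the page

# Assumed: the root class names an hReview parser written before hAtom knows.
OLD_PARSER_KNOWN_ROOTS = frozenset({"hreview", "vcard", "vevent"})


def stops_by_known_roots(classes, known_roots=OLD_PARSER_KNOWN_ROOTS):
    """Treat an element as a boundary only if its format is already known."""
    return any(name in known_roots for name in classes)


def stops_by_neutral_class(classes):
    """Treat an element as a boundary whenever it carries the neutral marker."""
    return OPAQUE_CLASS in classes


# An element from a newer format, marked with the neutral class.
new_format_element = ["hentry", OPAQUE_CLASS]
```

Here stops_by_known_roots(new_format_element) is False, so the old parser would keep descending into the unknown format, while stops_by_neutral_class(new_format_element) is True without the parser's list of known formats ever being updated.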