XMDP Brainstorming
This wiki page offers a location to brainstorm methods for discovering microformats.
Authors
Add your name here if you make significant contributions to this page and wish to take responsibility for them.
Introduction
Tantek Çelik has developed the <a href="http://gmpg.org/xmdp/" title="XHTML Meta-data Profile">XMDP</a> to describe the allowed class attribute values for microformats. A link to a microformat's XMDP in the profile attribute of the head element indicates that the microformat may be used in the document. A parser could read the allowed attribute values from the linked XMDP and treat their presence in the document as evidence that the microformat is in use.
There are clearly issues with this approach:
- Just because an attribute value mentioned in a microformat's linked XMDP also appears in the document does not mean that that microformat is in use. Such co-occurrences could be purely by chance.
- Currently, the XMDP can only be linked from the profile attribute of the head element. In many instances, authors will not have access to the head element.
- Documents with user-generated content are hard to parse, and microformats present particular parsing challenges.
Feel free to add issues here. Keep issues in this list in summary form. Save lengthy discussion and potential solutions for elaboration below.
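The co-occurrence issue in the list above can be illustrated with a minimal sketch. It assumes the allowed values have already been read from an XMDP; the ALLOWED set, the PAGE markup, and the matched_values helper are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every class attribute value that appears in a document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute holds a space-separated list of values.
                self.classes.update(value.split())


def matched_values(html, allowed_values):
    """Return the allowed values that occur as class values in the HTML."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes & set(allowed_values)


# Hypothetical subset of values a profile might define; not read from a real XMDP.
ALLOWED = {"vcard", "fn", "url", "summary"}

# This page marks up no contact information, yet two values still match.
PAGE = '<div class="post"><p class="summary">Hi</p><a class="url" href="/">x</a></div>'

print(sorted(matched_values(PAGE, ALLOWED)))  # ['summary', 'url']
```

A parser that treated any match as proof of use would report a false positive for this page, which is why value matching alone can only suggest that a microformat may be present.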
Addressing issues
These are in no particular order, but an issue should appear in the issues list above if it is addressed here.
Linking to the XMDP
At least two methods are under discussion for linking to the XMDP, in addition to the current method of using the profile attribute of the head element:
- Using <link rel="profile" href="link to XMDP"/>. This method can be used now and will be formalized in XHTML 2.
  - A problem with this method is that it still requires access to the head element.
- Using <a rel="profile" href="link to XMDP">powered by microformat xyz</a> in the body of the document.
  - As noted by a number of people, this approach has the added benefit of creating a viral marketing opportunity for the microformats used. For instance, developers could add badges saying they are using microformat xyz, as suggested by the example.
  - Blog authoring environments allow authors to insert links at will, so this approach removes the need to access the head element.
It should be noted that none of these linking solutions addresses the issue of when exactly the microformat is being used in the document. They only indicate that the microformat may be in use.
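As a rough sketch of how a parser might gather profile links from all three methods, the following uses Python's standard html.parser module. The ProfileLinkCollector class, the profile_links helper, the method labels, and the example.org URIs are all assumptions made for illustration; nothing here is prescribed by the XMDP specification.

```python
from html.parser import HTMLParser


class ProfileLinkCollector(HTMLParser):
    """Records profile URIs and which linking method supplied each one."""

    def __init__(self):
        super().__init__()
        self.found = []  # (method, uri) tuples in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head" and attrs.get("profile"):
            # The profile attribute may hold several space-separated URIs.
            for uri in attrs["profile"].split():
                self.found.append(("head-profile", uri))
        elif tag in ("link", "a") and attrs.get("href"):
            rels = (attrs.get("rel") or "").lower().split()
            if "profile" in rels:
                self.found.append((tag + "-rel", attrs["href"]))


def profile_links(html):
    """Return every profile link found in the markup."""
    collector = ProfileLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.found


# Hypothetical page that uses all three linking methods at once.
PAGE = """
<html><head profile="http://example.org/profile-a">
<link rel="profile" href="http://example.org/profile-b"/>
</head><body>
<p><a rel="profile" href="http://example.org/profile-c">powered by microformat xyz</a></p>
</body></html>
"""

for method, uri in profile_links(PAGE):
    print(method, uri)
```

As with the linking methods themselves, the result only says which profiles may apply to the page, not where any microformat is actually used.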
Resolving when microformats are actually in use
One solution to this issue is simply to include the <a rel="profile" href="link to XMDP">powered by microformat xyz</a> within the container element for the microformat. The XMDP spec could then specify that when the <a> element is used in this way, it indicates that the microformat is used by the element containing the <a> element.
There are, however, several clear issues with this proposal:
- Not every microformat has a container element. Consider reltag, one of the most widely used microformats.
- To some extent, using microformats adds to the cost of writing the document. It's like filling in a form just to write your thoughts. Putting <a> elements with each microformat adds unwanted links on top of that.
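To make the container proposal concrete, here is a minimal sketch of how a parser might resolve the element that contains a profile link. It assumes that "the element containing the <a> element" means the immediate parent, which is only one possible reading; the ContainerFinder class, the find_containers helper, and the example.org URI are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take an end tag and so never contain anything.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ContainerFinder(HTMLParser):
    """Finds the parent element of each <a rel="profile"> link."""

    def __init__(self):
        super().__init__()
        self.stack = []       # (tag, attrs dict) for currently open elements
        self.containers = []  # (profile URI, container tag, container class)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "a" and "profile" in rels and self.stack:
            parent_tag, parent_attrs = self.stack[-1]
            self.containers.append(
                (attrs.get("href"), parent_tag, parent_attrs.get("class")))
        if tag not in VOID:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_containers(html):
    finder = ContainerFinder()
    finder.feed(html)
    finder.close()
    return finder.containers


PAGE = """
<body>
<div class="vcard">
  <span class="fn">Jane Doe</span><br>
  <a rel="profile" href="http://example.org/hcard-profile">powered by microformat xyz</a>
</div>
<p>Unrelated paragraph.</p>
</body>
"""

print(find_containers(PAGE))
```

The sketch does not apply HTML's implied end-tag rules, so an unclosed <p> before the link would be reported as the container. It also offers nothing for microformats such as reltag that have no container element.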
Parsing microformats
Parsing user-generated content is challenging. Frequently, it does not validate and may not even be well formed. Therefore, microformat discovery mechanisms that depend on documents having even minimal XML properties like well-formedness will often fail. This is true, in particular, of Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes, which use XSLT.
However, because most microformats are agnostic about details such as the exact element type used, developers typically have to resort to tools like XPath, which assume well-formedness. Mark Pilgrim's Universal Feed Parser is an example suggesting that it may be possible to sanitize user HTML to the point where it is suitable for later processing as XML.
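The well-formedness problem can be seen in a small sketch that feeds the same ill-formed markup to a strict XML parser and to a tolerant, event-based HTML parser, both from the Python standard library. This is only an illustration of the general point and does not reflect how X2V or the Universal Feed Parser are implemented; the TAG_SOUP sample and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical user-generated markup: unclosed <p> and <br>, misnested <b> and <i>.
TAG_SOUP = '<div class="vcard"><p class="fn">Jane Doe<br><b>bold <i>text</b></i></div>'


def parses_as_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ClassEvents(HTMLParser):
    """Records (tag, class values) for each start tag that has a class."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("class")
        if value:
            self.seen.append((tag, value.split()))


def tolerant_classes(markup):
    parser = ClassEvents()
    parser.feed(markup)
    parser.close()
    return parser.seen


print(parses_as_xml(TAG_SOUP))     # False
print(tolerant_classes(TAG_SOUP))  # [('div', ['vcard']), ('p', ['fn'])]
```

The tolerant parser still reports the class values, but it yields a flat stream of events rather than a tree, so any XPath-style query would need a sanitizing step first.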
From a pragmatic developer perspective, parsing web pages to discover microformats is likely to be an area of much work.