xmdp-brainstorming: Difference between revisions

From Microformats Wiki

Revision as of 02:44, 14 July 2005

XMDP Brainstorming

Authors

Add your name here if you make significant contributions to this page and wish to take responsibility for them.

UNDER CONSTRUCTION

NOTE: This page is currently a bit of a mishmash of xmdp-faq, xmdp-issues, and XMDP brainstorming. I'm going to need to spend some time separating all this out. - Tantek Çelik

XMDP brainstorming

Introduction

Tantek Çelik has developed XMDP (XHTML Meta-data Profile, http://gmpg.org/xmdp/) to define extensions to XHTML, including rel values, class names, and <meta name> properties and values. Per the XMDP spec, a link to a microformat's XMDP in the profile attribute of the head element indicates that the microformat's vocabulary is formally defined in the document. A parser could read the allowed attribute values from the linked XMDP and use their presence in the document to infer that the microformat is in use.
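The inference described above can be sketched roughly as follows. This is only an illustration, assuming the profile is written as nested definition lists in which each top-level <dt> names a property (such as rel or class) and each nested <dt> names an allowed value. The sample PROFILE text, the class names XMDPTerms and AttributeValues, and the function profile_terms_in_use are all made up for this sketch and are not part of any spec or existing tool.

```python
from html.parser import HTMLParser

# Hypothetical, hand-written profile used only for illustration.
PROFILE = """
<dl class="profile">
 <dt id="rel">rel</dt>
 <dd><dl>
  <dt id="tag">tag</dt>
  <dd>Indicates that the link target is a tag for the page.</dd>
 </dl></dd>
 <dt id="class">class</dt>
 <dd><dl>
  <dt id="vcard">vcard</dt><dd>A contact card.</dd>
  <dt id="fn">fn</dt><dd>Formatted name.</dd>
 </dl></dd>
</dl>
"""


class XMDPTerms(HTMLParser):
    """Collect {property: {values}} from nested definition lists."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth of <dl> elements
        self.in_dt = False
        self.current = None   # property named by the last depth-1 <dt>
        self.terms = {}
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "dl":
            self.depth += 1
        elif tag == "dt":
            self.in_dt = True
            self._text = []

    def handle_data(self, data):
        if self.in_dt:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "dl":
            self.depth -= 1
        elif tag == "dt" and self.in_dt:
            self.in_dt = False
            name = "".join(self._text).strip()
            if self.depth == 1:
                self.current = name
                self.terms.setdefault(name, set())
            elif self.depth == 2 and self.current is not None:
                self.terms[self.current].add(name)


class AttributeValues(HTMLParser):
    """Collect the space-separated values of selected attributes."""

    def __init__(self, properties):
        super().__init__()
        self.found = {p: set() for p in properties}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.found and value:
                self.found[name].update(value.split())


def profile_terms_in_use(profile_html, document_html):
    """Return the profile-defined values that also occur in the document."""
    profile = XMDPTerms()
    profile.feed(profile_html)
    document = AttributeValues(profile.terms)
    document.feed(document_html)
    return {prop: values & document.found[prop]
            for prop, values in profile.terms.items()}


DOCUMENT = ('<div class="vcard sidebar"><a rel="tag" '
            'href="http://example.org/tag/xmdp">xmdp</a></div>')
in_use = profile_terms_in_use(PROFILE, DOCUMENT)
```

Note that the sketch reports co-occurrence only. Whether co-occurrence is enough to conclude that the microformat is in use is exactly the question raised in the issues list.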

Raised Issues

  • Just because a profile value mentioned in a microformat's linked XMDP also appears in the document does not mean that that microformat is in use. Such co-occurrences could be purely by chance.
    • REJECTED. No, this does not make sense. By definition, an XMDP profile defines certain properties and values. Any use of such a property or value in the document is thus defined by the definition in the XMDP.
  • Currently, the XMDP can only be linked from the profile attribute of the head element. In many instances, authors will not have access to the head element.
    • ACCEPTED. There are two additional proposed ways to link to XMDP profiles:
      1. <link rel="profile">, as introduced in the XMDP poster submitted to WWW2005.
      2. <a rel="profile" href>, as similarly discussed.
  • Documents with user-generated content are hard to parse, and microformats present particular parsing challenges.
    • REJECTED. This is a straw man issue.
    • Bud 19:44, 13 Jul 2005 (PDT): Tantek needs to supply some justification for why this is a strawman as every developer I have talked to has raised it. It may be that the solutions described below are sufficient to solve the issue.

Feel free to add issues here. Keep issues in this list in summary form. Save lengthy discussion and potential solutions for elaboration below.

Addressing issues

These are in no particular order, but an issue should appear in the issues list above if it is addressed here.

Linking to the XMDP

Besides the current method of using the profile attribute of the head element, at least two additional methods for linking to the XMDP are under discussion:

  • Using <link rel="profile" href="link to XMDP"/>. This method can be used now and will be formalized in XHTML 2.
    • A problem with this method is that it requires access to the head element.
  • Using <a rel="profile" href="link to XMDP">powered by microformat xyz</a> in the body of the document.
    • As noted by a number of people, this approach has the added benefit of creating a viral marketing opportunity for the microformats used. For instance, developers could add badges saying they are using microformat xyz as suggested by the example.
    • Blog authoring environments allow you to insert links at will, so this squarely obviates the need to access the head element.
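A consumer that wants to honor all three linking methods could gather profile URIs along the following lines. This is a minimal sketch using Python's standard html.parser module; the class name ProfileLinks, the function find_profiles, and the example.org URIs are invented for illustration.

```python
from html.parser import HTMLParser


class ProfileLinks(HTMLParser):
    """Collect (mechanism, uri) pairs for every profile reference found."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head" and a.get("profile"):
            # The profile attribute may hold several space-separated URIs.
            for uri in a["profile"].split():
                self.profiles.append(("head", uri))
        elif tag in ("link", "a") and a.get("href"):
            rels = (a.get("rel") or "").lower().split()
            if "profile" in rels:
                self.profiles.append((tag, a["href"]))


def find_profiles(html):
    parser = ProfileLinks()
    parser.feed(html)
    return parser.profiles


# Hypothetical document that uses all three mechanisms at once.
DOC = """<html><head profile="http://example.org/p1">
<link rel="profile" href="http://example.org/p2"/>
</head><body>
<p><a rel="profile" href="http://example.org/p3">powered by microformat xyz</a></p>
</body></html>"""

profiles = find_profiles(DOC)
```

Recording the mechanism next to each URI keeps open the option of treating the <a> form differently later, for example when deciding which part of the document a profile applies to.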

It should be noted that none of these linking solutions addresses the issue of exactly where in the document the microformat is being used. They only indicate that the microformat may be in use.

No. That is false. Referencing an XMDP introduces its definitions into the document. Period. Those definitions then take effect for the properties and values defined therein.

Resolving when microformats are actually in use

One solution to this issue is simply to include the <a rel="profile" href="link to XMDP">powered by microformat xyz</a> within the container element for the microformat. The XMDP spec could then specify that when the <a> element is used in this way, it indicates that the microformat is used by the element containing the <a> element.
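How a parser might apply this scoping rule is sketched below. It is a rough illustration under two stated assumptions: the input is well-formed markup without an XML namespace, and "the element containing the <a> element" means its direct parent. The function name profile_scopes and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def profile_scopes(xhtml):
    """Return (profile_uri, container_element) pairs.

    The container is taken to be the direct parent of each
    <a rel="profile"> element. Raises ET.ParseError when the
    input is not well formed.
    """
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    scopes = []
    for a in root.iter("a"):
        if "profile" in a.get("rel", "").split() and a in parent_of:
            scopes.append((a.get("href"), parent_of[a]))
    return scopes


# Hypothetical fragment: only the first inner <div> opts in to the profile.
FRAGMENT = (
    '<div>'
    '<div class="vcard" id="card1"><span class="fn">Bud</span> '
    '<a rel="profile" href="http://example.org/hcard">'
    'powered by microformat xyz</a></div>'
    '<p id="other">unrelated text</p>'
    '</div>'
)

scopes = profile_scopes(FRAGMENT)
```

Because the sketch relies on an XML parser, it fails on markup that is not well formed, which ties this proposal to the parsing concerns discussed further down the page.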

There are, however, several clear issues with this proposal:

  • Not every microformat has a container element. Consider reltag, one of the most widely used microformats.
  • To some extent, using microformats adds to the cost of writing the document. It's like filling in a form just to write your thoughts. Putting <a> elements with each microformat adds unwanted links on top of that.

Parsing microformats

Parsing user-generated content is challenging. Frequently, it does not validate and may not even be well formed. Therefore, microformat discovery mechanisms that depend on documents having even minimal XML properties like well-formedness will often fail. This is true, in particular, of Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes, which use XSLT.

Yet most microformats, which tend to be agnostic about things like the exact element type used, typically require that the developer resort to tools like XPath that assume well-formedness. Mark Pilgrim's Universal Feed Parser, for example, suggests that it may be possible to sanitize user HTML to the extent that it is suitable for later processing as XML.
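To make the well-formedness problem concrete, the sketch below feeds the same piece of tag soup to an XML parser and to a tolerant event-based HTML parser. This is not the sanitizing approach of the Universal Feed Parser; it swaps in Python's standard html.parser module as a simpler stand-in, and the names RelTagFinder, find_rel_tags, is_well_formed, and the SOUP sample are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RelTagFinder(HTMLParser):
    """Collect href values of <a rel="tag"> links, tolerating tag soup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").split()
        if tag == "a" and "tag" in rels and a.get("href"):
            self.tags.append(a["href"])


def find_rel_tags(html):
    finder = RelTagFinder()
    finder.feed(html)
    return finder.tags


def is_well_formed(markup):
    """Report whether an XML parser accepts the markup as is."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical user-generated markup: unclosed <br>, unquoted attribute
# values, mismatched case, and badly nested inline elements.
SOUP = ('<p>Unclosed paragraph<br>'
        '<a href=http://example.org/tag/xmdp rel=tag>xmdp</A>'
        '<b><i>bad nesting</b></i>')

xml_ok = is_well_formed(SOUP)
found = find_rel_tags(SOUP)
```

An event-based scan like this suits microformats that need no container, such as reltag. Formats that depend on element nesting would still need some tree-repair step before tools like XPath could be applied.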

From a pragmatic developer perspective, parsing web pages to discover microformats is likely to be an area of much work.