misconceptions

From Microformats Wiki
Revision as of 07:38, 25 November 2007 by Tantek (talk | contribs) (added schema would improve interoperability of microformats misconception and schema incompleteness problem)

misconceptions about microformats

Some misconceptions appear often enough, or are held by people who are experts in the fields of markup, web standards, etc., that it is useful to document and debunk them.

misconceptions

microformats use unscoped class names

misconception: Microformats definitions use unqualified/unscoped class attribute strings as semantic tags.

This is incorrect on two counts.

First, microformats make use of profile URLs to define class name semantics. Thus it is entirely possible (though both unlikely and undesirable) for someone to redefine class names with their own definitions in their own profile URL.

Second, compound microformats (e.g. hCard, hCalendar, hReview) use a fairly uniquely chosen string for the "root element" (e.g. vcard, vcalendar, vevent, hreview) and then use generic terms only inside that root element. Thus any use of generic terms is scoped (some might even say "namespaced") within the fairly uniquely chosen "root element" class name.
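The scoping described above can be illustrated with a minimal sketch in Python, using only the standard library's html.parser. The class name ScopedHCardParser, the helper extract_hcards, and the sample page are hypothetical, and the sketch handles only a few text-valued terms; it is not how any particular microformats tool is implemented.

```python
from html.parser import HTMLParser

# Elements without end tags in HTML; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

# A few generic hCard terms, honoured only inside a class="vcard" root.
GENERIC_TERMS = ("fn", "org", "title")


class ScopedHCardParser(HTMLParser):
    """Hypothetical sketch: collect generic terms only inside a vcard root."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current element nesting depth
        self.root_depth = None  # depth at which the open vcard root started
        self.field = None       # (term, depth) whose text is being captured
        self.cards = []         # one dict per vcard root found

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.root_depth is None:
            # Outside any root: only the root class name is of interest.
            if "vcard" in classes:
                self.root_depth = self.depth
                self.cards.append({})
            return
        # Inside a root: generic terms now carry hCard meaning.
        for term in GENERIC_TERMS:
            if term in classes and self.field is None:
                self.field = (term, self.depth)
                self.cards[-1].setdefault(term, "")
                break

    def handle_data(self, data):
        if self.field is not None:
            self.cards[-1][self.field[0]] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.field is not None and self.field[1] == self.depth:
            self.field = None       # the captured element has closed
        if self.root_depth == self.depth:
            self.root_depth = None  # the vcard root has closed
        self.depth -= 1


def extract_hcards(html):
    parser = ScopedHCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


PAGE = """
<div class="sidebar"><span class="title">Latest posts</span></div>
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="title">Editor</span>
</div>
"""

print(extract_hcards(PAGE))  # [{'fn': 'Jane Doe', 'title': 'Editor'}]
```

The class="title" in the sidebar is ignored because it does not sit inside a vcard root, while the same class name inside the root is read as an hCard property.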

microformats use non URI based extensibility

misconception: Microformats use non-URI based extensibility.

Microformats make use of the profile attribute of the <head> element to reference one or more profiles (this is all per the HTML4 spec). Each profile URL points to an XMDP profile document (XMDP is derived from the "hints" in HTML4 as to what a profile document "could" be) that defines specific rel values (e.g. the XFN 1.1 profile) and class values (e.g. the hCard profile).

Thus microformats are built upon a form of URI based extensibility. Tantek did this by design for XFN, his first experiment into formally extending HTML, and before he even coined the word "microformat" (XMDP & XFN were developed in 2003, "microformats" were first proposed in 2004).
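As a rough illustration of how a tool could read those profile references, here is a minimal Python sketch using the standard library's html.parser. HTML4 defines the profile attribute as a whitespace-separated list of URIs; the names ProfileParser and head_profiles are hypothetical, and the example.org URL is a placeholder rather than a real profile location.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Hypothetical sketch: collect the profile URIs declared on <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            # The attribute value is a whitespace-separated list of URIs.
            value = dict(attrs).get("profile") or ""
            self.profiles.extend(value.split())


def head_profiles(html):
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.profiles


# The second URI is a placeholder used only for this example.
DOC = (
    '<html><head profile="http://gmpg.org/xfn/11 '
    'http://example.org/profiles/hcard">'
    "<title>Example</title></head><body></body></html>"
)

print(head_profiles(DOC))
# ['http://gmpg.org/xfn/11', 'http://example.org/profiles/hcard']
```

A client that wants URI-grounded terms could compare this list against the profiles it understands before interpreting any rel or class values on the page.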

What we have found is that, just like HTML was often used in the wild without explicit DOCTYPE URIs (and tools e.g. browsers supported it), microformats are often used in the wild without explicit profile URLs (and tools e.g. browser plugins support it).

Missing a DOCTYPE does little or no damage today, even though (modulo tag soup issues) the DOCTYPE is a link in the chain of reasoning about what the document means. It has been asserted that the HTML profile is, however, a crucial link for microformats. This is perhaps similar to the assertion, made by the SGML community back when HTML was introduced, that the DOCTYPE is a crucial link for HTML. The two situations are closely parallel.

However, although browsers make good sense of HTML sans DOCTYPEs today, almost no general user-centric user agents have been built to make sense of the babel of XML sans DOCTYPEs that is being published. XML as used in practice has failed to make use of URI based extensibility, and no widespread user-centric user agents (e.g. browsers) have subsequently emerged that make use of that content. The lesson to learn here is that it is important to use the profile attribute for microformats, and to encourage its use.

The XMDP spec and the GRDDL spec show how to make a profile, and how generic data clients can follow it either to ground the data into RDF, or to use the data directly as microformats with terms defined by their XMDP+ID URIs. This will maximize re-use of the data in combination with other data. There is a growing class of GRDDL-aware systems which will use GRDDL-enabled microformat data without any alteration.

microformats tools will erroneously pick up data

misconception: One danger of omitting profiles is that, because tools such as browser plugins support microformats without checking for a profile, those tools will erroneously pick up data from pages which use classes for a completely unrelated purpose. This attributes to the author information which they never meant to give.

This scenario is highly unlikely, and has yet to occur on the real web, because tools look for the fairly uniquely chosen root class names of microformats before they look for the more generically named classes inside them.

no generic data gathering device can be built

misconception: The other danger of omitting profiles is that no generic data-gathering device can be built. The web ceases to be self-describing, in that there then would be no one common algorithm for deriving the data from a given page.

It is difficult to disprove a negative statement such as the first sentence, and this is strictly a theoretical problem, as no generic data-gathering device has been built.

Similarly, the web is not currently self-describing, given the widespread use of HTML, tag soup, invalid XML, etc. Saying that it "ceases" to be self-describing is therefore misleading, because there is no "ceasing". If anything, by increasing the semantics expressed on the existing web, the use of microformats increases how much each page self-describes its own semantics.

do not scale when domain specific microformats are added

misconception: Microformats do not scale when domain-specific, or culture-specific, or company-specific microformats are added.

Microformats are not trying to solve all problems; in fact, that is a specific non-goal. See the microformats principles. In practice, one does not have to solve all problems, nor even make it possible to solve all problems.

Microformats are trying to represent the 80/20 of semantics on the public web, and thus solve most problems that will actually help most people on the public web.

For domain-specific, or culture-specific, or company-specific semantics, authors should simply make use of the best POSH that they can, with their own profiles and profile URLs. If those semantics become widely adopted on the web, then that may provide a good case for taking their POSH through the microformats process to develop a new microformat.


current behaviors and usage patterns in general

More than once, folks have made the overgeneralization that microformats are adapted to "current behaviors and usage patterns" in general, and then used that overgeneralization to justify:

  • current behaviors of user agents
  • current browser usage patterns
  • other generalizations

as design centers for microformats. Unfortunately this is incorrect and dilutes the focus of microformats.

A quote taken out of context, such as the following from the microformats about page, has been used to justify the overgeneralization:

microformats are: [...] adapted to current behaviors and usage patterns

This precisely demonstrates the "taking out of context" logical flaw. If you read the entire quote:

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging).

You can see that the "current behaviors and usage patterns" specifically applies to *markup*, *content*, and *publishing* "(e.g. XHTML, blogging)".

Microformats adapt to current content publishing behaviors and markup usage patterns. Not current behaviors and usage patterns in general. More on this is clear from the principles and the process.

This is actually a very important distinction, as focusing on the content publishing side of things is one of the ways microformats greatly succeed. By focusing on making things easier for publishers (rather than developers, parsers, browser vendors, etc.), microformats lower barriers for the most people. Specifically, lowering barriers for publishers to publish semantic content (from POSH on up) helps solve the chicken-egg problem that such content often suffers from, as publishers will often do something that has at least some benefit if it is very easy to do.

schema would improve interoperability of microformats

Once in a while someone will propose a more formal schema to define a microformat (often a typed schema), with the assertion that a schema would improve interoperability of microformats.

There are several problems with this assertion that are best described with questions:

What is the real world interoperability problem that you are trying to solve?

Do you have test cases that have been demonstrated to fail in specific implementations?

Do you have analysis that demonstrates that such problems stem from a lack of an explicit typed schema?

Lacking that, it is not logical to conclude that a schema would help improve interoperability.

schema incompleteness problem

Somewhat related to the schema interoperability misconception noted above, advocates of schemas in general often make the implicit assumption that if they have a schema for something, it somehow means they've fully expressed that something, potentially in a way that can be "automatically" parsed, etc.

In practice, explicit schemas do not represent all (often not even most) of the semantics of a specific format. For example, the HTML4 DTDs contain a mere fraction of the constraints and semantics expressed by the HTML4 specification. A validator that only checks the rules expressed in the HTML DTD will fail to check numerous assertions and requirements made in the specification itself. This is the schema incompleteness problem. In short, having a set of rules from a framework (such as those expressed by a schema like a DTD) is not only in practice insufficient, but serves to give a false sense of completeness of description.
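A toy Python sketch can make the gap concrete. It assumes, for illustration only, a schema that lists which attributes an element may carry and treats every value as opaque character data, alongside a prose rule that a datetime value must have a particular shape. TOY_SCHEMA, DATETIME_RE and both function names are hypothetical, and the date pattern is a simplification rather than the specification's full rule.

```python
import re

# Toy stand-in for a DTD-style schema: which attributes an element may carry.
# Attribute values are treated as opaque character data.
TOY_SCHEMA = {
    "ins": {"cite", "datetime"},
    "del": {"cite", "datetime"},
}

# Simplified stand-in for a prose rule: YYYY-MM-DDThh:mm:ss plus a time zone.
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})")


def schema_valid(tag, attrs):
    """Check only what the toy schema can express: allowed attribute names."""
    allowed = TOY_SCHEMA.get(tag)
    return allowed is not None and set(attrs) <= allowed


def prose_rule_valid(tag, attrs):
    """Check the extra constraint that lives only in the prose."""
    value = attrs.get("datetime")
    return value is None or DATETIME_RE.fullmatch(value) is not None


element = ("ins", {"datetime": "last Tuesday"})

print(schema_valid(*element))      # True: the schema is satisfied
print(prose_rule_valid(*element))  # False: the prose rule is violated
```

A validator built only from the toy schema would report this element as valid, which is the false sense of completeness described above.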

Thus with microformats we eschew trying to solve the general schema problem in favor of simple dictionaries. Others have been trying much harder, and for much longer, on that problem (e.g. XML Schema) and are failing in practice, i.e. in usage on the Web.

There has been some value demonstrated in some scenarios (e.g. reading microformats into an RDF store, either directly or through a GRDDL transform) in at least disambiguating the use of vocabulary and backing the terms used with URLs. Thus we have XMDP (XHTML Meta Data Profiles), which is sufficient to define terms and provide a URL for each.

see also