naming-principles

From Microformats Wiki
Revision as of 22:56, 13 October 2007 by Tantek (update principle links)

Naming Principles

Introduction

One of the key microformats principles is re-use, and in particular, re-use of names of objects, properties, and values from existing formats and standards when possible. -Tantek

I explicitly created this principle in response to the anti-patterns that I saw in many (most?) existing standards efforts such as:

  • Making up names from thin air
  • Ignoring all earlier work
  • Actual hostility towards using names/terms from other standards
  • Using others' names to mean different things
  • Using new names to mean the same thing
  • Endlessly debating and "name-smithing" in order to come up with a slightly more perfect name

Perhaps it is human nature to want to create new names, or name new things. Certainly there is some amount of ego involved in the creation of a new thing which you can then claim to have invented or named. Some of these tendencies are also a form of "Not Invented Here" (NIH) syndrome which unfortunately is quite common among software engineers.

Unfortunately, such desire for novelty is bad for standards, and certainly bad for interoperability, which relies on the same name meaning the same thing. It's also bad for language and communication among humans. Even though humans can deal with some ambiguity and overloading of terms (using context to disambiguate), communication is easier for humans too when there is less ambiguity and less overloading.

We're not going to be able to fully eliminate such "Tower of Babel" tendencies, but at least we can minimize them, especially when they are bad for standards and interoperability.

With the experience of developing new microformats such as xFolk, hReview, and hAtom, it has become quite clear that we need to explicitly document the specific design principles that went into naming the objects and properties of early established microformats like hCard and hCalendar. That is the purpose of this document.


Author

Naming Principles

Semantic XHTML Design Principles

First, it is important to note the naming principles which have been defined and explicitly referenced in (most of) the above-mentioned microformats.

Note: the Semantic XHTML Design Principles were written primarily within the context of developing hCard and hCalendar, thus it may be easier to understand these principles in the context of the hCard design methodology (i.e. read that first). Tantek

XHTML is built on XML, and thus XHTML-based formats can be used not only for convenient display presentation, but also for general-purpose data exchange. In many ways, XHTML-based formats exemplify the best of both HTML and XML worlds. However, when building XHTML-based formats, it helps to have a guiding set of principles.

  1. Reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference. Avoid restating constraints expressed in the source standard. Informative mentions are ok.
    1. For types with multiple components, use nested elements with class names equivalent to the names of the components.
    2. Plural components are made singular, and thus multiple nested elements are used to represent multiple text values that are comma-delimited.
  2. Use the most accurately precise semantic XHTML building block for each object etc.
  3. Otherwise use a generic structural element (e.g. <span> or <div>), or the appropriate contextual element (e.g. an <li> inside a <ul> or <ol>).
  4. Use class names based on names from the original schema, unless the semantic XHTML building block precisely represents that part of the original schema. If names in the source schema are case-insensitive, then use an all lowercase equivalent. Component names implicit in prose (rather than explicit in the defined schema) should also use lowercase equivalents for ease of use. Spaces in component names become dash '-' characters.
  5. Finally, if the format of the data according to the original schema is too long and/or not human-friendly, use <abbr> instead of a generic structural element, and place the literal data into the 'title' attribute (where abbr expansions go), and the more brief and human readable equivalent into the element itself. Further informative explanation of this use of <abbr>: Human vs. ISO8601 dates problem solved
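To make principles 1.1, 1.2, 3, 4 and 5 concrete, here is a minimal Python sketch that renders a vCard N (structured name) value as nested hCard-style markup, plus an <abbr> helper for principle 5. It assumes a plain "Family;Given;Additional;Prefixes;Suffixes" value with no backslash escaping. The helper names (class_name, render_n, render_abbr) and the singularization table are hypothetical illustrations, not part of any published microformats parser.

```python
import html

# Principle 4: lowercase the schema name and turn spaces into dashes.
def class_name(component: str) -> str:
    return "-".join(component.lower().split())

# Principle 1.2: plural components become singular. Singularizing is an
# editorial decision, so this hypothetical table spells it out explicitly.
SINGULAR = {
    "Additional Names": "Additional Name",
    "Honorific Prefixes": "Honorific Prefix",
    "Honorific Suffixes": "Honorific Suffix",
}

# Component order of the vCard N type (RFC 2426).
N_COMPONENTS = ["Family Name", "Given Name", "Additional Names",
                "Honorific Prefixes", "Honorific Suffixes"]


def render_n(value: str) -> str:
    """Render a vCard N value such as 'Public;John;Quinlan;Mr.;Esq.'.

    Principle 1.1: each component becomes a nested element.
    Principle 3: a generic <span> is used, since no XHTML element
    means "family name". Backslash escapes in the value are ignored here.
    """
    parts = []
    for component, raw in zip(N_COMPONENTS, value.split(";")):
        name = class_name(SINGULAR.get(component, component))
        # Principle 1.2: comma-delimited values become repeated elements.
        for item in raw.split(","):
            if item:
                parts.append(f'<span class="{name}">{html.escape(item)}</span>')
    return '<span class="n">' + " ".join(parts) + "</span>"


def render_abbr(cls: str, literal: str, readable: str) -> str:
    """Principle 5: the literal value goes in title, readable text inside."""
    return (f'<abbr class="{cls}" title="{html.escape(literal)}">'
            f"{html.escape(readable)}</abbr>")
```

For example, render_abbr("dtstart", "2007-10-13", "October 13") returns <abbr class="dtstart" title="2007-10-13">October 13</abbr>, keeping the ISO8601 literal machine-readable while the page shows a human-friendly date.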

Some Details

  • dash-separated-lowercase-words. W3C CSS (Cascading Style Sheets) introduced the convention of lowercasing all property/value names (identifiers) and separating words with dash "-" characters, for better human readability compared to other approaches like CamelCase (or even camelCase). Microformats property names strictly adopt this approach as well.
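When a source schema uses CamelCase or camelCase names, they have to be mapped to this convention. A minimal sketch of such a conversion follows. to_dash_case is a hypothetical helper, and its regular-expression rules (split at lower-to-upper transitions, and before the last capital of an acronym run) are one reasonable choice, not a rule defined by microformats.

```python
import re


def to_dash_case(identifier: str) -> str:
    """Convert CamelCase/camelCase to dash-separated-lowercase-words (hypothetical helper)."""
    # "givenName" -> "given-Name": dash between a lowercase letter or digit and a capital.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", identifier)
    # "XMLHttp" -> "XML-Http": dash before the last capital of an acronym run.
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1-\2", s)
    # Treat underscores as word separators too, then lowercase everything.
    return s.replace("_", "-").lower()
```

For instance, to_dash_case("givenName") gives "given-name". The function only handles case and word separation; the words themselves should still be reused from the source schema.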

Unique Root Class Names

I've also written a bit about the design principles behind the *root* class names in the microformats (which require somewhat different treatment than property class names), but this is currently described on the hcard-parsing page:

http://microformats.org/wiki/hcard-parsing#root_class_name

Need to copy some of that text here and make it not hCard-specific.

Minimal Vocabulary

This is one of two additional key principles that I think I need to outline in more detail. The principle of "minimal vocabulary" is directly derived from the principle "start as simple as possible".

  • minimal vocabulary. We try to introduce as few new microformat terms as possible.

Reuse Microformats First, Other Standards Second

This is actually outlined quite clearly in the microformats principles, but it deserves to be repeated here explicitly and with strong emphasis:

The key here is that this principle is not only about reusing whole microformats (e.g. don't invent a new person property for your microformat, just reuse hCard), but also about where to get names for properties.

In particular, if you find that your new microformat has a property which means the same thing as a property in an existing microformat, you SHOULD (maybe I should make this a MUST) reuse the class name from that existing microformat. This practice also follows the principle of minimal vocabulary, and of re-using the same name to mean the same thing (instead of using two names to mean the same thing).

For Other Standards, Prefer Older to Newer

If there is no microformat name for a property, and we are reusing names based upon research of existing formats, then often there is more than one format with more than one name for the particular concept.

New standards are often developed which, usually needlessly, rename terms from older standards. Thus, to repair such naming drift, all other things being equal (e.g. both standards have been widely and interoperably implemented), we prefer the older name over the newer name.
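Together with "Reuse Microformats First, Other Standards Second", this preference can be read as a simple decision procedure: reuse an existing microformat term if there is one; otherwise, among candidate names from widely and interoperably implemented standards, take the oldest. The sketch below encodes that ordering under stated assumptions: the Candidate record, its fields, and choose_name are hypothetical, and real naming decisions still involve research and judgement that a function cannot capture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One candidate name for a concept (hypothetical record)."""
    name: str                 # the term, e.g. "description"
    source: str               # where the term comes from
    is_microformat: bool      # True if the source is an existing microformat
    year: int                 # publication year of the source
    widely_implemented: bool  # widely and interoperably implemented?


def choose_name(candidates: List[Candidate]) -> Optional[str]:
    # Reuse microformats first: the first existing microformat term wins.
    for c in candidates:
        if c.is_microformat:
            return c.name
    # Other standards second: only compare standards that are otherwise
    # "equal", modelled here as being widely implemented.
    implemented = [c for c in candidates if c.widely_implemented]
    if not implemented:
        return None  # no established precedent; needs discussion
    # Prefer older to newer.
    return min(implemented, key=lambda c: c.year).name
```

Returning None when nothing qualifies is deliberate: it signals that the choice needs discussion rather than a mechanical rule. In the xFolk change described in the next section, an existing microformat term ("description") wins over a newly introduced one ("extended") regardless of dates.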

Examples of Following the Naming Principles

We've followed these naming principles from the start, and have changed microformats in development as a result. For example, in the revision from xFolk v0.4 to v1RC, xFolk dropped the new class name "extended" in favor of re-using the existing "description" class name. See Changes since xFolk 0.4 for details.

Naming Patterns Under Consideration as Principles

A few patterns have arisen in microformat class names, and while these patterns are not conventions (yet), it may be worth considering them as principles.

dt properties

So far, all datetime class names start with "dt", and all class names that start with "dt" are ISO8601 datetime properties. E.g.

Note that "dt" is also under consideration for type XOXO.

Undefined: dtstamp - hCalendar
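If the "dt" pattern were adopted as a convention, it could be checked mechanically in both directions. The following is a minimal sketch, assuming parsed properties are available as a name-to-value dictionary and that the set of datetime properties is known; both inputs and the function names are hypothetical. It uses Python's datetime.fromisoformat as an approximate ISO8601 test; before Python 3.11 that parser accepts only a subset of ISO8601 (for example, no trailing "Z"), so a production check would need a fuller parser.

```python
from datetime import datetime
from typing import Dict, List, Set


def looks_like_iso8601(value: str) -> bool:
    """Approximate ISO8601 test using the standard library parser."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def check_dt_convention(props: Dict[str, str],
                        datetime_props: Set[str]) -> List[str]:
    """Report violations of the proposed 'dt' naming pattern.

    props: parsed class names mapped to their values (hypothetical input).
    datetime_props: class names known to hold datetimes (hypothetical input).
    """
    problems = []
    for name, value in props.items():
        has_dt = name.startswith("dt")
        # Direction 1: every "dt..." class name should hold an ISO8601 value.
        if has_dt and not looks_like_iso8601(value):
            problems.append(f"{name}: 'dt' prefix but {value!r} is not ISO8601")
        # Direction 2: every datetime property should start with "dt".
        if name in datetime_props and not has_dt:
            problems.append(f"{name}: datetime property without 'dt' prefix")
    return problems
```

The exceptions listed below would show up as direction-2 reports, which is exactly the trade-off under discussion.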

exceptions to dt prefix

However, some proposed or in-development microformats currently have class names for datetime properties without the "dt" prefix:

Draft:

Proposed:

h word

So far, all uses of a single "h" prefix in a property name apply to (potential) root elements. But not all (potential) root elements start with "h" (which is ok).

E.g.:

Should we enforce the rule that only (potential) root elements may begin with an "h" prefix?

Non-h-prefixed root elements:

Anti-Patterns

Here are things not to do when creating names.

Namespaces

Avoid namespaces (i.e. class names of the form microformat-key); read namespaces-considered-harmful. hAtom uses a limited amount of namespacing to exactly reuse a particular semantic from the Atom spec.

Issues

  • Shouldn't brevity be a consideration? Not to the point of losing meaning (class="b"), but to prevent needless verbosity (class="thing-that-we-have-no-short-name-for"). More pragmatically, abbreviations should be meaningful and appropriate (e.g. var for variety, as used in botanical naming). - Andy Mabbett
