- Tantek Çelik
- 1 Introduction
- 2 Naming Principles
- 3 Examples of Following the Naming Principles
- 4 Naming Patterns Under Consideration as Principles
- 5 Anti-Patterns
- 6 Referrers
- 7 See Also
One of the key microformats principles is re-use, and in particular, re-use of names of objects, properties, and values from existing formats and standards when possible. -Tantek
I explicitly created this principle in response to the anti-patterns that I saw in many (most?) existing standards efforts such as:
- Making up names from thin air
- Ignoring all earlier work
- Actual hostility towards using names/terms from other standards
- Using others' names to mean different things
- Using new names to mean the same thing (often in a mistaken effort to re-use semantics but rename vocabulary to something "more understandable".)
- Endlessly debating and "name-smithing" in order to come up with a slightly more perfect name
Perhaps it is human nature to want to create new names, or name new things. Certainly there is some amount of ego involved in the creation of a new thing which you can then claim to have invented or named. Some of these tendencies are also a form of "Not Invented Here" (NIH) syndrome which unfortunately is quite common among software engineers.
novelty hurts interoperability
Unfortunately, such desire for novelty is bad for standards, and certainly bad for interoperability, which depends on the same name reliably meaning the same thing.
novelty hurts communication
It's also bad for language and communication among humans (e.g. see the Anglo-centric renaming anti-pattern). Even though humans can deal with some ambiguity and overloading of terms (using context to disambiguate), it's easier for humans as well when there is less ambiguity and less overloading.
documenting principles helps
We're not going to be able to fully eliminate such "Tower of Babel" tendencies, but at least we can minimize them, especially when they are bad for standards and interoperability.
With the experience of developing new microformats such as xFolk, hReview, and hAtom, it has become quite clear that we need to explicitly document some of the specific design principles that went into naming the objects and properties of some of the early established microformats like hCard, hCalendar, and hReview 0.4 (in progress). That is the purpose of this document.
Semantic XHTML Design Principles
First, it is important to note the naming principles which have been defined and explicitly referenced in (most of) the above-mentioned microformats.
Note: the Semantic XHTML Design Principles were written primarily within the context of developing hCard and hCalendar, thus it may be easier to understand these principles in the context of the hCard design methodology (i.e. read that first). -Tantek
XHTML is built on XML, and thus XHTML based formats can be used not only for convenient display presentation, but also for general purpose data exchange. In many ways, XHTML based formats exemplify the best of both HTML and XML worlds. However, when building XHTML based formats, it helps to have a guiding set of principles.
- Reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference. Avoid restating constraints expressed in the source standard. Informative mentions are ok.
- For types with multiple components, use nested elements with class names equivalent to the names of the components.
- Plural components are made singular, and thus multiple nested elements are used to represent multiple text values that are comma-delimited.
- Use the most accurately precise semantic XHTML building block for each object etc.
- Otherwise use a generic structural element (e.g. <div>), or the appropriate contextual element (e.g. an ...).
- Use class names based on names from the original schema, unless the semantic XHTML building block precisely represents that part of the original schema. If names in the source schema are case-insensitive, then use an all lowercase equivalent. Component names implicit in prose (rather than explicit in the defined schema) should also use lowercase equivalents for ease of use. Spaces in component names become dash '-' characters.
- Finally, if the format of the data according to the original schema is too long and/or not human-friendly, use <abbr> instead of a generic structural element, and place the literal data into the 'title' attribute (where abbr expansions go), and the more brief and human readable equivalent into the element itself. Further informative explanation of this use of <abbr>: Human vs. ISO8601 dates problem solved
- hyphen-separated-lowercase-words. W3C CSS (Cascading Style Sheets) introduced the convention of lowercasing all property/value names (identifiers) and separating words with hyphen "-" characters for reasons of better human readability as compared to other approaches like CamelCase (or even camelCase). Microformats property names strictly adopt this approach as well.
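As a rough illustration of the class-name rules above (all-lowercase equivalents, spaces become dashes), here is a minimal Python sketch. The helper name `to_class_name` is hypothetical and not part of any microformats tool, and the sample inputs are only illustrative. It deliberately does not singularize plurals, since that is an editorial decision rather than a mechanical one.

```python
def to_class_name(schema_name: str) -> str:
    """Turn a source-schema component name into a class name.

    Hypothetical helper: lowercases the name and replaces each run of
    whitespace with a single hyphen. It does not singularize plurals.
    """
    return "-".join(schema_name.lower().split())


# A case-insensitive schema name becomes its all-lowercase equivalent.
print(to_class_name("FN"))            # fn
# A component name taken from prose: spaces become dashes.
print(to_class_name("country name"))  # country-name
```

A name that is already lowercase and hyphenated passes through unchanged, so the helper can be applied repeatedly without side effects.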
When reusing names from another vocabulary/schema/RFC:
drop redundant or no-value suffixes. Similar to the "Plural components are made singular" note above, the point of this principle is to drop suffixes that don't add anything to the term, and thus use a simpler term when possible (in the spirit of solving simpler problems first).
The specific example in mind is 'country-name' which comes from prose description in vCard. The suffix '-name' seems redundant or rather is an implied default meaning of the term 'country' (as opposed to say, 'country-code', e.g. Olympics). Just as we changed vCard 'categories' to 'category', it makes sense to change "country name" to just 'country'.
- other specs that use just 'country':
- PoCo uses just 'country'
- W3C WD-contacts-api uses 'country'
- see Current Contact Formats for related research.
- the '-name' suffix does provide semantic clarity that the country name as it would be displayed is provided rather than the country code, or shape/polygon or some other aspect of the country. (though perhaps in practice this is not a problem, as 'org' has been sufficient and nearly no one uses 'organization-name' vs. 'organization-unit', and we can always add suffixes later for more details if necessary).
- 'country' already implies ISO country code in some contexts. In particular, in forms (URL to example(s)?) in which the user had to select "Country" and it was just the ISO Country Code that was stored and displayed to the user. (mkowens on IRC)
- I believe this is sufficient to keep 'country-name' by demonstrating that the '-name' suffix is not redundant and does add semantic value. Thus we would need another real world property name example to test this potential principle before adopting it. - Tantek 18:44, 24 July 2012 (UTC)
- 'country' may be ambiguous (name? code?) and thus it is better to keep a specific term 'country-name' to avoid that confusion.
- other specs that re-used 'country-name':
- vCard in RDF re-used 'country-name' from hCard
- OGP re-used 'country-name' from hCard
Unique Root Class Names
I've also written a bit about the design principles that went into the *root* class names (which require slightly different treatment than property class names) in the microformats, but this is currently described in the hcard-parsing page:
Need to copy some of that text here and make it not-hCard specific.
Use as few terms as possible, and in particular use as few new terms as possible. The principle of "minimal vocabulary" is actually directly derived from the principle of start as simple as possible.
- minimal vocabulary. We try to introduce as few new microformat terms as possible. See minimal vocabulary for more detail and reasons.
Reuse microformats first, other standards second.
This is actually outlined quite clearly in the microformats principles, but deserves explicit repetition here, with strong emphasis:
- reuse building blocks from widely adopted standards
- semantic (http://tantek.com/presentations/20040928sdforumws/semantic-xhtml.html), meaningful (X)HTML (http://tantek.com/presentations/2005/03/elementsofxhtml). See semantic XHTML design principles above for more details.
- existing microformats
- well established schemas from interoperable RFCs
The key here is that this principle is not only about reusing whole microformats (e.g. don't invent a new person property for your microformat, just reuse hCard), but also about where to get names for properties.
In particular, if you find that your new microformat has a property which means the same thing as a property in an existing microformat, you SHOULD (maybe I should make this a MUST) reuse the class name from that existing microformat. This practice also follows the principle of minimal vocabulary, and of re-using the same name to mean the same thing (instead of using two names to mean the same thing).
For Other Standards, Prefer Older to Newer
If there is no microformat name for a property, and we are reusing names based upon research of existing formats, then often there is more than one format with more than one name for the particular concept.
Often, new standards are developed which (most often needlessly) rename terms from older standards. Thus to repair such naming drift, all other things being equal (e.g. both standards have been widely interoperably implemented), we prefer the older name over the newer name.
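The order of preference described above (existing microformats first, then other standards, older before newer) can be sketched as a simple sort key. This is only an illustration: the `Candidate` type, the `choose_name` helper, and all sample data are hypothetical, and the sketch ignores the "all other things being equal" condition about how widely each standard has been implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str             # candidate property name
    source: str           # where the name comes from (illustrative label)
    is_microformat: bool  # True if an existing microformat already uses it
    year: int             # publication year of the source (made up here)


def choose_name(candidates):
    """Pick a name: existing microformats first, then the oldest other standard."""
    # False sorts before True, so microformat names win; among the rest,
    # the earliest publication year wins.
    best = min(candidates, key=lambda c: (not c.is_microformat, c.year))
    return best.name


others = [
    Candidate("newer-term", "hypothetical standard B", False, 2003),
    Candidate("older-term", "hypothetical standard A", False, 1998),
]
print(choose_name(others))  # older-term

with_microformat = others + [
    Candidate("existing-term", "hypothetical microformat", True, 2005),
]
print(choose_name(with_microformat))  # existing-term
```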
Examples of Following the Naming Principles
We've followed these naming principles from the start, and made changes to microformats in development as a result. For example, xFolk was changed from v0.4 to v1RC. xFolk dropped the new class name "extended" in preference for re-using the existing "description" class name. See Changes since xFolk 0.4 for details.
Naming Patterns Under Consideration as Principles
A few patterns have arisen in the naming of class names for microformats, and while these patterns are not conventions (yet), it may be worth considering them.
So far, all datetime class names start with "dt", and all class names that start with "dt" are ISO8601 datetime properties. E.g.
Note that "dt" is also under consideration for type XOXO.
Undefined: dtstamp - hCalendar
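To show how the "dt" pattern combines with the <abbr> design principle, here is a small Python sketch that collects "dt"-prefixed properties from markup and reads the machine-readable value from the 'title' attribute. The sample event markup and the `DtPropertyParser` class are hypothetical and written only for this illustration; this is not how any particular microformats parser is implemented.

```python
from html.parser import HTMLParser


class DtPropertyParser(HTMLParser):
    """Collect values of properties whose class name starts with "dt"."""

    def __init__(self):
        super().__init__()
        self.datetimes = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        for name in (attr.get("class") or "").split():
            # For <abbr>, the ISO8601 value lives in the 'title' attribute,
            # while the element content stays human-friendly.
            if name.startswith("dt") and tag == "abbr" and attr.get("title"):
                self.datetimes[name] = attr["title"]


# Hypothetical sample markup for an event.
markup = """
<div class="vevent">
  <span class="summary">Example Conference</span>:
  <abbr class="dtstart" title="2005-10-05">October 5</abbr> to
  <abbr class="dtend" title="2005-10-08">7</abbr>
</div>
"""

parser = DtPropertyParser()
parser.feed(markup)
print(parser.datetimes)
```

Because the prefix is consistent, a consumer can find every datetime property with one `startswith("dt")` check instead of a hard-coded list of names.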
exceptions to dt prefix
However, some proposed/under-development microformats currently have class names for datetime properties without the "dt" prefix:
So far, all uses of a single "h" prefix in a property name apply to (potential) root elements. But not all (potential) root elements start with "h" (which is ok).
Should we enforce the rule that only (potential) root elements may begin with an "h" prefix?
Non-h-prefixed root elements:
Here are things not to do when creating names:
Avoid namespaces or anything resembling namespaces, such as prefixes (i.e. class names of the form microformat-key); read namespaces considered harmful. The problem, briefly stated, is that namespacing or prefixing encourages silo formats (instead of modular formats, one of the principles) that neither reuse nor are themselves reusable, certainly not in any easy/elegant way. hAtom uses a limited amount of prefixing to exactly reuse a particular semantic from the Atom spec, but even there it uses a generic prefix "entry-" for terms that could then be reused, rather than a specific prefix like "hatom-", which would look awkward in any instance of reuse outside of hAtom. Note: even this limited use of prefixing with "entry-" has been dropped in microformats2 h-entry, for greater re-use of existing more generic terms.
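One way to spot this anti-pattern while drafting a vocabulary is a simple prefix check, sketched below in Python. The helper `looks_namespaced` and the list of format names are assumptions made for this example; a real review would still need human judgment.

```python
def looks_namespaced(class_name, format_names):
    """Return True if class_name starts with a specific format's name plus '-'.

    Hypothetical helper: flags names like "hatom-title" that only make
    sense inside one format and look awkward when reused elsewhere.
    """
    return any(class_name.startswith(fmt + "-") for fmt in format_names)


formats = ["hatom", "hcard"]  # illustrative list of format names

print(looks_namespaced("hatom-title", formats))  # True
print(looks_namespaced("entry-title", formats))  # False
print(looks_namespaced("summary", formats))      # False
```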
Anglo-centric renaming when reusing
Avoid renaming vocabulary when reusing from other specifications. Even if you think you are picking a more understandable English term, you are actually making it more confusing to non-native-English developers, and you are going to waste even diligent native-English developers' time wondering if the two terms (your new "better" term, and the original term) mean exactly the same thing or not. Why even allow for the possibility of confusion? Avoid renaming when reusing.