microformats 2.0

(Difference between revisions)

Jump to: navigation, search
(Summary: all properties are both optional and plural)
(naming conventions for generic parsing: e-* properties are serialized using the HTML spec: Serializing HTML Fragments algorithm)
Line 542: Line 542:
** The 'dt-' prefix is based on "date time" having the initials "dt" and the preponderance of existing date time properties starting with "dt", e.g. dtstart, dtend, dtstamp, dtreviewed.
** The 'dt-' prefix is based on "date time" having the initials "dt" and the preponderance of existing date time properties starting with "dt", e.g. dtstart, dtend, dtstamp, dtreviewed.
* '''"e-*" for properties''' where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for [[hAtom]]. The 'e-' prefix can also be mnemonically remembered as "element tree", "embedded markup", or "encapsulated markup".
* '''"e-*" for properties''' where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for [[hAtom]]. The 'e-' prefix can also be mnemonically remembered as "element tree", "embedded markup", or "encapsulated markup".
 +
** special parsing required: follow the [http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#serializing-html-fragments HTML spec: Serializing HTML Fragments algorithm] to create a serialization.
This provides a simpler transition/education story for existing microformats authors/publishers:  
This provides a simpler transition/education story for existing microformats authors/publishers:  

Revision as of 03:22, 18 December 2011


Welcome to the in-progress development of microformats 2.0.

Contents

Summary

Microformats 2.0 improves ease of use and implementation for both authors (publishers) and developers (parser implementers), with the following simplifications:

  1. which class names are used for microformats (those that start with 'h-' 'p-' 'u-' 'dt-', 'e-')
  2. flat sets of properties for all microformats (hierarchical data uses nested microformats) which are all optional and plural (applications needing a singular semantic may use first instance)
  3. syntax independent vocabularies - the previous two simplifications enable generic parsing (thus automatic JSON/XML/RDF output) and syntax independent development of vocabularies
  4. simpler markup for common use-cases - common microformats use-cases have been simplified even further with generic parsing rules and a few implied generic properties: name, url, photo.

simple microformats 2 examples

Here are a few simple microformats 2.0 examples the demonstrate a most of the changes (Parsed JSON examples are a draft experiment - feedback invited!)

Simple person reference:

<span class="h-card">Frances Berriman</span>

Parsed JSON:

[{ 
  "type": ["h-card"],
  "properties": {
    "name": ["Frances Berriman"] 
  }
}]

Simple hyperlinked person reference

<a class="h-card" href="http://benward.me">Ben Ward</a>

Parsed JSON:

[{ 
  "type": ["h-card"],
  "properties": {
    "name": ["Ben Ward"],
    "url": ["http://benward.me"]
  }
}]

Simple hyperlinked person image

<a class="h-card" href="http://rohit.khare.org/">
 <img alt="Rohit Khare"
      src="https://s3.amazonaws.com/twitter_production/profile_images/53307499/180px-Rohit-sq_bigger.jpg" />
</a>

Parsed JSON:

[{ 
  "type": ["h-card"],
  "properties": {
    "name": ["Rohit Khare"],
    "url": ["http://rohit.khare.org"],
    "photo": ["https://s3.amazonaws.com/twitter_production/profile_images/53307499/180px-Rohit-sq_bigger.jpg"]
  }
}]

More details of how simple cases work in microformats-2-implied-properties.

Notes:

  1. "type" uses the full microformat root class name for consistent identification.
  2. all properties are optional and syntactically plural with parsed values provided in document order; particular microformats (and applications there-of) may apply specific/singular semantics to first value of a property.
  3. properties could be explicitly bound as URIs using the html5-profile attribute proposal, or perhaps a 'vocab' attribute instead (more intuitive than the overloaded 'profile').

v2 vocabularies

Status: draft. Please review and provide feedback in IRC.

h-adr

profile/itemtype: http://microformats.org/profile/h-adr

properties:

For backward compatibility, microformats 2 parsers SHOULD detect the following root class name and property names. A microformats 2 parser may use existing microformats parsers to extract these properties. If an "h-adr" is found, don't look for an "adr" on the same element. compat root class name:

properties: (parsed as p- plain text unless otherwise specified)


h-card

profile/itemtype: http://microformats.org/profile/h-card

properties:

For backward compatibility, microformats 2 parsers SHOULD detect the following root class name and property names. A microformats 2 parser may use existing microformats parsers to extract these properties. If an "h-card" is found, don't look for a "vcard" on the same element. compat root class name:

properties: (parsed as p- plain text unless otherwise specified)

Note: use of 'value' within 'tel' should be automatically handled by the support of the value-class-pattern. And for now, the 'type' subproperty of 'tel' is dropped/ignored. If there is demonstrable documented need for additional tel types (e.g. fax), we can introduce new flat properties as needed (e.g. p-tel-fax).

h-entry

profile/itemtype: http://microformats.org/profile/h-entry

properties:

For backward compatibility, microformats 2 parsers SHOULD detect the following root class name and property names. A microformats 2 parser may use existing microformats parsers to extract these properties. If an "h-entry" is found, don't look for a "hentry" on the same element. compat root class name:

properties: (parsed as p- plain text unless otherwise specified)


h-event

profile/itemtype: http://microformats.org/profile/h-event

properties:

For backward compatibility, microformats 2 parsers SHOULD detect the following root class name and property names. A microformats 2 parser may use existing microformats parsers to extract these properties. If an "h-event" is found, don't look for a "vevent" on the same element. compat root class name:

properties: (parsed as p- plain text unless otherwise specified)


h-geo

profile/itemtype: http://microformats.org/profile/h-geo

properties:

For backward compatibility, microformats 2 parsers SHOULD detect the following root class name and property names. A microformats 2 parser may use existing microformats parsers to extract these properties. If an "h-geo" is found, don't look for an "geo" on the same element. compat root class name:

properties: (parsed as p- plain text unless otherwise specified)


v2 vocab notes

Notes:

v2 vocab to-do

To do:

combining microformats

Since microformats 2 uses simple flat sets of properties for each microformat, multiple microformats are combined to indicate additional structure.

h-event location h-card

Events commonly have venue information with additional structure, like address information. For example:

<div class="h-event">
  <a class="p-name u-url" href="http://indiewebcamp.com/2012">
    IndieWebCamp 2012
  </a>
  from <time class="dt-start">2012-06-30</time> 
  to <time class="dt-end">2012-07-01</time> at 
  <span class="p-location h-card">
    <a class="p-name u-url" href="http://urbanairship.com/">
      Urban Airship
    </a>, 
    <span class="p-street-address">334 NW 11th Ave.</span>, 
    <span class="p-locality">Portland</span>, 
    <abbr class="p-region" title="Oregon">OR</abbr>
  </span>
</div>

The nested h-card used to structure the p-location of the h-event is represented as a structured value for "location" in the JSON, which has an additional key, "value" that represents the plain text version parsed from the p-location.

Parsed JSON:

[{ 
  "type": ["h-event"],
  "properties": {
    "name": ["IndieWebCamp 2012"],
    "url": ["http://indiewebcamp.com/2012"],
    "start": ["2012-06-30"],
    "end": ["2012-07-01"],
    "location": [{
      "value": "Urban Airship, 334 NW 11th Ave., Portland, OR",
      "type": ["h-card"],
      "properties": {
        "name": ["Urban Airship"],
        "url": ["http://urbanairship.com/"],
        "street-address": ["334 NW 11th Ave."],
        "locality": ["Portland"],
        "region": ["Oregon"]
      }
    }]
  }
}]

Notes:

About This Brainstorm

This brainstorm is currently written in narrative / exploratory format, that is, acknowledging the success that microformats have had, why so, and then a walk through of known issues with microformats in general along with iteration of ways to address said issues, ending up with the current microformats 2.0 design as a conclusion.

The proposals build on each other resulting in a solution that addresses the vast majority of general issues. The proposed changes merit a major version number increment, hence microformats 2.0.

This mathematical proof/derivation style is used to explicitly encourage understanding (and double-checking) of the rational steps taken in the development of microformats 2.0. Reasons are documented, sometimes along with alternatives considered (and reasons for rejection of those alternatives).

When the microformats 2.0 brainstorm has evolved sufficiently to demonstrate some degree of stability, usability, and implementability, it will be rewritten in a more declarative specification style, and this narrative/derivation will be archived to a background development page for historical purposes.

Tantek

Background

2004: In early February microformats were introduced as a concept at eTech, and in September hCard and hCalendar were proposed at FOO Camp.

2010:

Addressing Issues

AUTHORS and PUBLISHING

Authors and publishers are perhaps the most important constituency in the microformats community. There are more of them than there are developers, programmers, parsers, etc. and they're the ones that solved the chicken-egg problem by publishing microformats even before tools were available for consuming them.

Therefore we must first address author/publisher general issues with microformats.

can we make the simplest case simpler

Issue: How can we make it easier for authors to publish microformats?

Currently the simplest hCard:

<span class="vcard">
  <span class="fn">
    Chris Messina
  </span>
</span>

requires 2 elements (nested, with perhaps at least one being pre-existing), and 2 class names.

Web authors/designers are used to the simplicity of most HTML tags, e.g. to mark up a heading:

<h1>Chris Messina</h1>

requires just 1 element.

Jeffrey Zeldman pointed out this apparent perceived incremental complexity (2 elements vs 1) during a microformats workshop in 2009 in New York City.

How can we make microformats just as easy?

Proposal: allow root class name only.

This would enable:

<h1 class="vcard">Chris Messina</h1>

requiring only 1 class name for the simplest case.


renaming for usability

Otherwise known as, choosing one form of consistency over another.

Can we do even better?

One of the most common questions asked about hCard is:

Why does hCard use vcard as the root class name?

This slight inconsistency between the name of the format and the name of the root class name consistently causes confusion in a large percentage of newcomers to microformats.

Though in microformats we believe very strongly in the principle of reuse, we have to admit that in this case experience/evidence has shown that this may be a case where we re-used something too far beyond it's original meaning. Thus:

Proposal: use root class name "hcard" instead of "vcard" for future hCards.

This would result in:

<h1 class="hcard">Chris Messina</h1>

making the simple case even simpler:

Just 1 additional class name, named the same as the format you're adding. Think hCard, markup class="hcard".

At a minimum for compatibility we should document that parsers should accept "hcard" as an alternative to "vcard" as the root class name for hCard 1.0, and similarly for hCalendar 1.0: "hcalendar" in addition to "vcalendar", "hevent" in addition to "vevent".

However, for microformats-2 we are going to distinguish root class names further by using an "h-" prefix (e.g. "h-card"). Read on to understand why.

simplifying to only needing one element

It's very important for the simple case to be as simple as possible, to enable the maximum number of people to get started with minimum effort. (The idea of using a single class name for a microformat was proposed by Ryan Cannon in 2006 specifically for hCard, and rediscovered by Tantek in 2010 and subsequently generalized to all microformats.)

From there on, it's ok to require incremental effort for incremental return.

E.g. to add any additional information about a person, add explicit property names.

How does this simple root-only case work?

flat sets of properties

What more can we simplify about microformats?

Numerous individuals have provided the feedback that whenever there is more than one level of hierarchy in a microformat, many (most?) developers get confused - in particular Kavi Goel of Google / Rich Snippets provided this feedback at a microformats dinner. Thus depending on multiple levels of hierarchy is likely resulting in a loss of authorability, perhaps even accuracy as confusion undoubtedly leads to more errors. Thus:

Proposal: simplify all microformats to flat sets of properties.

What this means:

For example for hCard this would mean the following specific changes to keep relevant functionality:

Example: add a middle initial to the previous example Chris Messina's name, and markup each name component:

<h1 class="hcard">
 <span class="fn">
  <span class="given-name">Chris</span>
  <abbr class="additional-name">R.</abbr>
  <span class="family-name">Messina</span>
 </span>
</h1>

Note:

  1. use of an explicit span with "fn" to markup his entire formatted name
  2. use of the abbr element to explicitly indicate the semantic that "R." is merely an abbreviation for his additional-name.

distinguishing properties from other classes

Current microformats properties re-use generic terms like "summary", "photo", "updated" both for ease of use and understanding.

However, through longer term experience, we've seen sites that accidentally drop (or break) their microformats support (e.g. Upcoming.org, Facebook) because web authors sometimes rewrite all their class names, and either are unaware that microformats were in the page, or couldn't easily distinguish microformats property class names from other site-specific class names.

This issue has been reported by a number of web authors.

Thus microformats 2 uses prefixes for property class names, e.g.:

Such prefixing of all microformats class names was first suggested by Scott Isaacs of Microsoft to Tantek on a visit to Microsoft sometime in 2006/2007, but specifically aimed at making microformats easier to parse. At the time the suggestion was rejected since microformats were focused on web authors rather than parsers.

However, since experience has shown that distinguishing property class names is an issue for both web authors and parser developers, this is a key change that microformats 2 is adopting. See the next section for details.

COMMUNITY and TOOLS

The second most important constituency in the microformats community are the developers, programmers, tool-makers.

A non-trivial number of them have been sufficiently frustrated with some general issues with microformats that they've done the significant extra work to support very different and less friendly alternatives (microdata, RDFa). Based on this real-world data (market behavior), it behooves us to address these general issues with microformats for this constituency.

existing microformats parsing requirements

COMMUNITY and TOOLS (that) USE MICROFORMATS

parsing microformats currently requires

  1. a list of root class names of each microformat to be parsed
  2. a list of properties for each specific microformats, along with knowledge of the type of each property in order to parse their data from potentially different portions of the HTML markup
  3. some number of format-specific specific rules (markup/content optimizations)

This has meant that whenever a new microformat is drafted/specificied/adopted, parsers need to updated to handle it correctly, at a minimum to parse them when inside other microformats and avoid errantly implying properties from one to the other (containment, mfo problem).

naming conventions for generic parsing

I think there is a fairly simple solution to #1 and #2 from the above list, and we can make progress towards minimizing #3. In short:

Proposal: a set of naming conventions for microformat root class names and properties that make it obvious when:

In particular - derived from the real world examples of existing proven microformats (rather than any abstraction of what a schema should have)

This provides a simpler transition/education story for existing microformats authors/publishers:

See microformats-2-prefixes for further thoughts and discussions on these and other class prefixes.

Example: taking that simple heading hCard example forward:

<h1 class="h-card">Chris Messina</h1>

As part of microformats 2.0 we would immediately define root class names and property names for all existing microformats and drafts consistent with this naming convention, and require support thereof from all new implementations, as well as strongly encouraging existing implementations to adopt the simplified microformats 2.0 syntax and mechanism. Question: which microformats deserve explicit backward compatibility?

As a community we would continue to use the microformats process both for researching and determining the need for new microformats, and for naming new microformat property names for maximum re-use and interoperability of a shared vocabulary.

If it turns out we need a new property type in the future, we can use one of the remaining single-letter-prefixes to add it to microformats 2.0. This would require updating of parsers of course, but in practice the number of different types of properties has grown very slowly, and we know from other schema/programming languages that there's always some small limited number of scalar/atomic property types that you need, and using those you can create compound types/objects that represent richer / more complicated types of data. See microformats-2-prefixes for documentation of existing single-letter class name prefixes in practice.

ADVANTAGES

This has numerous advantages:

More examples: here is that same heading example with name components:

<h1 class="h-card">
 <span class="p-fn">
  <span class="p-given-name">Chris</span>
  <abbr class="p-additional-name">R.</abbr>
  <span class="p-family-name">Messina</span>
 </span>
</h1>

with a hyperlink to Chris's URL:

<h1 class="h-card">
 <a class="p-fn u-url" href="http://factoryjoe.com/">
  <span class="p-given-name">Chris</span>
  <abbr class="p-additional-name">R.</abbr>
  <span class="p-family-name">Messina</span>
 </a>
</h1>

COMPATIBILITY

microformats 2.0 is backwards compatible in that in permits content authors to markup with both old and new class names for compatibility with old tools.

Here is a simple example:

<h1 class="h-card vcard">
 <span class="fn">Chris Messina</span>
</h1>

a microformats 2.0 parser would see the class name "h-card" and imply the one required property from the contents, while a microformats 1.0 parser would find the class name "vcard" and then look for the class name "fn". no data duplication is required. this is a very important continuing application of the DRY principle.

And the above hyperlinked example with both sets of class names:

<h1 class="h-card vcard">
 <a class="p-fn u-url n fn url" href="http://factoryjoe.com/">
  <span class="p-given-name given-name">Chris</span>
  <abbr class="p-additional-name additional-name">R.</abbr>
  <span class="p-family-name family-name">Messina</span>
 </a>
</h1>


VENDOR EXTENSIONS

(this section was only discussed verbally and not written up during discussions - capturing here as it is topical)

Proprietary extensions to formats have typically been shortlived experimental failures with one big recent exception.

Proprietary or experimental CSS3 property implementations have been very successful.

There has been much use of border radius properties and animations/transitions which use CSS properties with vendor-specific prefixes like:

etc.

Note that these are merely string prefixes, not bound to any URL, and thus not namespaces in any practical sense of the word. This is quite an important distinction, as avoiding the need to bind to a URL has made them easier to support and use.

This use of vendor specific CSS properties has in recent years allowed the larger web design/development/implementor communities to experiment and iterate on new CSS features while the features were being developed and standardized.

The benefits have been two-fold:

Implementers have used/introduced "x-" prefixes for IETF MIME/content-types for experimental content-types, MIME parameter extensions, and HTTP header extensions, per RFC 2045 Section 6.3, RFC 3798 section 3.3, and Wikipedia: HTTP header fields - non-standard headers (could use RFC reference instead) respectively, like:

Some standard types started as experimental "x-" types, thus demonstrating this experiment first, standardize later approach has worked for at least some cases:

There have been times when specific sites have wanted to extend microformats beyond what the set of properties in the microformat, and currently lack any experimental way to do so - to try and see if a feature (or even a whole format) is interesting in the real world before bothering to pursue researching and walking it through the microformats process. Thus:

Proposal:

Background - this proposal is a composition of the following (at least somewhat) successful vendor extension syntaxes

USERS

Need more tools and interfaces that:

discussed some existing like: H2VX converts hCard to vCard, hCalendar to iCalendar

how would we re-implement Live Clipboard today, making it easier for publishers and developers?

SEE ALSO

microformats 2.0 was last modified: Wednesday, December 31st, 1969

Views