microformats2


Revision as of 19:00, 18 June 2011

2004: In early February microformats were introduced as a concept at eTech, and in September hCard and hCalendar were proposed at FOO Camp.

2010:



AUTHORS and PUBLISHING

How can we make it easier for authors to publish microformats?

Currently the simplest hCard:

<span class="vcard">
  <span class="fn">
    Chris Messina
  </span>
</span>

requires 2 elements (nested, with perhaps at least one of them pre-existing) and 2 class names.

Web authors/designers are used to the simplicity of most HTML tags, e.g. to mark up a heading:

<h1>Chris Messina</h1>

requires just 1 element.

How can we make microformats just as easy?

Proposal: allow root class name only.

This would enable:

<h1 class="vcard">Chris Messina</h1>

requiring only 1 class name for the simplest case.

Can we do even better?

One of the most common questions asked about hCard is:

Why does hCard use vcard as the root class name?

This slight inconsistency between the name of the format and its root class name consistently confuses a large percentage of newcomers to microformats.

Though in microformats we believe very strongly in the principle of reuse, we have to admit that experience/evidence has shown this may be a case where we re-used something too far beyond its original meaning. Thus:

Proposal: use root class name "hcard" instead of "vcard" for future hCards.

This would result in:

<h1 class="hcard">Chris Messina</h1>

making the simple case even simpler:

Just 1 additional class name, named the same as the format you're adding. Think hCard, mark up class="hcard".

It's very important for the simple case to be as simple as possible, to enable the maximum number of people to get started with minimum effort.

From there on, it's ok to require incremental effort for incremental return.

E.g. to add any additional information about a person, add explicit property names.

How does this simple root-only case work?
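One plausible reading of the root-only case, consistent with the note under COMPATIBILITY that a parser would "imply the one required property from the contents", is: if the root element contains no explicit property class names, treat the root's entire text as 'fn'. The Python sketch below assumes that rule, uses a hypothetical, abbreviated list of hCard property names, and handles a single root only; it is an illustration, not a specified parsing algorithm.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

# Hypothetical, abbreviated list of hCard property class names (illustration only).
HCARD_PROPERTIES = {"fn", "n", "given-name", "additional-name", "family-name", "url"}


class RootOnlyHCard(HTMLParser):
    """Collects the text and any property class names inside one class="hcard" element."""

    def __init__(self):
        super().__init__()
        self.depth = 0               # open elements inside the hcard root (0 = outside)
        self.found_root = False
        self.text = []
        self.found_properties = set()

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.found_properties.update(HCARD_PROPERTIES.intersection(classes))
            if tag not in VOID_ELEMENTS:
                self.depth += 1
        elif "hcard" in classes and tag not in VOID_ELEMENTS:
            self.depth = 1
            self.found_root = True

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def parse_root_only(html):
    """Return {"fn": ...} for a root-only hCard; None if no root or explicit properties exist."""
    parser = RootOnlyHCard()
    parser.feed(html)
    parser.close()
    if not parser.found_root or parser.found_properties:
        return None
    # No explicit properties: imply "fn" from the root element's whole text content.
    return {"fn": " ".join("".join(parser.text).split())}


print(parse_root_only('<h1 class="hcard">Chris Messina</h1>'))  # {'fn': 'Chris Messina'}
```

Once any explicit property such as "fn" appears inside the root, the sketch returns None, leaving the element to the normal property-by-property parsing.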

Additional simplifications

What more can we simplify about microformats?

Numerous individuals have given feedback that whenever a microformat has more than one level of hierarchy, many (most?) developers get confused; in particular, Kavi Goel of Google / Rich Snippets gave this feedback at a microformats dinner. Depending on multiple levels of hierarchy therefore likely costs us authorability, and perhaps even accuracy, since confusion undoubtedly leads to more errors. Thus:

Proposal: simplify all microformats to flat sets of properties.

What this means:

For example for hCard this would mean the following specific changes to keep relevant functionality:

Example: add a middle initial to Chris Messina's name from the previous example, and mark up each name component:

<h1 class="hcard">
 <span class="fn">
  <span class="given-name">Chris</span>
  <abbr class="additional-name">R.</abbr>
  <span class="family-name">Messina</span>
 </span>
</h1>

Note:

  1. use of an explicit span with "fn" to mark up his entire formatted name
  2. use of the abbr element to explicitly indicate that "R." is merely an abbreviation for his additional-name.

COMMUNITY and TOOLS (that) USE MICROFORMATS

Parsing microformats currently requires:

  1. a list of root class names of each microformat to be parsed
  2. a list of properties for each specific microformat, along with knowledge of each property's type, in order to parse their data from potentially different portions of the HTML markup
  3. some number of format-specific rules (markup/content optimizations)
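Requirements 1 and 2 can be made concrete with a small Python sketch of the lookup tables a 1.0-style parser has to carry. The table contents are hypothetical and heavily abbreviated, and the property "types" are only labels; the point is that any root class name or property missing from these tables is invisible to the parser until the parser itself is updated.

```python
# Hypothetical, abbreviated vocabulary tables that a microformats 1.0 parser ships with.
ROOT_CLASS_NAMES = {"vcard": "hCard", "vevent": "hCalendar"}       # requirement 1
PROPERTIES = {                                                      # requirement 2
    "hCard": {"fn": "text", "given-name": "text", "family-name": "text",
              "url": "url", "photo": "url"},
    "hCalendar": {"summary": "text", "dtstart": "datetime", "url": "url"},
}


def classify_mf1(class_attr, current_format=None):
    """Return how a 1.0-style parser interprets one element's class attribute.

    current_format is the format of the enclosing root, if any; property class
    names are only recognized when they appear in that format's table.
    """
    result = {"root": None, "properties": {}}
    for name in class_attr.split():
        if name in ROOT_CLASS_NAMES:
            result["root"] = ROOT_CLASS_NAMES[name]
        elif current_format and name in PROPERTIES[current_format]:
            result["properties"][name] = PROPERTIES[current_format][name]
    return result


print(classify_mf1("vcard"))
print(classify_mf1("fn url", "hCard"))
print(classify_mf1("hreview"))  # absent from this abbreviated table, so ignored
```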

This has meant that whenever a new microformat is drafted/specified/adopted, parsers need to be updated to handle it correctly, at a minimum to parse it when nested inside other microformats and to avoid errantly implying properties from one to the other (the containment / mfo problem).

I think there is a fairly simple solution to #1 and #2 from the above list, and we can make progress towards minimizing #3. In short:

Proposal: a set of naming conventions for microformat root class names and properties that make it obvious when:

In particular, these conventions are derived from real-world examples of existing, proven microformats (rather than from any abstraction of what a schema should have).

possibly also:

and:

Example: taking that simple heading hCard example forward:

<h1 class="h-card">Chris Messina</h1>

As part of microformats 2.0 we would immediately define root class names and property names for all existing microformats and drafts consistent with this naming convention, and require support thereof from all new implementations, as well as strongly encouraging existing implementations to adopt the simplified microformats 2.0 syntax and mechanism.
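By contrast, with prefixed names a parser needs no per-format tables. The Python sketch below (standard library only) collects every h-* object and its p-* text properties, and keeps the properties of a nested h-* object out of its parent, which is the containment problem mentioned above. It is a simplified illustration: only p-* text values are handled, and a nested object without a property class is treated as a separate top-level object; those choices are assumptions, not part of the proposal.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class GenericMf2Parser(HTMLParser):
    """Collects h-* objects and their p-* text properties with no vocabulary tables."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level objects
        self.stack = []   # one entry per open element

    def _owner(self):
        # The nearest enclosing h-* object owns any properties found here.
        for entry in reversed(self.stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        roots = [c for c in classes if c.startswith("h-")]
        props = [c[2:] for c in classes if c.startswith("p-")]
        owner = self._owner()
        entry = {"tag": tag, "item": None, "capture": None}
        if roots:
            item = {"type": roots, "properties": {}}
            if owner is not None and props:
                # Nested object used as a property value of its parent.
                for prop in props:
                    owner["properties"].setdefault(prop, []).append(item)
            else:
                self.items.append(item)
            entry["item"] = item
        elif owner is not None and props:
            entry["capture"] = (owner, props, [])  # text collected until the end tag
        if tag not in VOID_ELEMENTS:
            self.stack.append(entry)

    def handle_data(self, data):
        # Text counts toward every property element that is still open.
        for entry in self.stack:
            if entry["capture"] is not None:
                entry["capture"][2].append(data)

    def handle_endtag(self, tag):
        if not any(entry["tag"] == tag for entry in self.stack):
            return  # stray end tag
        while self.stack:
            entry = self.stack.pop()
            if entry["capture"] is not None:
                owner, props, chunks = entry["capture"]
                value = " ".join("".join(chunks).split())
                for prop in props:
                    owner["properties"].setdefault(prop, []).append(value)
            if entry["tag"] == tag:
                break


def parse_mf2(html):
    parser = GenericMf2Parser()
    parser.feed(html)
    parser.close()
    return parser.items


print(parse_mf2('<div class="h-event"><span class="p-name">Launch</span>'
                '<span class="p-location h-card"><span class="p-name">Chris</span></span></div>'))
```

Note that "h-event" appears in no table anywhere in the code, and the nested card's p-name does not leak into the event's name.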

As a community we would continue to use the microformats process both for researching and determining the need for new microformats, and for naming new microformat property names for maximum re-use and interoperability of a shared vocabulary.

If it turns out we need a new property type in the future, we can use one of the remaining single-letter prefixes to add it to microformats 2.0. This would of course require updating parsers, but in practice the number of different types of properties has grown very slowly, and we know from other schema/programming languages that you only ever need some small, limited number of scalar/atomic property types, from which you can build compound types/objects that represent richer / more complicated data.

ADVANTAGES

This has numerous advantages:

More examples: here is that same heading example with name components:

<h1 class="h-card">
 <span class="p-fn">
  <span class="p-given-name">Chris</span>
  <abbr class="p-additional-name">R.</abbr>
  <span class="p-family-name">Messina</span>
 </span>
</h1>

with a hyperlink to Chris's URL:

<h1 class="h-card">
 <a class="p-fn u-url" href="http://factoryjoe.com/">
  <span class="p-given-name">Chris</span>
  <abbr class="p-additional-name">R.</abbr>
  <span class="p-family-name">Messina</span>
 </a>
</h1>


COMPATIBILITY

microformats 2.0 is backwards compatible in that it permits content authors to mark up content with both old and new class names for compatibility with old tools.

Here is a simple example:

<h1 class="h-card vcard">
 <span class="fn">Chris Messina</span>
</h1>

A microformats 2.0 parser would see the class name "h-card" and imply the one required property from the contents, while a microformats 1.0 parser would find the class name "vcard" and then look for the class name "fn". No data duplication is required. This is a very important continuing application of the DRY principle.
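The dual behaviour can be sketched in Python against the markup above. Both parse functions are simplified assumptions: the 2.0 side reads only p-* text properties and, when it finds none, implies 'fn' (the one required hCard property) from the root's contents; the 1.0 side uses a hypothetical three-entry list of hCard property names. Neither handles nested objects.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.classes = (dict(attrs).get("class") or "").split()
        self.children = []  # Node objects and text strings

    def raw_text(self):
        return "".join(c if isinstance(c, str) else c.raw_text() for c in self.children)

    def descendants(self):
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.descendants()


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def normalize(text):
    return " ".join(text.split())


def parse_mf2_card(root):
    props = {}
    for node in root.descendants():
        for name in node.classes:
            if name.startswith("p-"):
                props.setdefault(name[2:], normalize(node.raw_text()))
    if not props:
        # No p-* properties found: imply the required property from the contents.
        props["fn"] = normalize(root.raw_text())
    return props


# Hypothetical subset of hCard 1.0 property names.
MF1_HCARD_PROPERTIES = {"fn", "given-name", "family-name"}


def parse_mf1_card(root):
    props = {}
    for node in root.descendants():
        for name in node.classes:
            if name in MF1_HCARD_PROPERTIES:
                props.setdefault(name, normalize(node.raw_text()))
    return props


doc = build_tree('<h1 class="h-card vcard">\n <span class="fn">Chris Messina</span>\n</h1>')
mf2 = [parse_mf2_card(n) for n in doc.descendants() if "h-card" in n.classes]
mf1 = [parse_mf1_card(n) for n in doc.descendants() if "vcard" in n.classes]
print(mf2, mf1)
```

Both parsers arrive at the same name from a single copy of the text, which is the DRY point made above.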

And the above hyperlinked example with both sets of class names:

<h1 class="h-card vcard">
 <a class="p-fn u-url n fn url" href="http://factoryjoe.com/">
  <span class="p-given-name given-name">Chris</span>
  <abbr class="p-additional-name additional-name">R.</abbr>
  <span class="p-family-name family-name">Messina</span>
 </a>
</h1>


VENDOR EXTENSIONS

(this section was only discussed verbally and not written up during discussions - capturing here as it is topical)

Proprietary extensions to formats have typically been short-lived experimental failures, with one big recent exception.

Proprietary or experimental CSS3 property implementations have been very successful.

There has been much use of border radius properties and animations/transitions which use CSS properties with vendor-specific prefixes like:

etc.

Note that these are merely string prefixes, not bound to any URL, and thus not namespaces in any practical sense of the word. This is quite an important distinction, as avoiding the need to bind to a URL has made them easier to support and use.

This use of vendor-specific CSS properties has in recent years allowed the larger web design/development/implementor communities to experiment and iterate on new CSS features while those features were being developed and standardized.

The benefits have been two-fold:

Implementers have used and introduced "x-" prefixes for experimental IETF MIME content-types, MIME parameter extensions, and HTTP header extensions, per RFC 2045 Section 6.3, RFC 3798 Section 3.3, and Wikipedia: HTTP header fields - non-standard headers (could use an RFC reference instead), respectively, like:

Some standard types started as experimental "x-" types, demonstrating that this "experiment first, standardize later" approach has worked in at least some cases:

There have been times when specific sites have wanted to extend microformats beyond the set of properties in the microformat, but currently lack any experimental way to do so: a way to try a feature (or even a whole format) and see whether it is interesting in the real world before bothering to research it and walk it through the microformats process. Thus:

Proposal:

Background - this proposal is a composition of the following (at least somewhat) successful vendor extension syntaxes

FURTHER THOUGHTS REGARDING HUNGARIAN PREFIXING

Microformats 2.0 proposes using an explicit [a-z]- prefix on properties, to differentiate them from other uses of the class attribute, and identify them as microformat properties, such that they can be parsed generically.
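A minimal sketch of how a generic parser might use the "[a-z]-" prefix to separate microformats class names from ordinary ones, with no knowledge of any particular format. The exact name syntax accepted by the regular expression (lowercase letters, digits, hyphens) is an assumption, as is the table of prefix meanings, which only lists the prefixes discussed on this page (h, p, u, d).

```python
import re

# Single letter, hyphen, then a lowercase hyphen-separated name (assumed name syntax).
MF2_CLASS = re.compile(r"([a-z])-([a-z0-9]+(?:-[a-z0-9]+)*)")

# Prefix meanings mentioned on this page; any other letter is treated as reserved.
PREFIX_KINDS = {"h": "root", "p": "property", "u": "url property", "d": "datetime property"}


def split_class_attribute(class_attr):
    """Separate microformats 2.0 class names from ordinary (e.g. styling) class names."""
    microformats, other = [], []
    for name in class_attr.split():
        match = MF2_CLASS.fullmatch(name)
        if match:
            kind = PREFIX_KINDS.get(match.group(1), "reserved prefix")
            microformats.append((kind, match.group(2)))
        else:
            other.append(name)
    return microformats, other


print(split_class_attribute("p-fn u-url headline"))
print(split_class_attribute("vcard fn"))  # 1.0 names look like any other class name
```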

(Note: the theoretical assertion "they need to parse all objects from a page" is not actually backed by *any* existing use of microformats/microdata/RDFa parsing - *none* of those parse "all objects from a page" if you consider every markup element an "object" - rather, one of the strengths of microformats (mimicked by the others) is that the publisher is able to mark up *just* the data to be extracted, rather than perhaps purely "presentational" content, ads, UI widgets etc. -- Tantek 02:15, 11 April 2011 (UTC) )

The µf2 proposal goes further, though, into a small vocabulary of Hungarian prefixes of properties based on data type. This increases the level of understanding required to read microformats, and reduces the benefit of all microformat properties having a consistent identifying prefix.

(Debatable assertion: "increases the level of understanding required to read microformats" - how? In microformats 2.0, authors/developers know that any single-letter-and-hyphen prefixed class name is for microformats 2.0, in contrast to today - developers have consistently given feedback that it's hard to tell which generic class names (other than h* names) are microformat related and which are not. As for specific prefixes, "h-*" is special and follows the pattern of existing microformats. p = generic (p)roperty, and the other prefixes have trivial mnemonics as well, d for (d)atetimes etc. (so far, hopefully we can keep that up). -- Tantek 02:15, 11 April 2011 (UTC) )

Hungarian notation itself is controversial amongst programmers. Plenty find that it uglifies their code and can cause confusion (especially when very short prefixes are used, or esoteric types, or where the declared set of types differs from the types available in other programming languages). Others support its benefits for type identification.

(Programmers are not the priority here, rather, designers/authors/publishers are. We design microformats for them first as they're the common use case, and we should avoid making statements that seem to imply any priority for the aesthetic preferences of programmers. -- Tantek 02:15, 11 April 2011 (UTC))

Critically, however, there is no clear indication that either of the above use cases requires types to be strongly identified.

  • For identifying µf in pages, a differentiator is required from regular classnames. There is no evidence of further requirement to differentiate between properties beyond their name (and existing criticisms of Hungarian notation suggest it can harm understandability.)
    • There is such evidence, and perhaps thus this would be a good FAQ topic. The derivation is quite simple - it comes directly from minimally affecting existing markup, and maximally using existing semantic information. As an example of special-purpose parsing, URL-like properties use the value of the 'href' (or equivalent) attribute because that's where that data already is in pages. Similarly with dates and datetimes - special parsing rules for that data type have permitted us to design the value-class-pattern to take advantage of specially parsing date and time separation. By re-using data *where publishers already put it, including attributes vs inline* we minimize the risk of data drift. -- Tantek 02:15, 11 April 2011 (UTC)
    • Additionally, this special type-specific parsing of microformats properties gives microformats an advantage in markup brevity that other syntaxes lack: you can convey *multiple* properties and values from a single existing element, e.g. the *very* common real-world pattern
      <a href="http://example.com/user">User Name</a>
      is minimally marked up as
      <span class="h-card"><a class="p-name u-url" href="http://example.com/user">User Name</a></span>
  • For generic parsing, there is no requirement that datatypes be established at extraction time. Data types will instead be applied by the developers of apps and widgets that build on the generic parsers.
    • There are requirements based on experience with actual markup. In order to support the patterns of where content publishers put the data we want to extract, we have determined (based on those publishing patterns) a few different ways (types) of parsing this data. This is all captured in the hcard-parsing property-specific parsing rules, each of which was added one at a time as Brian Suda and I encountered real world sites wanting to use hCard but not wanting to have to rewrite their markup (adding one span and some class names was about the limit; moving tags/attributes around was a showstopper in many/most cases), and each of the microformats 2.0 "types" is directly derived from such special purpose data/type parsing across *multiple* microformats. -- Tantek 02:15, 11 April 2011 (UTC)
  • A counter argument may be that special properties in microformats (such as URLs, or images) need to be identified because in microformats it is common to parse an attribute (href, or src) rather than the inner text of an element for these properties. However, in the context of extracting and then interpreting HTML in other contexts this is insufficient: for example, though an image only exists as a single property in vcard, in HTML it is both a URL to a resource and a text string (alt) representing an accessible fallback. A ‘generic extracter’ of microformats from a page must capture all of this information from HTML, so that the interpreting application can choose which data type is most relevant to its context. Likewise, an application interpreting a URL may also consider using the original inner text as an inferred label. Both pieces of data are useful, and a generic parser should not discard elemental semantics at the extraction level.
    • It's not just "*common* to parse an attribute rather than inner text of an element for these properties" - it is the vast overwhelming majority - if not all - such cases!
    • One misconception: "image only exists as a single property". No, there is both 'photo' and 'logo'. The 'url' and 'sound' properties are also of type 'url'. For all of these, when parsing an "object" element, you must use the 'data' attribute first for example. hCalendar has "attachment" as well. Etc.
    • Theoretical assertion: "A ‘generic extracter’ of microformats from a page must capture all of this information from HTML, so that the interpreting application can choose which data type is most relevant to its context." Why? There is no existing nor demonstrated use case for this requirement, even across other formats. While I agree it "might be nice" to develop a new "structured image" type, that's brand new work (deserving of research per the process etc.), and not a good source of reasoning to reject existing working patterns. I reject blocking microformats 2.0 on an as-yet-unresearched enhancement. This is certainly a case where "better" is an enemy of the good.
    • Theoretical assertion: "a generic parser should not discard elemental semantics at the extraction level" - generic parsers already do so for other syntaxes, both microdata and RDFa - so clearly this is not a reasonable "should not" assertion (and thus unnecessary) for development of a minimally competitive syntax. RDFa kind of cheats by overloading the 'rel' attribute in an attempt to solve the name+url case as mentioned above, but that's only two types - and existing real world use of microformats has demonstrated the utility of a few more. -- Tantek 02:15, 11 April 2011 (UTC)
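The type-specific parsing described in these notes can be sketched in Python for two of the prefixes: p-* takes the element's text, while u-* takes the URL-bearing attribute where publishers already put the URL ('href' on a/area, 'src' on img, and 'data' on object, as noted above). The attribute table, the fallback to text content, and the use of 'alt' for p-* on img are assumptions for illustration, not the full hcard-parsing rules; d-* and the value-class-pattern are not covered.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

# Assumed mapping from element to the attribute holding a u-* value.
URL_ATTRIBUTES = {"a": "href", "area": "href", "img": "src", "object": "data"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.classes = (self.attrs.get("class") or "").split()
        self.children = []  # Node objects and text strings

    def raw_text(self):
        return "".join(c if isinstance(c, str) else c.raw_text() for c in self.children)

    def descendants(self):
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.descendants()


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def property_value(node, prefix):
    """Type-specific value: URL attribute for u-*, otherwise text (or img alt)."""
    if prefix == "u" and node.tag in URL_ATTRIBUTES:
        url = node.attrs.get(URL_ATTRIBUTES[node.tag])
        if url:
            return url
    if node.tag == "img":
        return node.attrs.get("alt") or ""
    return " ".join(node.raw_text().split())


def parse_properties(root):
    props = {}
    for node in root.descendants():
        for name in node.classes:
            prefix, sep, prop = name.partition("-")
            if sep and prop and prefix in ("p", "u"):
                props.setdefault(prop, []).append(property_value(node, prefix))
    return props


doc = build_tree('<span class="h-card"><a class="p-name u-url" '
                 'href="http://example.com/user">User Name</a></span>')
card = next(n for n in doc.descendants() if "h-card" in n.classes)
print(parse_properties(card))  # {'name': ['User Name'], 'url': ['http://example.com/user']}
```

The single existing a element yields two properties, one from its text and one from its href, which is the markup-density advantage argued for here.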

Given this, Hungarian prefixes are of no benefit to parsers (and may in fact harm applications down the chain if parsing is prematurely strict). It would then be sufficient not to embed data types in property names, and instead settle on one single property prefix to differentiate all properties consistently. This would reduce the prefixes to just 3:

--BenWard 01:16, 11 April 2011 (UTC)

The primary benefit of type-specific parsing is *not* for parsers, but rather, publishers (who we still hold in higher priority than parsers).

I will also note that *each* of the type-specific parsing methods in hcard-parsing was added conservatively, reluctantly, and only when it became clear that such type-specific publishing patterns existed across *multiple* sites that would otherwise be unable to change their markup to work with microformats (Yes, I'm wishing now that I had better documented exactly *which* sites, precisely *when*, but like many startups, early on we didn't exactly know how much to document vs get things done - frankly I think we documented far more than any other comparable effort, e.g. we managed to at least capture/grow both an explicit process and principles in *far* greater detail than anything remotely comparable either before microformats or since!). The type-specific parsing features are certainly not overdesigned; on the contrary, they've *slowly* evolved, adapting to real world data on the web.

While per the simplicity principle I would actually *strongly* prefer to have only the three prefixes given above, or actually just *two* (I started with just two for the design of microformats 2.0, "h-*" and "p-*"), doing so would be a step *backwards* in terms of the adaptability of microformats to existing markup, and IMHO that's an unacceptable barrier, high enough to hurt the adoption/applicability of microformats 2.0.

(Aside: In addition, note that you still need h-x-* for experimental objects, and thus it's *simpler* to have *both* h-x-* and p-x-* rather than add x-*. Alternatively, x-h-* and x-p-* is no better, and in some ways worse, in that object vs. property is a more important distinction for parsers than established vs. experimental, especially if/when an experimental property (or object) is adopted. Also, mild precedent: PNG started with image/x-png, not x-image/png.)
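A small Python sketch of the ordering argument: with h-x-* and p-x-* the first letter still tells a parser whether it is looking at an object or a property, and adopting an experimental name only drops the "x-". The kind table (h for objects; p, u and d for properties) follows the prefixes mentioned on this page and is otherwise an assumption.

```python
# Prefix letters mentioned on this page; object vs. property is decided by the first letter.
KINDS = {"h": "object", "p": "property", "u": "property", "d": "property"}


def describe(class_name):
    """Return (kind, experimental, name) for a prefixed class name, else None."""
    prefix, sep, rest = class_name.partition("-")
    if not sep or prefix not in KINDS:
        return None
    experimental = rest.startswith("x-")
    name = rest[2:] if experimental else rest
    if not name:
        return None
    return KINDS[prefix], experimental, name


for example in ["h-x-review", "p-x-rating", "p-rating", "vcard"]:
    print(example, describe(example))
```

Promoting "p-x-rating" to "p-rating" changes only the experimental flag; the object/property classification a parser relies on stays the same.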

To put it in a positive way, type-specific parsing conveys microformats a publisher-markup-density (and re-use) advantage which neither microdata nor RDFa have, and it would behoove us to *keep* this significant real-world advantage as we evolve microformats.

-- Tantek 02:15, 11 April 2011 (UTC)


VOCABULARIES vs. SYNTAX

Problem: Microformats unnecessarily combine vocabularies and syntax. Split the two. Microformats vocabularies should be usable in all structured data languages. Microformats syntax should be available to those who want to use it.

RDFa 1.1 has added Microformats-like features over the past few years because they wanted RDFa 1.1 markup to be just as easy as Microformats markup. This example is used on the Microformats 2 page:

<h1 class="h-card">
 <span class="p-fn">
  <span class="p-given-name">Chris</span>
  <abbr class="p-additional-name">R.</abbr>
  <span class="p-family-name">Messina</span>
 </span>
</h1>

The markup above can be easily expressed in RDFa 1.1, using RDFa Profiles like so:

<h1 typeof="hcard">
 <span property="fn">
  <span property="given-name">Chris</span>
  <abbr property="additional-name">R.</abbr>
  <span property="family-name">Messina</span>
 </span>
</h1>

This is useful to the Microformats 2 work because every RDFa 1.1 compliant parser could easily become a compliant Microformats 2 parser. There are many benefits to sharing parser infrastructure.
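To illustrate how closely the two examples correspond, here is a string-level Python sketch that derives RDFa-style typeof/property values from microformats 2.0 class names, following the pairing in the examples above (h-card to hcard, p-fn to fn). It assumes an RDFa Profile that defines terms with those names, and it deliberately skips u-* and d-* names, whose type-specific values would need more than a property attribute; it is not a complete converter.

```python
def to_rdfa_attributes(class_attr):
    """Map h-*/p-* class names to RDFa-style typeof/property attribute values."""
    typeof, properties = [], []
    for name in class_attr.split():
        prefix, sep, rest = name.partition("-")
        if not sep or not rest:
            continue  # not a prefixed microformats class name
        if prefix == "h":
            typeof.append("h" + rest)   # h-card -> hcard, as in the RDFa example
        elif prefix == "p":
            properties.append(rest)     # p-given-name -> given-name
        # u-*, d-* and other prefixes are skipped in this sketch.
    attributes = {}
    if typeof:
        attributes["typeof"] = " ".join(typeof)
    if properties:
        attributes["property"] = " ".join(properties)
    return attributes


for cls in ["h-card", "p-fn", "p-given-name", "p-fn u-url"]:
    print(cls, to_rdfa_attributes(cls))
```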

Proposal: Map all Microformats to a schema that works with both RDFa and Microdata.

GOVERNANCE

Problem: Governance of the Microformats community is partially what derailed the first set of proposals and work - is there a better way to involve the community in governance tasks such that a meritocracy can prevail?

The Cabal. One of the strongest criticisms by the community has always been the status of the self-appointed leaders. They do a good job most of the time, but having a mechanism where the community elects the leaders and administrators would get us closer to a meritocracy. Not allowing the community to govern itself shows that you don’t trust the membership of the community. If you don’t trust us, how can we trust you? If there is a “you” and a “them”, then it becomes easy to have a “you versus them” situation. The Microformats community could learn a great deal from the Debian community in this respect.

The Process. There have been previous complaints that it was not very clear what was needed to clear each hurdle in the Microformats process. This seems to have been clarified with the new Microformats 2 work. There is still concern that too much is left in the hands of the “leaders”. There was a great deal of “moving the goalposts” during the development of the hAudio work. The process kept changing. If the process keeps changing, all of your hard work may never make it to the “official” Microformats standard stage. Understandably, folks will be suspicious of the process if the community has no power over who gets to change the process and when.

Proposal: Kickstart the Microformats 2 work with the original founders of Microformats. Allow 1/2 of the governance seats to be replaced after the first six months. Allow 2 year terms for seats. Only people that have contributed at least 50 e-mails of discussion or have worked on Microformats specs have the ability to vote.

INNOVATION

Problem: Microformats are good at paving the cowpaths - but what about the development of innovative applications? Could there be an incubator for groundbreaking work in the Microformats community? Innovation done right?

Open Innovation. How does one innovate in the Microformats community? That is, how do we have an open discussion about the Commerce, Signature and PaySwarm Web vocabularies in the Microformats community? We’re trying to solve a real-world problem – Universal Payment on the Web. We need to have an open discussion about the Web vocabularies used to accomplish this goal. How can we have this discussion in the Microformats community?

Proposal: Allow Microformats to be developed for open source software and systems without pre-existing markup. Systems could be subjected to Microformats guiding principles of simplicity and putting the target audience first.

COLLABORATION

Problem: The Microformats community and RDFa community are needlessly fragmented when they could both be working together.

Collaboration. How can the RDFa community, the Microdata folks and the Microformats community work together? Manu has been trying to make this happen for several years now, with each attempt meeting varying degrees of failure. Each community's continued track record of not reaching out and working with the others on a regular basis is damaging structured data adoption on the Web. Each community feels as if it is blameless for the current state of affairs: “If only they’d listen to us, we wouldn’t be in this mess!” Schema.org is just one signal that all of us need to come together and work on a unified way forward.

USERS

Need more tools and interfaces that:

Discussed some existing ones, like H2VX, which converts hCard to vCard and hCalendar to iCalendar.
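As a rough illustration of what a converter like H2VX does for hCard, here is a Python sketch that turns already-parsed hCard properties into vCard 3.0 text. It is not H2VX's implementation; the input format (a flat dict of property name to string) is an assumption, only FN, N and URL are handled, and line folding is not implemented. vCard 3.0 (RFC 2426) requires FN and N, so both are always emitted.

```python
def escape_text(value):
    """Escape a vCard text value (backslash, semicolon, comma, newline)."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def hcard_to_vcard(props):
    """Convert a flat dict of parsed hCard properties to a vCard 3.0 string."""
    lines = ["BEGIN:VCARD", "VERSION:3.0", "FN:" + escape_text(props.get("fn", ""))]
    # N components in vCard order: family;given;additional;prefix;suffix.
    components = ("family-name", "given-name", "additional-name",
                  "honorific-prefix", "honorific-suffix")
    lines.append("N:" + ";".join(escape_text(props.get(c, "")) for c in components))
    if "url" in props:
        lines.append("URL:" + props["url"])  # URI value, emitted unescaped
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


card = {"fn": "Chris R. Messina", "given-name": "Chris", "additional-name": "R.",
        "family-name": "Messina", "url": "http://factoryjoe.com/"}
print(hcard_to_vcard(card))
```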

how would we re-implement Live Clipboard today, making it easier for publishers and developers?

SEE ALSO

