microformats2-prefixes: Difference between revisions
(follow-up on s- comments, some agreement, some need for further documentation, note about label / adr updates) |
Kevin Marks (talk | contribs) |
(16 intermediate revisions by 7 users not shown) | |||
Line 1: | Line 1: | ||
{{DISPLAYTITLE:microformats2 prefix conventions}} | |||
[[ | [[microformats2]] uses a small number of prefixes to distinguish microformats2 class names from other class names. | ||
== | == microformats2 prefixes == | ||
=== naming conventions for generic parsing === | === naming conventions for generic parsing === | ||
The naming conventions for microformats class names make it obvious when:''' | |||
* a class name represents a microformat '''root class name''' | |||
* a class name represents a microformat '''property name''' | |||
* a class name represents a microformat root class name | * a class name represents a microformat '''property that needs special parsing''' | ||
* a class name represents a microformat property name | |||
* a class name represents a microformat property that needs special parsing | |||
In particular | In particular: | ||
* '''"h-*" for root class names''', e.g. "h-card", "h-event", "h-entry" | * '''"h-*" for root class names''', e.g. "h-card", "h-event", "h-entry" | ||
* '''"p-*" for plain (text) properties''', e.g. "p-name", "p-summary" | |||
* '''"p-*" for | ** generic plain text parsing, element text in general, certain HTML elements use special attributes first, e.g. img/alt, abbr/title. | ||
** | |||
* '''"u-*" for URL properties''', e.g. "u-url", "u-photo", "u-logo" | * '''"u-*" for URL properties''', e.g. "u-url", "u-photo", "u-logo" | ||
** special parsing required: prefer a/href, img/src, object/data etc. attributes to element contents. | ** special parsing required: prefer a/href, img/src, object/data etc. attributes to element contents. | ||
* '''"dt-*" for datetime properties''', e.g. "dt-start", "dt-end", "dt-bday" | * '''"dt-*" for datetime properties''', e.g. "dt-start", "dt-end", "dt-bday" | ||
** special parsing required: [[value-class-pattern]] | ** special parsing required: [[value-class-pattern]] and separate date time value parsing for readability | ||
* '''"e-*" for element tree properties''' where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for [[hAtom]]. The 'e-' prefix can also be mnemonically remembered as "element tree", "embedded markup", or "encapsulated markup". | |||
* '''"e-*" for properties''' where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for [[hAtom]]. The 'e-' prefix can also be mnemonically remembered as "element tree", "embedded markup", or "encapsulated markup". | |||
== microformats2 examples == | |||
Example: simple heading h-card example: | |||
Example: | |||
<source lang=html4strict> | <source lang=html4strict> | ||
<h1 class="h-card">Chris Messina</h1> | <h1 class="h-card">Chris Messina</h1> | ||
</source> | </source> | ||
More examples: here is that same heading example with name components: | More examples: here is that same heading example with name components: | ||
Line 131: | Line 31: | ||
<source lang=html4strict> | <source lang=html4strict> | ||
<h1 class="h-card"> | <h1 class="h-card"> | ||
<span class="p-given-name">Chris</span> | |||
<abbr class="p-additional-name">R.</abbr> | |||
<span class="p-family-name">Messina</span> | |||
</h1> | </h1> | ||
</source> | </source> | ||
Line 143: | Line 41: | ||
<source lang=html4strict> | <source lang=html4strict> | ||
<h1 class="h-card"> | <h1 class="h-card"> | ||
<a class=" | <a class="u-url" href="http://factoryjoe.com/"> | ||
<span class="p-given-name">Chris</span> | <span class="p-given-name">Chris</span> | ||
<abbr class="p-additional-name">R.</abbr> | <abbr class="p-additional-name">R.</abbr> | ||
Line 151: | Line 49: | ||
</source> | </source> | ||
=== | === backwards compatibility === | ||
microformats2 provides backwards compatibility by enabling content authors to markup with both old and new class names for compatibility with old tools. | |||
Here is a simple example: | Here is a simple backcompat example: | ||
<source lang=html4strict> | <source lang=html4strict> | ||
Line 163: | Line 61: | ||
</source> | </source> | ||
A microformats2 parser would see the class name "h-card" and imply the one required property from the contents, while a microformats 1.0 parser would find the class name "vcard" and then look for the class name "fn". no data duplication is required. this is a very important continuing application of the <abbr title="don't repeat yourself">DRY</abbr> [[principle]]. | |||
And the above hyperlinked example with both sets of class names: | And the above hyperlinked example with both sets of class names: | ||
Line 169: | Line 67: | ||
<source lang=html4strict> | <source lang=html4strict> | ||
<h1 class="h-card vcard"> | <h1 class="h-card vcard"> | ||
<a class=" | <a class="u-url n fn url" href="http://factoryjoe.com/"> | ||
<span class="p-given-name given-name">Chris</span> | <span class="p-given-name given-name">Chris</span> | ||
<abbr class="p-additional-name additional-name">R.</abbr> | <abbr class="p-additional-name additional-name">R.</abbr> | ||
Line 176: | Line 74: | ||
</h1> | </h1> | ||
</source> | </source> | ||
== vendor extensions == | == vendor extensions == | ||
Proprietary extensions to formats have typically been shortlived experimental failures with one big recent exception. | Proprietary extensions to formats have typically been shortlived experimental failures with one big recent exception. | ||
Line 192: | Line 87: | ||
etc. | etc. | ||
Note that these are merely string '''prefixes''', not bound to any URL, and thus not namespaces in any practical sense of the word. | Note that these are merely string '''prefixes''', not bound to any URL, and thus not namespaces in any practical sense of the word. This is quite an important distinction, as avoiding the need to bind to a URL has made them easier to support and use. | ||
This use of vendor specific CSS properties has in recent years allowed the larger web design/development/ | This use of vendor specific CSS properties has in recent years allowed the larger web design/development/implementer communities to experiment and iterate on new CSS features while the features were being developed and standardized. | ||
The benefits have been two-fold: | The benefits have been two-fold: | ||
Line 212: | Line 107: | ||
There have been times when specific sites have wanted to extend microformats beyond what the set of properties in the microformat, and currently lack any '''experimental''' way to do so - to try and see if a feature (or even a whole format) is interesting in the real world before bothering to pursue researching and walking it through the microformats process. Thus: | There have been times when specific sites have wanted to extend microformats beyond what the set of properties in the microformat, and currently lack any '''experimental''' way to do so - to try and see if a feature (or even a whole format) is interesting in the real world before bothering to pursue researching and walking it through the microformats process. Thus: | ||
* '*-x-' + '-' + meaningful name for root and property class names | * '*-x-' + '-' + meaningful name for root and property class names | ||
** where "*" indicates the single-character-prefix as defined above | ** where "*" indicates the single-character-prefix as defined above | ||
Line 228: | Line 122: | ||
* HTTP header extensions (e.g. x-pingback) | * HTTP header extensions (e.g. x-pingback) | ||
* note also [http://www.mnot.net/blog/2009/02/18/x- some critical thoughts from mnot] | * note also [http://www.mnot.net/blog/2009/02/18/x- some critical thoughts from mnot] | ||
== TO DO == | |||
* move resolved '''issues''' to a separate page. | |||
* clean-up and move background research to a microformats2-background or history page | |||
== issues == | == issues == | ||
Line 283: | Line 181: | ||
-- [[User:Tantek|Tantek]] 02:15, 11 April 2011 (UTC) | -- [[User:Tantek|Tantek]] 02:15, 11 April 2011 (UTC) | ||
== motivating causes == | |||
As described on the [[microformats 2]] page. | |||
=== distinguishing properties from other classes === | |||
Current microformats properties re-use generic terms like "summary", "photo", "updated" both for ease of use and understanding. | |||
However, through longer term experience, we've seen sites that accidentally drop (or break) their microformats support (e.g. Upcoming.org, Facebook) because web authors sometimes rewrite all their class names, and either are unaware that microformats were in the page, or couldn't easily distinguish microformats property class names from other site-specific class names. | |||
This issue has been reported by a number of web authors: | |||
* [http://html5doctor.com/microformats/#comment-10241 Wim's comment on HTML5Doctor] "Authors use classes like 'url' or 'region' all the time ... All sorts of markup might look like a microformat." | |||
* ... | |||
There has also been an anecdotal report of a design firm that was not (yet) familiar with microformats seeing the "extra" classes that "don't seem to be used" (without corresponding CSS rules) and asking if they "can remove them". By making microformats class names different from generic words, authors unfamiliar with microformats may at least notice the distinction and infer special functionality accordingly. | |||
Thus microformats 2 uses ''prefixes'' for property class names, e.g.: | |||
* '''p-summary''' instead of ''summary'' | |||
* '''u-photo''' instead of ''photo'' | |||
* '''dt-updated''' instead of ''updated'' | |||
Such prefixing of all microformats class names was first suggested by Scott Isaacs of Microsoft to Tantek on a visit to Microsoft sometime in 2006/2007, but specifically aimed at making microformats easier to parse. At the time the suggestion was rejected since microformats were focused on web authors rather than parsers. | |||
However, since experience has shown that distinguishing property class names is an issue for '''both web authors and parser developers''', this is a key change that microformats 2 is adopting. See the next section for details. | |||
=== existing microformats parsing requirements === | |||
A non-trivial number of parser and tools developers have been sufficiently frustrated with some general issues with microformats that they've done the significant extra work to support very different and less friendly alternatives (microdata, RDFa). Based on this real-world data (market behavior), it behooves us to address these general issues with microformats for this constituency. | |||
COMMUNITY and TOOLS (that) USE MICROFORMATS | |||
* parser / parsing | |||
* structured | |||
* getting the data out | |||
* json - 1:1 mapping | |||
[[parsing]] microformats currently requires | |||
# a list of root class names of each microformat to be parsed | |||
# a list of properties for each specific microformat, along with knowledge of the type of each property in order to parse their data from potentially different portions of the HTML markup | |||
# some number of format-specific rules (markup/content optimizations) | |||
This has meant that whenever a new microformat is drafted/specified/adopted, parsers need to be updated to handle it correctly, at a minimum to parse it when inside other microformats and avoid errantly implying properties from one to the other (containment, [[mfo]] problem). | |||
=== naming conventions for generic parsing === | |||
There is a fairly simple solution to #1 and #2 from the above list, and we can make progress towards minimizing #3. In short: | |||
'''Proposal: a set of naming conventions for microformat root class names and properties that make it obvious when:''' | |||
* a class name represents a microformat root class name | |||
* a class name represents a microformat property name | |||
* a class name represents a microformat property that needs special parsing (specific type of property). | |||
In particular - derived from the real world examples of existing proven microformats (rather than any abstraction of what a schema should have) | |||
* '''"h-*" for root class names''', e.g. "h-card", "h-event", "h-entry" | |||
** The 'h-' prefix is based on the existing microformats naming pattern of starting with 'h'. | |||
* '''"p-*" for simple (text) properties''', e.g. "p-fn", "p-summary" | |||
** vocabulary generic parsing, element text in general, treat certain HTML element/attribute combination as special and use those first, e.g. img/alt, abbr/title. | |||
** The 'p-' prefix is based on the word "property" starting with 'p'. | |||
* '''"u-*" for URL properties''', e.g. "u-url", "u-photo", "u-logo" | |||
** special parsing required: prefer a/href, img/src, object/data etc. attributes to element contents. | |||
** The 'u-' prefix is based on URL/URI starting with the letter 'u', which is the type of most of these related properties. | |||
* '''"dt-*" for datetime properties''', e.g. "dt-start", "dt-end", "dt-bday" | |||
** special parsing required: [[value-class-pattern]], in particular separate date time value parsing for better human readability / DRY balance. | |||
** The 'dt-' prefix is based on "date time" having the initials "dt" and the preponderance of existing date time properties starting with "dt", e.g. dtstart, dtend, dtstamp, dtreviewed. | |||
*** Initially I had proposed "dt-*" but Chris Messina suggested reducing it to "d-*" so that all prefixes were a single letter - made sense. | |||
*** However, I've noticed that Google+ is using "d-*" class names on [https://plus.google.com/109182513536739786206 profile pages], thus we can't really use 'd-' as a microformats 2 property parsing prefix. [[User:Tantek|Tantek]] 03:00, 22 July 2011 (UTC) | |||
* '''"e-*" for element tree properties''' where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for [[hAtom]]. The 'e-' prefix can also be mnemonically remembered as "element tree", "embedded markup", or "encapsulated markup". | |||
** special parsing required: follow the [http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#serializing-html-fragments HTML spec: Serializing HTML Fragments algorithm] to create a serialization. | |||
This provides a simpler transition/education story for existing microformats authors/publishers: | |||
* "h*" to "h-*", "dt*" to "dt-*", url-like properties to "u-*", entire embedded markup to "e-*", and "p-*" for all "plain text" properties. | |||
As part of microformats2 we would immediately define root class names and property names for all existing microformats and drafts consistent with this naming convention, and require support thereof from all new implementations, as well as strongly encouraging existing implementations to adopt the simplified microformats2 syntax and mechanism. Question: which microformats deserve explicit backward compatibility? | |||
As a community we would continue to use the microformats [[process]] both for researching and determining the need for new microformats, and for naming new microformat property names for maximum re-use and interoperability of a shared vocabulary. | |||
If it turns out we need a new property type in the future, we can use one of the remaining single-letter-prefixes to add it to microformats 2.0. This would require updating of parsers of course, but in practice the number of different types of properties has grown very slowly, and we know from other schema/programming languages that there's always some small limited number of scalar/atomic property types that you need, and using those you can create compound types/objects that represent richer / more complicated types of data. | |||
==== ADVANTAGES ==== | |||
This has numerous advantages: | |||
* '''better maintainability''' - much more obvious to web authors/designers/publishers which class names are for/from microformats. | |||
* '''no chance of collision''' - for all practical purposes with existing class names and thus avoiding any need to add more complex CSS style rules to prevent unintended styling effects. | |||
* '''simpler parsing''' - parsers can now do a simple stream-parse (or in-order DOM tree walk) and parse out '''all''' microformat objects, properties, and values, without having to know anything about any specific microformats. | |||
* '''separation of syntax and vocabulary''' - by abstracting microformats 2 syntax independent of any vocabulary, it allows and encourages development of shared vocabularies that can work in alternative syntaxes. | |||
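The "simpler parsing" advantage above can be illustrated with a rough Python sketch: an in-order walk that collects every "h-*" object and the prefixed property names inside it without knowing any specific vocabulary. This is only a sketch under stated assumptions, not a conforming parser: the names <code>GenericWalker</code> and <code>walk</code> are hypothetical, property values are not extracted, and nested objects are returned as a flat list rather than attached to their parent property.

```python
from html.parser import HTMLParser

PROPERTY_PREFIXES = ("p-", "u-", "dt-", "e-")
# Elements that never have an end tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class GenericWalker(HTMLParser):
    """Hypothetical helper: collects h-* objects and their property names."""

    def __init__(self):
        super().__init__()
        self.items = []       # every root object found, in document order
        self.open_roots = []  # stack of (depth, item) for h-* elements still open
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Property class names belong to the innermost open root object.
        if self.open_roots:
            current = self.open_roots[-1][1]
            current["properties"].extend(
                name for name in classes if name.startswith(PROPERTY_PREFIXES)
            )
        # Any h-* class name starts a new object, whatever its vocabulary.
        roots = [name for name in classes if name.startswith("h-")]
        if roots:
            item = {"type": roots, "properties": []}
            self.items.append(item)
            if tag not in VOID_TAGS:
                self.open_roots.append((self.depth, item))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth -= 1
        # Close the innermost root once its own element ends.
        if self.open_roots and self.open_roots[-1][0] == self.depth:
            self.open_roots.pop()


def walk(html):
    """Return the list of objects found in an HTML string."""
    walker = GenericWalker()
    walker.feed(html)
    walker.close()
    return walker.items
```

Note that nothing in the walk mentions "card", "event", or any property name; only the prefixes drive the logic, which is the point of the naming convention.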
==== prefixes for future consideration ==== | |||
possibly also: | |||
<div class="discussion"> | |||
* '''"s-*" for structured properties''' basically s-* works just like h-* except that no properties (e.g. 'name','url','photo') are implied when there are no properties present. Example uses: s-geo and s-adr. This is being considered as a result of [[microformats-2-parsing#Parsing_Literal_Values|microformats 2 parsing discussions]]. We can try it and see what happens. There's also no harm if publishers just use "h-" structures, they just (possibly) get a few extra properties if they happen to omit properties. | |||
** -1. Although I do think the implied-literal parsing behaviour carries some complications and parsed-cruft with more 'structural' formats, introducing a further generic prefix to differentiate one kind of format from another is esoteric, won't be understood by authors (we're already considering a full reversal rename of 'fn' to 'name' in response to user comprehension), and we've seen historically that mixed prefixes (v and h) also cause muddle. --[[User:BenWard|BenWard]] 06:30, 5 October 2011 (UTC) | |||
*** This is good reasoning, especially the comparison to v vs. h prefixed root class names (my experience with authors concurs with that). Each new prefix introduces complexity and thus must have advantages sufficient to exceed the complexity cost. [[User:Tantek|Tantek]] 06:55, 5 October 2011 (UTC) | |||
** Furthermore, both of the examples given here have in-the-wild use cases for literal parsing: Geo's existing documented optimisation of <code>1.233;0.453</code> is applied in two (one valid) manners with the <code>abbr</code> element (incorrectly as an expansion of a place name, and as an alternative to degree-format co-ordinates. Coordinates are also displayed in-place alongside map references, markers, and the like.) In the case of <code>adr</code>, there's an overlap with existing uses of the <code>label</code> label property, which is used in cases of unstructured addresses (common in most social network profile systems, also in vcalendar.) Based on previous discussion around unstructured addresses, Twitter uses <code><* class="adr"><* class="label"></code> on profiles ([https://twitter.com/intent/user?screen_name=benward&detailed Example].) Having literal parsing of <code>adr</code> would be neater though, if <code>label</code> were to be deprecated. --[[User:BenWard|BenWard]] 06:30, 5 October 2011 (UTC) | |||
*** [[geo]] has been historically quite problematic in practice, despite our efforts at making it work better via optimizations. I'd like to see real world examples of "Coordinates are also displayed in-place alongside map references, markers, and the like" documented on a page like [[geo-examples]] so we can see how any kind of geo-markup could/would help. Interesting about the Twitter use of 'label' inside 'adr' - quite prescient as [[vCard4]] moved 'label' from being its own top-level property to being an attribute (what we used to call subproperty) of 'adr'. Thus we should consider adding 'p-label' as a property for 'h-adr', given a) Twitter's real world usage, b) the refactoring of label into adr in vCard4. [[User:Tantek|Tantek]] 06:55, 5 October 2011 (UTC) | |||
* '''"e-*" for properties''' where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for [[hAtom]]. (2011-09-21 note: this has now been included in microformats 2, the below discussion is kept for posterity [[User:Tantek|Tantek]] 09:44, 21 September 2011 (UTC)) | |||
** unclear if this is necessary in general. and if so, if this is only for hAtom, that's insufficient to justify putting it in the generic syntax. | |||
** Would be sufficient to have all <code>p-</code> properties parse the complete content, including any nested mark-up if present, and then implementations to sanitize/run a <code>strip_tags</code> style function (as per data input best practice) as appropriate. --[[User:BenWard|BenWard]] 22:05, 19 September 2011 (UTC) | |||
** Ouch that sounds like passing on complexity downstream for all cases just to handle *one* known use-case so far. Since we'd typically do the opposite (simplify for the 99% case over the 1% case), passing on content including mark-up by default seems like a step backwards. Also, given how many vulnerabilities seem to deal with parsing/filtering, doing that *first* rather than burdening downstream implementations seems like the right choice. I'd rather wait til we get a concrete complaint from a microformats-2 hAtom consumer before worrying about this for hAtom 2.0. Or are there other current real world use cases besides Atom? [[User:Tantek|Tantek]] 22:22, 19 September 2011 (UTC) | |||
*** In addition to <code>entry-content</code> in hAtom there's also <code>entry-summary</code>, plus any large region of text in other microformats, which at the very least can commonly contain additional hyperlinks, images, and phrasing mark-up: <code>description</code> in hCal, hReview, hProduct, hListing, <code>note</code> in hCard, <code>ingredient</code>, <code>instructions</code> in hRecipe may link to a store, or wikipedia entry for the ingredient, or include an image to illustrate a step of a recipe, as well as recipe instructions that are expressed as lists. If an author/publisher marks up a property in such a way that it contains further mark-up, that mark-up should be assumed to be part of the value. It's always going to be up to an implementation to decide whether it wishes to translate that HTML mark-up into some other format (e.g. Markdown-esque text annotation when converting to something like <code>vcard</code>, or some other kind of formatting language on non-HTML platforms, or stripping text altogether. —[[User:BenWard|BenWard]] 01:05, 20 September 2011 (UTC) | |||
**** While hAtom's <code>entry-content</code> and possibly hCard's <code>note</code> may be the only existing practical use-cases (<code>entry-summary</code> and other "large region of text" are potential/prospective use cases), I'm now convinced the hAtom use-case alone is worthy of including the "e-" prefix because it enables a full fidelity replacement for typical Atom use cases. [[User:Tantek|Tantek]] 09:44, 21 September 2011 (UTC) | |||
* '''"i-*" for ID properties''', e.g. "i-uid" (if this is the only one, then perhaps we just always re-use "uid" or collapse with "u-*" into "u-id".) | |||
** parsing is no different than "u-*" parsing, thus no need to introduce for now. | |||
* '''"n-*" for numbers''', e.g. "n-rating", "n-geo", where the numbers may have different human-readable-friendly and decimal/machine values (e.g. with geo lat/long degrees minutes seconds vs decimal). | |||
** requires definition of how would different parsing work before worthy of consideration. | |||
* '''"t-*" for time duration''', e.g. "t-duration" in [[hCalendar]], [[hAudio]], [[hRecipe]] (note also Google's hRecipe extensions "preptime", "cooktime", "totaltime") | |||
** requires definition of how would different parsing work before worthy of consideration. | |||
** now that the HTML5 <time> element supports representing durations, we should simply incorporate duration (and timezone at that) into the 'dt-' datetime parsing rules. Certainly no need for a separate prefix. [[User:Tantek|Tantek]] 08:45, 1 December 2011 (UTC) | |||
</div> | |||
==== reserving other prefixes ==== | |||
We should '''reserve all other single-letter-dash prefixes for future use''' (within the scope of '''h-''' objects: outside of the context of an '''h-''' object, this is inapplicable). | |||
In practice we have seen little (if any) use of single-letter-dash prefixing of class names by web developers/designers, and thus in practice we think this will have little if any impact/collisions. Certainly far fewer than existing generic microformat property class names like "title", "note", "summary". | |||
==== existing single letter class prefixes ==== | |||
We should document existing usage of single/double letter prefixed names: | |||
* Google+ (e.g. [https://plus.google.com/109182513536739786206 profile page], others) uses: | |||
** '''a-''' | |||
** '''d-''' | |||
** '''g-''', '''gb*''' | |||
* [http://getskeleton.com/#utilities Skeleton] uses | |||
** '''u-''' for utility classes u-full-width u-max-full-width u-pull-right u-pull-left and u-cf | |||
* [https://github.com/suitcss/utils SUITCSS] uses | |||
** '''u-''' for utility classes. Usually two words written in camelCase: u-alignTop u-floatLeft however some use abbreviations: u-nbfc (new block formatting context) u-cf (clearfix) | |||
* Yahoo | |||
** '''y-''' | |||
* others? please add alphabetical by company/org name. | |||
== see also == | == see also == | ||
* [[microformats-2]] | * [[microformats-2]] | ||
* [[microformats-2-brainstorming]] | * [[microformats-2-brainstorming]] | ||
* [[microformats-2-faq]] | * [[microformats-2-faq]] |
Latest revision as of 15:32, 17 August 2021
microformats2 uses a small number of prefixes to distinguish microformats2 class names from other class names.
microformats2 prefixes
naming conventions for generic parsing
The naming conventions for microformats class names make it obvious when:
- a class name represents a microformat root class name
- a class name represents a microformat property name
- a class name represents a microformat property that needs special parsing
In particular:
- "h-*" for root class names, e.g. "h-card", "h-event", "h-entry"
- "p-*" for plain (text) properties, e.g. "p-name", "p-summary"
- generic plain text parsing: use the element text in general, but for certain HTML elements use special attributes first, e.g. img/alt, abbr/title.
- "u-*" for URL properties, e.g. "u-url", "u-photo", "u-logo"
- special parsing required: prefer a/href, img/src, object/data etc. attributes to element contents.
- "dt-*" for datetime properties, e.g. "dt-start", "dt-end", "dt-bday"
- special parsing required: value-class-pattern and separate date time value parsing for readability
- "e-*" for element tree properties where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for hAtom. The 'e-' prefix can also be mnemonically remembered as "element tree", "embedded markup", or "encapsulated markup".
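As a rough illustration of how a generic parser could use these conventions, the following Python sketch maps a class name to the kind of thing it represents purely from its prefix. The names `PREFIX_KINDS`, `CLASS_PATTERN`, and `classify` are hypothetical, and the exact grammar assumed for the part after the prefix (lowercase words separated by hyphens) is an assumption of this sketch rather than something this page defines.

```python
import re

# Prefix -> kind of class name, following the list above.
PREFIX_KINDS = {
    "h": "root",
    "p": "plain text property",
    "u": "URL property",
    "dt": "datetime property",
    "e": "element tree property",
}

# Assumed grammar: a known prefix, a hyphen, then hyphen-separated lowercase words.
CLASS_PATTERN = re.compile(r"^(h|p|u|dt|e)-([a-z][a-z0-9]*(?:-[a-z0-9]+)*)$")


def classify(class_name):
    """Return (kind, name) for a microformats2 class name, or None otherwise."""
    match = CLASS_PATTERN.match(class_name)
    if match is None:
        return None
    prefix, name = match.groups()
    return PREFIX_KINDS[prefix], name
```

For instance, `classify("dt-start")` yields a datetime property named "start", while a generic class name such as "summary" is not recognized at all.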
microformats2 examples
Example: a simple heading h-card:
<h1 class="h-card">Chris Messina</h1>
More examples: here is that same heading example with name components:
<h1 class="h-card">
<span class="p-given-name">Chris</span>
<abbr class="p-additional-name">R.</abbr>
<span class="p-family-name">Messina</span>
</h1>
with a hyperlink to Chris's URL:
<h1 class="h-card">
<a class="u-url" href="http://factoryjoe.com/">
<span class="p-given-name">Chris</span>
<abbr class="p-additional-name">R.</abbr>
<span class="p-family-name">Messina</span>
</a>
</h1>
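The special parsing noted earlier for "p-*" and "u-*" properties can be sketched in Python as two small functions. This is a minimal sketch under assumptions: the function names are hypothetical, each element is assumed to be already reduced to its tag name, an attribute dictionary, and its text, and only the element/attribute pairs named on this page are handled (the "etc." in the list is not expanded).

```python
def plain_text_value(tag, attrs, text):
    """p-*: certain elements use a special attribute first, else the element text."""
    if tag == "img" and "alt" in attrs:
        return attrs["alt"]
    if tag == "abbr" and "title" in attrs:
        return attrs["title"]
    return text.strip()


# u-*: element -> attribute that carries the URL.
URL_ATTRIBUTES = {"a": "href", "img": "src", "object": "data"}


def url_value(tag, attrs, text):
    """u-*: prefer the URL-bearing attribute to the element contents."""
    attribute = URL_ATTRIBUTES.get(tag)
    if attribute is not None and attribute in attrs:
        return attrs[attribute]
    return text.strip()
```

Applied to the hyperlinked example, `url_value("a", {"href": "http://factoryjoe.com/"}, "Chris R. Messina")` takes the value from the href attribute rather than from the link text.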
backwards compatibility
microformats2 provides backwards compatibility by enabling content authors to mark up content with both old and new class names, so that old tools keep working.
Here is a simple backcompat example:
<h1 class="h-card vcard">
<span class="fn">Chris Messina</span>
</h1>
A microformats2 parser would see the class name "h-card" and imply the one required property from the contents, while a microformats 1.0 parser would find the class name "vcard" and then look for the class name "fn". No data duplication is required. This is a very important continuing application of the DRY principle.
And the above hyperlinked example with both sets of class names:
<h1 class="h-card vcard">
<a class="u-url n fn url" href="http://factoryjoe.com/">
<span class="p-given-name given-name">Chris</span>
<abbr class="p-additional-name additional-name">R.</abbr>
<span class="p-family-name family-name">Messina</span>
</a>
</h1>
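One way to see why both sets of class names can coexist is that a microformats2 parser only needs to look at the prefixed names. The Python sketch below separates a class attribute into prefixed and other names; `split_class_names` is a hypothetical helper, and treating every name with one of these prefixes as a microformats2 name is a simplifying assumption.

```python
MF2_PREFIXES = ("h-", "p-", "u-", "dt-", "e-")


def split_class_names(class_attribute):
    """Split a class attribute into (microformats2 names, all other names)."""
    mf2_names, other_names = [], []
    for name in class_attribute.split():
        if name.startswith(MF2_PREFIXES):
            mf2_names.append(name)
        else:
            other_names.append(name)
    return mf2_names, other_names
```

For the hyperlink in the example above, the attribute "u-url n fn url" splits into "u-url" for a microformats2 parser and "n", "fn", "url" for a microformats 1.0 parser.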
vendor extensions
Proprietary extensions to formats have typically been short-lived experimental failures, with one big recent exception.
Proprietary or experimental CSS3 property implementations have been very successful.
There has been much use of border radius properties and animations/transitions which use CSS properties with vendor-specific prefixes like:
- -moz-border-radius
- -webkit-border-radius
etc.
Note that these are merely string prefixes, not bound to any URL, and thus not namespaces in any practical sense of the word. This is quite an important distinction, as avoiding the need to bind to a URL has made them easier to support and use.
This use of vendor specific CSS properties has in recent years allowed the larger web design/development/implementer communities to experiment and iterate on new CSS features while the features were being developed and standardized.
The benefits have been two-fold:
- designers have been able to make more attractive sites sooner (at least in some browsers)
- features have been market / real-world tested before being fully standardized, thus resulting in better features
Implementers have used/introduced "x-" prefixes for IETF MIME/content-types for experimental content-types, MIME parameter extensions, and HTTP header extensions, per RFC 2045 Section 6.3, RFC 3798 section 3.3, and Wikipedia: HTTP header fields - non-standard headers (could use RFC reference instead) respectively, like:
- application/x-latex (per Wikipedia Internet media type: Type x)
- x-spam-score (in email headers)
- X-Pingback (per Wikipedia:Pingback)
Some standard types started as experimental "x-" types, demonstrating that this "experiment first, standardize later" approach has worked for at least some cases:
- image/x-png (standardized as image/png, both per RFC2083)
There have been times when specific sites have wanted to extend microformats beyond the set of properties in the microformat, and they currently lack any experimental way to do so - to try and see if a feature (or even a whole format) is interesting in the real world before bothering to pursue researching and walking it through the microformats process. Thus:
- '*-x-' + meaningful name, for root and property class names
- where "*" indicates the prefix as defined above (e.g. "h", "p", "u", "dt", "e")
- where "x" indicates either a literal 'x' for an experimental extension,
- or a vendor prefix (more than one character, e.g. like CSS vendor extension abbreviations, or some stock symbols, avoiding first words/phrases/abbreviations of microformats properties like dt-)
- e.g.
- "h-bigco-one-ring" - a hypothetical "bigco" vendor-specific "onering" microformat root class name.
- "p-goog-preptime" - to represent Google's "preptime" property extension to hRecipe (aside: "duration" may be another property type to consider separate from "datetime" as it may be subject to different parsing rules.)
- "p-x-prep-time" - a possible experimental property name to be added to hRecipe upon consideration/documentation of real-world usage/uptake.
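The extension syntax above can be sketched in Python as follows. Because a vendor-prefixed name such as "p-goog-preptime" has the same shape as an ordinary multi-word property such as "p-given-name", this sketch assumes the caller supplies a list of known vendor prefixes; that list, the function name `parse_extension`, and the returned dictionary layout are all assumptions of this sketch, not part of the proposal.

```python
import re

# prefix - (x | vendor) - meaningful name
EXTENSION_PATTERN = re.compile(
    r"^(h|p|u|dt|e)-([a-z0-9]+)-([a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_extension(class_name, known_vendors=("bigco", "goog")):
    """Return details of an experimental or vendor class name, else None."""
    match = EXTENSION_PATTERN.match(class_name)
    if match is None:
        return None
    prefix, marker, name = match.groups()
    if marker == "x":
        # A literal 'x' marks an experimental extension.
        return {"prefix": prefix, "kind": "experimental", "vendor": None, "name": name}
    if marker in known_vendors:
        # Vendor prefixes are only recognized when listed by the caller.
        return {"prefix": prefix, "kind": "vendor", "vendor": marker, "name": name}
    return None
```

With the default vendor list, the three examples above are recognized, while "p-given-name" is left alone as an ordinary property.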
Background - this proposal is a composition of the following (at least somewhat) successful vendor extension syntaxes
- CSS 2.1 4.1.2.1 Vendor-specific extensions
- IETF MIME/content-type "x-*" extensions per RFC 2045 Section 6.3. [1]
- IETF MIME experimental fields (e.g. x-spam-score)
- HTTP header extensions (e.g. x-pingback)
- note also some critical thoughts from mnot
TO DO
- move resolved issues to a separate page.
- clean-up and move background research to a microformats2-background or history page
issues
Hungarian prefixing issues
Raised by BenWard 01:16, 11 April 2011 (UTC)
Microformats 2.0 proposes using an explicit [a-z]- prefix on properties, to differentiate them from other uses of the class attribute, and identify them as microformat properties, such that they can be parsed generically.
- The differentiation use case is supported by anecdotal evidence of sites (such as Facebook, Twitter, Yahoo) removing microformats or breaking objects in page edits. The addition of a prefix assists self-documentation of code.
- The generic parsing use case is supported by Google Rich Snippets, Yahoo Search Monkey, and extensible plugins like Operator and the Firefox microformats parser. Although these extract microformats from the page, they are intermediate systems between the page content and the actual interpretation of the data. They need to parse all objects from a page, and then another developer or application will interpret some of them into something else.
(Note: the theoretical assertion "they need to parse all objects from a page" is not actually backed by *any* existing use of microformats/microdata/RDFa parsing - *none* of those parse "all objects from a page" if you consider every markup element an "object" - rather, one of the strengths of microformats (mimicked by the others) is that the publisher is able to mark up *just* the data to be extracted, rather than perhaps purely "presentational" content, ads, UI widgets etc. -- Tantek 02:15, 11 April 2011 (UTC) )
The µf2 proposal goes further, though, into a small vocabulary of Hungarian prefixes of properties based on data type. This increases the level of understanding required to read microformats, and reduces the benefit of all microformat properties having a consistent identifying prefix.
(Debatable assertion: "increases the level of understanding required to read microformats" - how? In microformats 2.0, authors/developers know that any single-letter-and-hyphen prefixed class name is for microformats 2.0, in contrast to today - developers have consistently given feedback that it's hard to tell which generic class names (other than h* names) are microformat related and which are not. As for specific prefixes, "h-*" is special and follows the pattern of existing microformats. p = generic (p)roperty, and the other prefixes have trivial mnemonics as well, d for (d)atetimes etc. (so far, hopefully we can keep that up). -- Tantek 02:15, 11 April 2011 (UTC) )
Hungarian notation itself is controversial amongst programmers. Plenty find that it uglifies their code and can be a cause of confusion (especially when very short prefixes or esoteric types are used, or where the declared set of types differs from the available types in other programming languages). Others support its benefits to type identification.
(Programmers are not the priority here, rather, designers/authors/publishers are. We design microformats for them first as they're the common use case, and we should avoid making statements that seem to imply any priority for the aesthetic preferences of programmers. -- Tantek 02:15, 11 April 2011 (UTC))
Critically, however, there is no clear indication that either of the above use cases requires types to be strongly identified.
- For identifying µf in pages, a differentiator is required from regular classnames. There is no evidence of further requirement to differentiate between properties beyond their name (and existing criticisms of Hungarian notation suggest it can harm understandability.)
- There is such evidence, and perhaps thus this would be a good FAQ topic. The derivation is quite simple - it comes directly from minimally affecting existing markup, and maximally using existing semantic information. Example of special purpose parsing, URL-like properties use the value of the 'href' (or equivalent) attribute because that's where that data already is in pages. Similarly with dates and datetimes - special parsing rules for that data type have permitted us to design the value-class-pattern to take advantage of specially parsing date and time separation. By re-using data *where publishers already put it, including attributes vs inline* we minimize the risk of data drift. -- Tantek 02:15, 11 April 2011 (UTC)
- Additionally, this special type-specific parsing of microformats properties gives microformats advantages of markup brevity that other syntaxes lack. E.g. you can convey *multiple* properties and values from a single existing element, e.g. the *very* common real-world pattern
<a href="http://example.com/user">User Name</a>
is minimally marked up as
<span class="h-card"><a class="p-name u-url" href="http://example.com/user">User Name</a></span>
- For generic parsing, there is no requirement that datatypes be established at extraction time. Data types will instead be applied by the developers of apps and widgets that build on the generic parsers.
- There are requirements based on experience with actual markup. In order to support the patterns of where content publishers put the data we want to extract, we have determined (based on those publishing patterns) a few different ways (types) of parsing this data. This is all captured in the hcard-parsing property-specific parsing rules each of which were added one at a time as Brian Suda and myself encountered real world sites wanting to use hCard but not wanting to have to rewrite their markup (adding one span and some class names was about the limit, moving tags/attributes around was a showstopper in many/most cases), and each of the microformats 2.0 "types" are directly derived from such special purpose data/type parsing across *multiple* microformats. -- Tantek 02:15, 11 April 2011 (UTC)
- A counter argument may be that special properties in microformats (such as URLs, or images) need to be identified because in microformats it is common to parse an attribute (href, or src) rather than inner text of an element for these properties. However, in the context of extracting and then interpreting HTML in other contexts this is insufficient: For example, though an image only exists as a single property in vcard, in HTML it is both a URL to a resource and a text string (alt) representing an accessible fallback. A ‘generic extracter’ of microformats from a page must capture all of this information from HTML, so that the interpreting application can choose which data type is most relevant to its context. Likewise, an application interpreting a URL may also consider using the original inner text as an inferred label. Both pieces of data are useful, and a generic parser should not discard elemental semantics at the extraction level.
- It's not just "*common* to parse an attribute rather than inner text of an element for these properties" - it is the vast overwhelming majority - if not all - such cases!
- One misconception: "image only exists as a single property". No, there is both 'photo' and 'logo'. The 'url' and 'sound' properties are also of type 'url'. For all of these, when parsing an "object" element, you must use the 'data' attribute first for example. hCalendar has "attachment" as well. Etc.
- Theoretical assertion: "A ‘generic extracter’ of microformats from a page must capture all of this information from HTML, so that the interpreting application can choose which data type is most relevant to its context." Why? There is no existing nor demonstrated use case for this requirement, even across other formats. While I agree it "might be nice" to develop a new "structured image" type - that's brand new work (deserving of research per the process etc.), and not a good source of reasoning to reject existing working patterns. I reject blocking microformats 2.0 on an as-yet-to-be-researched-enhancement. This is certainly a case where "better" is an enemy of the good.
- Theoretical assertion: "a generic parser should not discard elemental semantics at the extraction level" - parsers already do so for other syntaxes like both microdata and RDFa - so clearly this is not a reasonable "should not" assertion (and thus unnecessary) for development of a minimally competitive syntax. RDFa kind of cheats by overloading the 'rel' attribute in an attempt to solve the name+url case as mentioned above, but that's only two types - and existing real world use of microformats has demonstrated utility of a few more. -- Tantek 02:15, 11 April 2011 (UTC)
Given this, Hungarian prefixes are of no benefit to parsers (and may in fact harm applications down the chain if parsing is prematurely strict). It would then be sufficient not to concern ourselves with embedding data types in property names, and instead settle on one single property prefix to differentiate all properties consistently. This would reduce the prefixes to just 3:
- h would indicate a root class name: an ‘object in HTML’.
- p would indicate a property within an object.
- x would indicate an experimental extension to an object.
--BenWard 01:16, 11 April 2011 (UTC)
The primary benefit of type-specific parsing is *not* for parsers, but rather, publishers (who we still hold in higher priority than parsers).
I will also note that *each* of the type-specific parsing methods in hcard-parsing was added conservatively, reluctantly, and only when it became clear that such type-specific publishing patterns existed across *multiple* sites that would otherwise be unable to change their markup to work with microformats (Yes, I'm wishing now that I better documented exactly *which* sites, precisely *when*, but like many startups, early on we didn't exactly know how much to document vs get things done - frankly I think we documented far more than any other comparable such efforts, e.g. we managed to at least capture/grow both an explicit process and principles in *far* greater detail than anything remotely comparable either before microformats or since!). The type-specific parsing features are certainly not overdesigned, on the contrary they've *slowly* evolved, adapting to real world data on the web.
While per the simplicity principle, I would actually *strongly* prefer to only have the three prefixes given above, or actually just *two* (I started with just two for the design of microformats 2.0 actually, just "h-*" and "p-*"), doing so would be a step *backwards* in terms of the adaptability of microformats to existing markup, and that's IMHO an unacceptable barrier, and a sufficiently high barrier to hurt the adoption/applicability of microformats 2.0.
(Aside: In addition, note that you still need h-x-* for experimental objects, and thus it's *simpler* to simply have *both* h-x-* and p-x-* rather than add x-*. Alternatively x-h-* and x-p-* is no better, in some ways worse, in that object vs. property is a more important distinction for parsers than established vs experimental, especially if/when an experimental property (or object) may be adopted. Also, mild precedent: PNG started with image/x-png, not x-image/png.)
To put it in a positive way, type-specific parsing gives microformats a publisher-markup-density (and re-use) advantage which neither microdata nor RDFa have, and it would behoove us to *keep* this significant real-world advantage as we evolve microformats.
-- Tantek 02:15, 11 April 2011 (UTC)
motivating causes
As described on the microformats 2 page.
distinguishing properties from other classes
Current microformats properties re-use generic terms like "summary", "photo", "updated" both for ease of use and understanding.
However, through longer term experience, we've seen sites that accidentally drop (or break) their microformats support (e.g. Upcoming.org, Facebook) because web authors sometimes rewrite all their class names, and either are unaware that microformats were in the page, or couldn't easily distinguish microformats property class names from other site-specific class names.
This issue has been reported by a number of web authors:
- Wim's comment on HTML5Doctor "Authors use classes like 'url' or 'region' all the time ... All sorts of markup might look like a microformat."
- ...
There has also been an anecdotal report of a design firm that was not (yet) familiar with microformats seeing the "extra" classes that "don't seem to be used" (without corresponding CSS rules) and asking if they "can remove them". By making microformats class names different from generic words, authors unfamiliar with microformats may at least notice such distinction and infer special functionality accordingly.
Thus microformats 2 uses prefixes for property class names, e.g.:
- p-summary instead of summary
- u-photo instead of photo
- dt-updated instead of updated
Such prefixing of all microformats class names was first suggested by Scott Isaacs of Microsoft to Tantek on a visit to Microsoft sometime in 2006/2007, but specifically aimed at making microformats easier to parse. At the time the suggestion was rejected since microformats were focused on web authors rather than parsers.
However, since experience has shown that distinguishing property class names is an issue for both web authors and parser developers, this is a key change that microformats 2 is adopting. See the next section for details.
existing microformats parsing requirements
A non-trivial number of parser and tools developers have been sufficiently frustrated with some general issues with microformats that they've done the significant extra work to support very different and less friendly alternatives (microdata, RDFa). Based on this real-world data (market behavior), it behooves us to address these general issues with microformats for this constituency.
COMMUNITY and TOOLS (that) USE MICROFORMATS
- parser / parsing
- structured
- getting the data out
- json - 1:1 mapping
parsing microformats currently requires
- a list of root class names of each microformat to be parsed
- a list of properties for each specific microformat, along with knowledge of the type of each property in order to parse their data from potentially different portions of the HTML markup
- some number of format-specific rules (markup/content optimizations)
This has meant that whenever a new microformat is drafted/specified/adopted, parsers need to be updated to handle it correctly, at a minimum to parse it when inside other microformats and avoid errantly implying properties from one to the other (containment, mfo problem).
naming conventions for generic parsing
There is a fairly simple solution to #1 and #2 from the above list, and we can make progress towards minimizing #3. In short:
Proposal: a set of naming conventions for microformat root class names and properties that make it obvious when:
- a class name represents a microformat root class name
- a class name represents a microformat property name
- a class name represents a microformat property that needs special parsing (specific type of property).
In particular - derived from the real world examples of existing proven microformats (rather than any abstraction of what a schema should have)
- "h-*" for root class names, e.g. "h-card", "h-event", "h-entry"
- The 'h-' prefix is based on the existing microformats naming pattern of starting with 'h'.
- "p-*" for simple (text) properties, e.g. "p-fn", "p-summary"
- generic (vocabulary-independent) parsing: use element text in general, but treat certain HTML element/attribute combinations as special and use those first, e.g. img/alt, abbr/title.
- The 'p-' prefix is based on the word "property" starting with 'p'.
- "u-*" for URL properties, e.g. "u-url", "u-photo", "u-logo"
- special parsing required: prefer a/href, img/src, object/data etc. attributes to element contents.
- The 'u-' prefix is based on URL/URI starting with the letter 'u', which is the type of most of these related properties.
- "dt-*" for datetime properties, e.g. "dt-start", "dt-end", "dt-bday"
- special parsing required: value-class-pattern, in particular separate date time value parsing for better human readability / DRY balance.
- The 'dt-' prefix is based on "date time" having the initials "dt" and the preponderance of existing date time properties starting with "dt", e.g. dtstart, dtend, dtstamp, dtreviewed.
- Initially I had proposed "dt-*" but Chris Messina suggested reducing it to "d-*" so that all prefixes were a single letter - made sense.
- However, I've noticed that Google+ is using "d-*" class names on profile pages, thus we can't really use 'd-' as a microformats 2 property parsing prefix. Tantek 03:00, 22 July 2011 (UTC)
- "e-*" for element tree properties where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for hAtom. The 'e-' prefix can also be mnemonically remembered as "element tree", "embedded markup", or "encapsulated markup".
- special parsing required: follow the HTML spec: Serializing HTML Fragments algorithm to create a serialization.
This provides a simpler transition/education story for existing microformats authors/publishers:
- "h*" to "h-*", "dt*" to "dt-*", url-like properties to "u-*", entire embedded markup to "e-*", and "p-*" for all "plain text" properties.
As part of microformats2 we would immediately define root class names and property names for all existing microformats and drafts consistent with this naming convention, and require support thereof from all new implementations, as well as strongly encouraging existing implementations to adopt the simplified microformats2 syntax and mechanism. Question: which microformats deserve explicit backward compatibility?
As a community we would continue to use the microformats process both for researching and determining the need for new microformats, and for naming new microformat property names for maximum re-use and interoperability of a shared vocabulary.
If it turns out we need a new property type in the future, we can use one of the remaining single-letter-prefixes to add it to microformats 2.0. This would require updating of parsers of course, but in practice the number of different types of properties has grown very slowly, and we know from other schema/programming languages that there's always some small limited number of scalar/atomic property types that you need, and using those you can create compound types/objects that represent richer / more complicated types of data.
ADVANTAGES
This has numerous advantages:
- better maintainability - much more obvious to web authors/designers/publishers which class names are for/from microformats.
- no chance of collision - for all practical purposes with existing class names and thus avoiding any need to add more complex CSS style rules to prevent unintended styling effects.
- simpler parsing - parsers can now do a simple stream-parse (or in-order DOM tree walk) and parse out all microformat objects, properties, and values, without having to know anything about any specific microformats.
- separation of syntax and vocabulary - by abstracting microformats 2 syntax independent of any vocabulary, it allows and encourages development of shared vocabularies that can work in alternative syntaxes.
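The "simpler parsing" advantage can be illustrated with a short in-order tree walk that knows only the prefixes and nothing about any specific vocabulary. This is a minimal sketch under several assumptions, not a conforming parser: the input is taken to be well-formed XML so that Python's xml.etree.ElementTree can read it, "dt-*" is reduced to plain text (the value-class-pattern is not implemented), "e-*" and implied properties are omitted, and a nested root that is not also a property is simply skipped. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes preferred over element text, per property type.
URL_ATTRS = {"a": "href", "img": "src", "object": "data"}  # for u-*
TEXT_ATTRS = {"img": "alt", "abbr": "title"}               # for p-*


def classes(el):
    return el.get("class", "").split()


def is_root(el):
    return any(c.startswith("h-") for c in classes(el))


def text_value(el):
    attr = TEXT_ATTRS.get(el.tag)
    if attr and el.get(attr) is not None:
        return el.get(attr)
    return "".join(el.itertext()).strip()


def url_value(el):
    attr = URL_ATTRS.get(el.tag)
    if attr and el.get(attr) is not None:
        return el.get(attr)
    return text_value(el)


# dt-* is treated as plain text in this sketch.
PARSERS = {"p": text_value, "u": url_value, "dt": text_value}


def parse_item(root):
    item = {"type": [c for c in classes(root) if c.startswith("h-")],
            "properties": {}}

    def walk(el):
        for child in el:
            nested = is_root(child)
            for c in classes(child):
                prefix, _, name = c.partition("-")
                if prefix in PARSERS and name:
                    value = parse_item(child) if nested else PARSERS[prefix](child)
                    item["properties"].setdefault(name, []).append(value)
            if not nested:
                walk(child)  # a nested root keeps its own properties

    walk(root)
    return item


def parse(markup):
    found = []

    def find(el):
        if is_root(el):
            found.append(parse_item(el))
        else:
            for child in el:
                find(child)

    find(ET.fromstring(markup))
    return found
```

Given the h-card markup from the discussion earlier on this page, parse() returns a single item of type "h-card" whose "name" is "User Name" and whose "url" is "http://example.com/user", both taken from the same element. Because a nested root's properties stay with that root, the sketch also avoids implying properties from one microformat to another.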
prefixes for future consideration
possibly also:
- "s-*" for structured properties. Basically s-* works just like h-*, except that no properties (e.g. 'name', 'url', 'photo') are implied when there are no properties present. Example uses: s-geo and s-adr. This is being considered as a result of microformats 2 parsing discussions. We can try it and see what happens. There's also no harm if publishers just use "h-" structures, they just (possibly) get a few extra properties if they happen to omit properties.
- -1. Although I do think the implied-literal parsing behaviour carries some complications and parsed-cruft with more 'structural' formats, introducing a further generic prefix to differentiate one kind of format from another is esoteric, won't be understood by authors (we're already considering a full reversal rename of 'fn' to 'name' in response to user comprehension), and we've seen historically that mixed prefixes (v and h) also cause muddle. --BenWard 06:30, 5 October 2011 (UTC)
- This is good reasoning, especially the comparison to v vs. h prefixed root class names (my experience with authors concurs with that). Each new prefix introduces complexity and thus must have advantages sufficient to exceed the complexity cost. Tantek 06:55, 5 October 2011 (UTC)
- Furthermore, both of the examples given here have in-the-wild use cases for literal parsing: Geo's existing documented optimisation of 1.233;0.453 is applied in two (one valid) manners with the abbr element (incorrectly as an expansion of a place name, and as an alternative to degree-format co-ordinates. Coordinates are also displayed in-place alongside map references, markers, and the like.) In the case of adr, there's an overlap with existing uses of the label property, which is used in cases of unstructured addresses (common in most social network profile systems, also in vcalendar.) Based on previous discussion around unstructured addresses, Twitter uses <* class="adr"><* class="label"> on profiles (Example.) Having literal parsing of adr would be neater though, if label were to be deprecated. --BenWard 06:30, 5 October 2011 (UTC)
- geo has been historically quite problematic in practice, despite our efforts at making it work better via optimizations. I'd like to see real world examples of "Coordinates are also displayed in-place alongside map references, markers, and the like" documented on a page like geo-examples so we can see how any kind of geo-markup could/would help. Interesting about the Twitter use of 'label' inside 'adr' - quite prescient as vCard4 moved 'label' from being its own top-level property to being an attribute (what we used to call subproperty) of 'adr'. Thus we should consider adding 'p-label' as a property for 'h-adr', given a) Twitter's real world usage, b) the refactoring of label into adr in vCard4. Tantek 06:55, 5 October 2011 (UTC)
- "e-*" for properties where the entire contained element hierarchy is the value, e.g. "e-content" (formerly "entry-content") for hAtom. (2011-09-21 note: this has now been included in microformats 2, the below discussion is kept for posterity Tantek 09:44, 21 September 2011 (UTC))
- unclear if this is necessary in general. and if so, if this is only for hAtom, that's insufficient to justify putting it in the generic syntax.
- Would be sufficient to have all p- properties parse the complete content, including any nested mark-up if present, and then implementations to sanitize/run a strip_tags style function (as per data input best practice) as appropriate. --BenWard 22:05, 19 September 2011 (UTC)
- Ouch that sounds like passing on complexity downstream for all cases just to handle *one* known use-case so far. Since we'd typically do the opposite (simplify for the 99% case over the 1% case), passing on content including mark-up by default seems like a step backwards. Also, given how many vulnerabilities seem to deal with parsing/filtering, doing that *first* rather than burdening downstream implementations seems like the right choice. I'd rather wait til we get a concrete complaint from a microformats-2 hAtom consumer before worrying about this for hAtom 2.0. Or are there other current real world use cases besides Atom? Tantek 22:22, 19 September 2011 (UTC)
- In addition to entry-content in hAtom there's also entry-summary, plus any large region of text in other microformats, which at the very least can commonly contain additional hyperlinks, images, and phrasing mark-up: description in hCal, hReview, hProduct, hListing, note in hCard, ingredient, instructions in hRecipe may link to a store, or wikipedia entry for the ingredient, or include an image to illustrate a step of a recipe, as well as recipe instructions that are expressed as lists. If an author/publisher marks up a property in such a way that it contains further mark-up, that mark-up should be assumed to be part of the value. It's always going to be up to an implementation to decide whether it wishes to translate that HTML mark-up into some other format (e.g. Markdown-esque text annotation when converting to something like vcard, or some other kind of formatting language on non-HTML platforms, or stripping text altogether). --BenWard 01:05, 20 September 2011 (UTC)
- While hAtom's entry-content and possibly hCard's note may be the only existing practical use-cases (entry-summary and other "large region of text" are potential/prospective use cases), I'm now convinced the hAtom use-case alone is worthy of including the "e-" prefix because it enables a full fidelity replacement for typical Atom use cases. Tantek 09:44, 21 September 2011 (UTC)
- "i-*" for ID properties, e.g. "i-uid" (if this is the only one, then perhaps we just always re-use "uid" or collapse with "u-*" into "u-id".)
- parsing is no different than "u-*" parsing, thus no need to introduce for now.
- "n-*" for numbers, e.g. "n-rating", "n-geo", where the numbers may have different human-readable-friendly and decimal/machine values (e.g. with geo lat/long degrees minutes seconds vs decimal).
- requires definition of how would different parsing work before worthy of consideration.
- "t-*" for time duration, e.g. "t-duration" in hCalendar, hAudio, hRecipe (note also Google's hRecipe extensions "preptime", "cooktime", "totaltime")
- requires definition of how would different parsing work before worthy of consideration.
- now that the HTML5 <time> element supports representing durations, we should simply incorporate duration (and timezone at that) into the 'dt-' datetime parsing rules. Certainly no need for a separate prefix. Tantek 08:45, 1 December 2011 (UTC)
reserving other prefixes
We should reserve all other single-letter-dash prefixes for future use (within the scope of h- objects: outside of the context of an h- object, this is inapplicable).
In practice we have seen little (if any) use of single-letter-dash prefixing of class names by web developers/designers, and thus in practice we think this will have little if any impact/collisions. Certainly far fewer than existing generic microformat property class names like "title", "note", "summary".
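One way to check this claim against real pages is to count how often one- or two-letter dash prefixes occur among a site's class names. The following is a minimal sketch, assuming the class names have already been extracted into a list; the helper name prefix_usage is hypothetical.

```python
import re
from collections import Counter

# One or two lowercase letters, a dash, then at least one more character.
_PREFIXED = re.compile(r"^([a-z]{1,2})-\S")


def prefix_usage(class_names):
    """Count class names per short dash prefix, e.g. 'u' for 'u-cf'."""
    counts = Counter()
    for name in class_names:
        m = _PREFIXED.match(name)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For a list such as ["u-full-width", "u-cf", "dt-start", "title"], this counts two names under "u", one under "dt", and ignores "title". Any prefix that shows up with a non-microformats meaning is a candidate for the list of existing usage documented on this page.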
existing single letter class prefixes
We should document existing usage of single/double letter prefixed names:
- Google+ (e.g. profile page, others) uses:
- a-
- d-
- g-, gb*
- Skeleton uses
- u- for utility classes: u-full-width, u-max-full-width, u-pull-right, u-pull-left and u-cf
- SUITCSS uses
- u- for utility classes. Usually two words written in camelCase: u-alignTop, u-floatLeft; however, some use abbreviations: u-nbfc (new block formatting context), u-cf (clearfix)
- Yahoo
- y-
- others? please add alphabetical by company/org name.