process-faq: Difference between revisions

From Microformats Wiki
m (update sync)
m (Replace <entry-title> with {{DISPLAYTITLE:}})
 
(3 intermediate revisions by one other user not shown)
Line 1: Line 1:
<h1>microformats process FAQ</h1>
{{DISPLAYTITLE:microformats process FAQ}}
This page is for documenting Q&A about the microformats [[process]].  If you have a new question to ask, please consider first asking your question on the [irc://irc.freenode.net/#microformats microformats irc channel] (preferably) or [http://microformats.org/mailman/listinfo/microformats-discuss/ microformats-discuss] mailing list.  New questions and answers should be added to the end of the list. If you have a new question but not an answer, please add it to [[process-issues]].


<h2>Editing this Page</h2>
== Editing this Page ==
Please do not use "?" or other punctuation in the headers - it helps to keep the URLs to their fragment identifiers shorter and easier to read, copy/paste etc.  See [[how-to-play]] for more wiki editing guidelines.


<h2> Q&A </h2>
== Q&A ==
__TOC__
=== Why waste time wading through flakey HTML ===
Line 11: Line 11:
* The "gather real world examples" for analysis step of the [[process]] focuses specifically on the <em>data</em> published, and <strong style="text-transform:uppercase">not</strong> the <em>markup</em> patterns (or lack thereof). This is why the <strong>*-examples</strong> step says: <blockquote><p>"Document the schemas implied by the content examples."</p></blockquote>Every word in that sentence matters. <em>Implied</em> schemas means that you have to look at the <em>content</em> of the examples and note what abstract notions/fields/properties people are publishing.  That's very deliberate: what flakey html is being used is <strong style="text-transform:uppercase">much</strong> less important (if it matters at all).<p>Analysis of current publishing practices helps us prioritize what problems are worth solving (i.e. there is already demonstrated incentive for people to publish such information) as opposed to what problems are purely theoretical, or wishful thinking (e.g. if only everyone would publish metadata ABC then we could build applications XYZ).  In fact, this is probably one of the most important parts of the process. Domain models that don't account for what is published on the real web tend to be less useful on the real web, as has been demonstrated by the numerous a priori XML formats that have been proposed but never got any adoption.  The XML formats that have gained adoption are those that modeled the data of existing content publishing behaviors (e.g. the Atom format).
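
As a rough illustration of what "documenting the schemas implied by the content examples" can look like in practice, the following Python sketch tallies which abstract fields show up across a set of collected examples. The example data, the field names, and the <code>implied_schema</code> helper with its <code>min_share</code> threshold are all hypothetical, invented for illustration; the process itself does not prescribe any particular tool or cutoff.

```python
from collections import Counter

# Hypothetical notes from a *-examples page: for each real-world page,
# the abstract fields its content publishes, regardless of the markup used.
EXAMPLES = [
    {"name", "url", "photo"},
    {"name", "email"},
    {"name", "url", "organization"},
    {"name", "url"},
]


def implied_schema(examples, min_share=0.5):
    """Return {field: share} for fields published by at least min_share of the examples."""
    counts = Counter(field for fields in examples for field in fields)
    total = len(examples)
    return {
        field: count / total
        for field, count in counts.items()
        if count / total >= min_share
    }


if __name__ == "__main__":
    # Print the fields that clear the (assumed) threshold, with their share.
    for field, share in sorted(implied_schema(EXAMPLES).items()):
        print(f"{field}: {share:.0%}")
```

Note that the tally looks only at what is published, never at which tags or class names the pages happened to use, which is the point of the answer above.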


<h2>Related</h2>
=== Can a microformat be class names from another format vocabulary ===
''Can one make a microformat simply by taking the vocabulary of an existing format and using it as a set of class names?''
 
In short, no, unless you're extremely lucky, and even then you will likely have more properties in the microformat than would be justified by sampling actual published web content of the type of data that the format represents.
 
Longer:
 
Simply taking an existing format (like Dublin Core) and reusing its vocabulary as class names is insufficient to make a microformat.
 
microformats are based first and foremost on existing ''content'' publishing behaviors, not first on existing ''markup'' (see above related FAQ), nor first on existing ''formats''.
 
Per the [[process]], only after existing ''content'' publishing behaviors are documented and implied schemas are thus determined (in a *-examples page) does it make sense to document previous attempts at formats for that type of content (in a *-formats page), and then to look at re-using ''some'' of their vocabulary, namely the part that maps to the implied schema determined by the documented content publishing patterns (in a *-brainstorming page).
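
The ordering described above can be sketched in a few lines of Python: start from the fields implied by published content, and only then check which of them an existing format already names. This is a minimal sketch under stated assumptions: the <code>reusable_vocabulary</code> helper and the implied field set are hypothetical, and the existing vocabulary is just a handful of Dublin Core element names used as sample input.

```python
# Hypothetical result of a *-examples analysis: fields implied by real content.
IMPLIED_FIELDS = {"title", "creator", "date", "rating"}

# A few element names from an existing format (here, some Dublin Core elements),
# as would be documented on a *-formats page.
EXISTING_VOCABULARY = {"title", "creator", "date", "publisher", "rights", "coverage"}


def reusable_vocabulary(implied_fields, existing_vocabulary):
    """Compare an implied schema against an existing format's vocabulary.

    Returns three sorted lists:
      reused  - implied fields the existing format already names (candidates for re-use)
      unnamed - implied fields the existing format has no term for
      unused  - existing terms that no published example justified
    """
    implied = set(implied_fields)
    existing = set(existing_vocabulary)
    reused = sorted(implied & existing)
    unnamed = sorted(implied - existing)
    unused = sorted(existing - implied)
    return reused, unnamed, unused


if __name__ == "__main__":
    reused, unnamed, unused = reusable_vocabulary(IMPLIED_FIELDS, EXISTING_VOCABULARY)
    print("re-use from existing format:", reused)
    print("needs brainstorming:", unnamed)
    print("not justified by examples:", unused)
```

Copying the whole existing vocabulary into class names would carry the third list along as well, which is why that approach alone is insufficient.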
 
=== Where are the real-world publishing examples and analysis of properties implied in hCard and hCalendar ===
 
The obvious follow-up question is, of course: what about [[hCard]] and [[hCalendar]], which are very much defined as being modeled on the set of properties/values in [[vCard]] and [[iCalendar]]?
 
The answer to that is a combination of things:
 
1. '''Ubiquitous examples.''' It was clear from the research at the time (2004) that "people" and "events" were two ''very'' common types of content being published on web pages, with names/URLs of people, and summaries/times/locations of events for example.  The examples were so widespread as to be self-evident - nearly every website had/has them.
 
2. '''Widespread vCard and iCalendar implementation.''' There is (and certainly was at the time) overwhelming support for vCard and iCalendar in tools with user interfaces that both produce and consume that content. Both are also IETF standards developed primarily by ''implementers'' and based on previous iterations with products in the market, which made them likely successes (as compared to your average standards format proposal, which is more often based mostly on optimistic theory and proposed by non-implementers).
 
3. '''Dominance over alternatives.''' The number of products that supported the vCard and iCalendar formats (as of 2004) greatly exceeded the number that supported any alternative contact or event format. No other formats were close at all, and thus no one even thought to propose or consider alternatives.
 
4. '''Predated and inspired written process.''' Both [[hCard]] and [[hCalendar]] were proposed (2004-09) and drafted long before the first version of the microformats process was written up. In fact, in some ways, hCard and hCalendar were both the first test of the design hypothesis that there are advantages to re-using vocabulary from an existing non-web format in an HTML microformat rather than making up new vocabulary. As such, hCard and hCalendar's reception and success helped shape the microformats [[process]] accordingly.
 
5. '''Lesson learned: better to re-use the subset implied by real-world examples.''' Though hCard and hCalendar are based on the full set of explicit properties and values in vCard and iCalendar respectively, it's been clear from usage experience of both microformats that not all properties are commonly (or potentially ever in some cases) used.  Thus there is likely a case for explicitly dropping some of these properties from hCard (perhaps like the poorly named "class" property), and certainly from hCalendar, and thus providing subsets and making hCard and hCalendar even simpler, per the [[principles]].  Fortunately we had learned this lesson by the time we developed hAtom, and thus hAtom 0.1 is a proper ''subset'' of Atom 1.0.  See [[hcard-brainstorming#deprecate_unused_properties|hCard 1.0.1 brainstorming: deprecate unused properties]].
 
=== How many examples are enough ===
''What constitutes "Enough Examples"?''
 
Some amount of diversity among both large sites (e.g. social content hosts) and small sites (e.g. independent publishers) helps to provide a good amount of research. If you're not sure, ask in [[IRC]] and on the mailing lists.
 
 
== Related ==
* [[process]]
* [[process-issues]]
* [[process-brainstorming]]

Latest revision as of 16:31, 18 July 2020

This page is for documenting Q&A about the microformats process. If you have a new question to ask, please consider first asking your question on the microformats irc channel (preferably) or microformats-discuss mailing list. New questions and answers should be added to the end of the list. If you have a new question but not an answer, please add it to process-issues.

Editing this Page

Please do not use "?" or other punctuation in the headers - it helps to keep the URLs to their fragment identifiers shorter and easier to read, copy/paste etc. See how-to-play for more wiki editing guidelines.

Q&A

Why waste time wading through flakey HTML

Would it not be better for microformats to standardize markup based on the domain model than waste time wading through flakey html?

  • The "gather real world examples" for analysis step of the process focuses specifically on the data published, and not the markup patterns (or lack thereof). This is why the *-examples step says:

    "Document the schemas implied by the content examples."

    Every word in that sentence matters. Implied schemas means that you have to look at the content of the examples and note what abstract notions/fields/properties people are publishing. That's very deliberate: what flakey html is being used is much less important (if it matters at all).

    Analysis of current publishing practices helps us prioritize what problems are worth solving (i.e. there is already demonstrated incentive for people to publish such information) as opposed to what problems are purely theoretical, or wishful thinking (e.g. if only everyone would publish metadata ABC then we could build applications XYZ). In fact, this is probably one of the most important parts of the process. Domain models that don't account for what is published on the real web tend to be less useful on the real web, as has been demonstrated by the numerous a priori XML formats that have been proposed but never got any adoption. The XML formats that have gained adoption are those that modeled the data of existing content publishing behaviors (e.g. the Atom format).

Can a microformat be class names from another format vocabulary

Can one make a microformat simply by taking the vocabulary of an existing format and using it as a set of class names?

In short, no, unless you're extremely lucky, and even then you will likely have more properties in the microformat than would be justified by sampling actual published web content of the type of data that the format represents.

Longer:

Simply taking an existing format (like Dublin Core) and reusing its vocabulary as class names is insufficient to make a microformat.

microformats are based first and foremost on existing content publishing behaviors, not first on existing markup (see above related FAQ), nor first on existing formats.

Per the process, only after existing content publishing behaviors are documented and implied schemas are thus determined (in a *-examples page) does it make sense to document previous attempts at formats for that type of content (in a *-formats page), and then to look at re-using some of their vocabulary, namely the part that maps to the implied schema determined by the documented content publishing patterns (in a *-brainstorming page).

Where are the real-world publishing examples and analysis of properties implied in hCard and hCalendar

The obvious follow-up question is, of course: what about hCard and hCalendar, which are very much defined as being modeled on the set of properties/values in vCard and iCalendar?

The answer to that is a combination of things:

1. Ubiquitous examples. It was clear from the research at the time (2004) that "people" and "events" were two very common types of content being published on web pages, with names/URLs of people, and summaries/times/locations of events for example. The examples were so widespread as to be self-evident - nearly every website had/has them.

2. Widespread vCard and iCalendar implementation. There is (and certainly was at the time) overwhelming support for vCard and iCalendar in tools with user interfaces that both produce and consume that content. Both are also IETF standards developed primarily by implementers and based on previous iterations with products in the market, which made them likely successes (as compared to your average standards format proposal, which is more often based mostly on optimistic theory and proposed by non-implementers).

3. Dominance over alternatives. The number of products that supported the vCard and iCalendar formats (as of 2004) greatly exceeded the number that supported any alternative contact or event format. No other formats were close at all, and thus no one even thought to propose or consider alternatives.

4. Predated and inspired written process. Both hCard and hCalendar were proposed (2004-09) and drafted long before the first version of the microformats process was written up. In fact, in some ways, hCard and hCalendar were both the first test of the design hypothesis that there are advantages to re-using vocabulary from an existing non-web format in an HTML microformat rather than making up new vocabulary. As such, hCard and hCalendar's reception and success helped shape the microformats process accordingly.

5. Lesson learned: better to re-use the subset implied by real-world examples. Though hCard and hCalendar are based on the full set of explicit properties and values in vCard and iCalendar respectively, it's been clear from usage experience of both microformats that not all properties are commonly (or potentially ever in some cases) used. Thus there is likely a case for explicitly dropping some of these properties from hCard (perhaps like the poorly named "class" property), and certainly from hCalendar, and thus providing subsets and making hCard and hCalendar even simpler, per the principles. Fortunately we had learned this lesson by the time we developed hAtom, and thus hAtom 0.1 is a proper subset of Atom 1.0. See hCard 1.0.1 brainstorming: deprecate unused properties.

How many examples are enough

What constitutes "Enough Examples"?

Some amount of diversity among both large sites (e.g. social content hosts) and small sites (e.g. independent publishers) helps to provide a good amount of research. If you're not sure, ask in IRC and on the mailing lists.


Related