process-faq: Difference between revisions

m (update sync)
(add FAQ regarding microformat be class names from another format vocabulary)
Revision as of 16:39, 1 February 2008

microformats process FAQ

This page is for documenting Q&A about the microformats process. If you have a new question to ask, please consider first asking your question on the microformats irc channel (preferably) or microformats-discuss mailing list. New questions and answers should be added to the end of the list. If you have a new question but not an answer, please add it to process-issues.

Editing this Page

Please do not use "?" or other punctuation in the headers - it helps to keep the URLs to their fragment identifiers shorter and easier to read, copy/paste etc. See how-to-play for more wiki editing guidelines.

Q&A

Why waste time wading through flakey HTML

Would it not be better for microformats to standardize markup based on the domain model than waste time wading through flakey html?

  • The "gather real world examples" for analysis step of the process focuses specifically on the data published, and not on the markup patterns (or lack thereof). This is why the *-examples step says:

    "Document the schemas implied by the content examples."

    Every word in that sentence matters. The schemas are implied: you have to look at the content of the examples and note which abstract notions/fields/properties people are publishing. This is deliberate: which flakey html is being used matters much less, if at all.

    Analysis of current publishing practices helps us prioritize which problems are worth solving (i.e. there is already a demonstrated incentive for people to publish such information) over problems that are purely theoretical or wishful thinking (e.g. if only everyone would publish metadata ABC then we could build applications XYZ). In fact, this is probably one of the most important parts of the process. Domain models that don't account for what is published on the real web tend to be less useful on the real web, as demonstrated by the numerous a priori XML formats that have been proposed but never gained adoption. The XML formats that have gained adoption are those that modeled the data of existing content publishing behaviors (e.g. the Atom format).

Can a microformat be class names from another format vocabulary

Can one make a microformat simply by taking the vocabulary of an existing format and using it as a set of class names?

In short, no, unless you're extremely lucky, and even then you will likely end up with more properties in the microformat than would be justified by sampling actual published web content of the type of data that the format represents.

Longer:

Simply taking an existing format (like Dublin Core) and reusing its vocabulary as class names is insufficient to make a microformat.

microformats are based first and foremost on existing content publishing behaviors, not first on existing markup (see above related FAQ), nor first on existing formats.

Per the process, only after existing content publishing behaviors are documented, and the implied schemas thus determined (in a *-examples page), does it make sense to document previous attempts at formats for that type of content (in a *-formats page), and then to look at re-using the parts of their vocabulary that map to the implied schemas determined by the documented content publishing patterns (in a *-brainstorming page).
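
The ordering described above can be sketched in a few lines of Python. This is only an illustration, not part of the process itself: the sampled field sets are made-up data, the `existing_vocabulary` mapping is a hypothetical format loosely styled after vCard property names, and the `min_share` threshold and both function names are assumptions introduced here. The point is the direction of the work: the implied schema comes from the published content first, and only then are existing terms picked to fit it.

```python
from collections import Counter

# Made-up data for illustration: the fields found in each sampled page
# (the kind of thing a *-examples page would document).
observed_examples = [
    {"name", "url", "photo"},
    {"name", "url", "email"},
    {"name", "email", "organization"},
    {"name", "url"},
]

# Hypothetical pre-existing format vocabulary (what a *-formats page would
# document), mapping each term to the notion it is assumed to describe.
existing_vocabulary = {
    "fn": "name",
    "url": "url",
    "email": "email",
    "photo": "photo",
    "org": "organization",
    "class": "confidentiality class",
    "sort-string": "sort key",
}


def implied_schema(examples, min_share=0.5):
    """Return the fields published in at least min_share of the examples."""
    counts = Counter(field for example in examples for field in example)
    return {field for field, n in counts.items() if n / len(examples) >= min_share}


def reusable_terms(vocabulary, schema):
    """Keep only the existing terms that map onto a field in the implied schema."""
    return {term: field for term, field in vocabulary.items() if field in schema}


schema = implied_schema(observed_examples)
terms = reusable_terms(existing_vocabulary, schema)
print(sorted(schema))  # fields that the sampled content actually implies
print(sorted(terms))   # the subset of existing terms worth considering
```

With this sample, terms such as "class" and "sort-string" never make it into the candidate set, because nothing in the sampled content implies them. Starting from the full existing vocabulary instead would have carried them along.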

The obvious follow-up question is, of course: what about hCard and hCalendar (which are very much defined as being modeled on the set of properties/values in vCard and iCalendar)?

The answer to that is a combination of things:

1. It was clear from the research at the time (2004) that "people" and "events" were two very common types of content being published on web pages, with names/URLs of people and summaries/times/locations of events, for example.

2. The overwhelmingly widespread adoption of vCard and iCalendar in tools that both produce and consume that content with user interfaces, as well as their basis in IETF standards that were developed by implementers and built on previous iterations with products in the market, made them likely successes (as compared to your average format, which is made up based mostly on optimistic theory).

3. Though hCard and hCalendar are based on the full set of explicit properties and values in vCard and iCalendar, it's been clear from usage experience with both microformats that not all properties are commonly used (and some potentially never). Thus there is likely a case for explicitly dropping some of these properties from hCard (perhaps the poorly named "class" property, for instance), and certainly from hCalendar, thereby providing subsets and making hCard and hCalendar even simpler, per the principles. Fortunately we had learned this lesson by the time we developed hAtom, and thus hAtom 0.1 is a proper *subset* of Atom 1.0.

Related