hatom: Difference between revisions

From Microformats Wiki

Revision as of 14:38, 2 November 2005

hAtom

hAtom is a microformat for content that can be syndicated, primarily but not exclusively weblog postings. hAtom is strongly based on a subset of the Atom syndication format; every concept in hAtom has a corresponding definition in Atom.

NOTE: the structure is in the process of being constructed, please give me a few days to get it in shape! DavidJanes

Draft Specification

Editor

Authors

Copyright

This specification is (C) 2005-2024 by the authors. However, the authors intend to submit (or already have submitted, see details in the spec) this specification to a standards body with a liberal copyright/licensing policy such as the GMPG, IETF, and/or W3C. Anyone wishing to contribute should read their copyright principles, policies and licenses (e.g. the GMPG Principles) and agree to them, including licensing of all contributions under all required licenses (e.g. CC-by 1.0 and later), before contributing.

Patents

This specification is subject to a royalty free patent policy, e.g. per the W3C Patent Policy, and IETF RFC3667 & RFC3668.

Introduction

Semantic XHTML Design Principles

Note: the Semantic XHTML Design Principles were written primarily within the context of developing hCard and hCalendar, thus it may be easier to understand these principles in the context of the hCard design methodology (i.e. read that first). Tantek

XHTML is built on XML, and thus XHTML based formats can be used not only for convenient display presentation, but also for general purpose data exchange. In many ways, XHTML based formats exemplify the best of both HTML and XML worlds. However, when building XHTML based formats, it helps to have a guiding set of principles.

  1. Reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference. Avoid restating constraints expressed in the source standard. Informative mentions are ok.
    1. For types with multiple components, use nested elements with class names equivalent to the names of the components.
    2. Plural components are made singular, and thus multiple nested elements are used to represent multiple text values that are comma-delimited.
  2. Use the most accurately precise semantic XHTML building block for each object etc.
  3. Otherwise use a generic structural element (e.g. <span> or <div>), or the appropriate contextual element (e.g. an <li> inside a <ul> or <ol>).
  4. Use class names based on names from the original schema, unless the semantic XHTML building block precisely represents that part of the original schema. If names in the source schema are case-insensitive, then use an all lowercase equivalent. Component names implicit in prose (rather than explicit in the defined schema) should also use lowercase equivalents for ease of use. Spaces in component names become dash '-' characters.
  5. Finally, if the format of the data according to the original schema is too long and/or not human-friendly, use <abbr> instead of a generic structural element, and place the literal data into the 'title' attribute (where abbr expansions go), and the more brief and human readable equivalent into the element itself. Further informative explanation of this use of <abbr>: Human vs. ISO8601 dates problem solved
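The <abbr> pattern in principle 5 can be read back mechanically. The following is a minimal sketch, assuming Python's standard html.parser module and the class="published" name used later in this draft; the class name AbbrDatetimeParser is hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class AbbrDatetimeParser(HTMLParser):
    """Collects (machine_value, human_text) pairs from <abbr class="published">."""

    def __init__(self):
        super().__init__()
        self.results = []    # finished (title, text) pairs
        self._title = None   # title of the <abbr> currently open, if any
        self._text = []      # text fragments seen inside that <abbr>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "abbr" and "published" in classes:
            self._title = attrs.get("title")
            self._text = []

    def handle_data(self, data):
        if self._title is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "abbr" and self._title is not None:
            self.results.append((self._title, "".join(self._text).strip()))
            self._title = None


parser = AbbrDatetimeParser()
parser.feed('<abbr class="published" title="20051010T14:07:00-0700">'
            'October 10th, 2005</abbr>')
parser.close()
print(parser.results)
# [('20051010T14:07:00-0700', 'October 10th, 2005')]
```

The literal data lives in the title attribute and the human readable form stays in the element body, so a parser and a reader each get the form they need.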

Format

In General

Schema

Schema elements are based on the Atom nomenclature and follow the microformat pattern of prefixing a unique identifier (in this case, atom) on the outermost container elements: the Feed or Entry. The parts of this microformat are based on analysis of many weblog, bulletin board and media posts, which can be read at blog-post-brainstorming#Discovered_Elements. Note the renaming of 'EntryGroup' to 'Feed' to be more consistent with Atom terminology.

Nomenclature

Concept         | Atom Identifier         | hAtom Microformat Usage
Feed            | atom:feed               | Add class="atomfeed"; OR implicit in the XHTML page
Feed Title      | atom:title              | Not defined in the first iteration of this proposal.
Feed Permalink  | atom:link@rel=alternate | Not defined in the first iteration of this proposal.
Entry           | atom:entry              | Add class="atomentry"; if practical, also define id="unique-identifier" to the Entry.
Entry Title     | atom:title              | Use <h#> in block elements; OR non-preferentially add class="title" in inline elements.
Entry Content   | atom:content            | Add class="content" to all appropriate blocks. Multiple Entry Content blocks are logically considered one concatenated atom:content equivalent.
Entry Summary   | atom:summary            | Add class="summary" to all appropriate blocks. Multiple Entry Summary blocks are logically considered one concatenated atom:summary equivalent.
Entry Permalink | atom:link               | Add rel="bookmark".
Entry Published | atom:published          | Use <abbr class="published" title="YYYYMMDDThh:mm:ss±ZZ:ZZ">...</abbr>, following the datetime-design-pattern.
Entry Author    | atom:author             | Use <address class="author">...</address>. Adding a hCard is highly recommended.

Nesting Rules

Concept         | Nests In      | hAtom Opaque | Notes
Feed            | HTML document | Yes          |
Entry           | Feed          | Yes          |
Entry           | HTML document | Yes          |
Entry Title     | Entry         | Yes          |
Entry Content   | Entry         | No           |
Entry Summary   | Entry         | No           |
Entry Permalink | Entry         | No           |
Entry Published | Entry         | No           |
Entry Author    | Entry         | No           |


Notes:

  • "hAtom Opaque" specifies whether a hAtom parser should "look inside" the element for further hAtom content. If there are multiple rules applied to the same element take the AND of the two (i.e. "No" always wins).
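The AND rule in the note above can be sketched in a few lines of Python. The function name look_inside is hypothetical; the "Yes"/"No" strings are the values from the "hAtom Opaque" column of the table.

```python
def look_inside(rule_values):
    """Combine the "hAtom Opaque" column values that apply to one element.

    The parser may look inside only if every applicable rule says "Yes";
    a single "No" wins.
    """
    return all(value == "Yes" for value in rule_values)


# An element carrying class="title content" is both an Entry Title ("Yes")
# and an Entry Content ("No"), so the parser must not look inside it.
print(look_inside(["Yes", "No"]))  # False
print(look_inside(["Yes"]))        # True
```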

Disambiguation

Entry Content

The Primary Rule of hAtom: content is opaque; hAtom markup within Entry Content and Entry Summary elements is always ignored. This rule reigns above all others in this spec.

This rule ensures that quoted hAtom elements (from another blog being quoted, for example) are ignored, allows 'embedded' hAtom to be delivered within hAtom itself, and prevents accidental 'leaking' of other microformat information up into the hAtom container.
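One way a parser might honour the Primary Rule is to keep a depth counter while it is inside an Entry Content or Entry Summary element and skip all hAtom detection until the counter returns to zero. This is only a sketch using Python's standard html.parser: the EntryCounter name is hypothetical, and it assumes well-formed markup in which every non-void start tag has a matching end tag.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they must not affect the depth counter.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}
OPAQUE = {"content", "summary"}


class EntryCounter(HTMLParser):
    """Records the id of each Entry, ignoring hAtom markup in opaque blocks."""

    def __init__(self):
        super().__init__()
        self.entry_ids = []
        self._opaque_depth = 0  # > 0 while inside Entry Content / Summary

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self._opaque_depth:
            self._opaque_depth += 1  # nested element inside opaque content
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "atomentry" in classes:
            self.entry_ids.append(attrs.get("id"))
        if classes & OPAQUE:
            self._opaque_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self._opaque_depth:
            self._opaque_depth -= 1


doc = '''
<div class="atomfeed">
  <div class="atomentry" id="post-60">
    <h3>Wiki Attack</h3>
    <div class="content">
      <p>Quoting another weblog:</p>
      <div class="atomentry" id="quoted-1"><h3>Quoted</h3><br></div>
    </div>
  </div>
  <div class="atomentry" id="post-59"><h3>Older</h3></div>
</div>
'''
counter = EntryCounter()
counter.feed(doc)
counter.close()
print(counter.entry_ids)  # ['post-60', 'post-59']
```

The quoted entry with id "quoted-1" is never reported, because it sits inside an Entry Content block.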
Feed
Entry Permalink
Entry Title

Rules and Definitions

Feed
  • an XHTML Feed element is identified by class="atomfeed"
  • a Feed element represents the concept of an atom feed
In particular, as a container for Entrys.
  • hAtom documents SHOULD enclose Entrys in a Feed element
If there is no enclosing Feed element, context is assumed from the document itself and its header.
  • hAtom documents MAY have multiple, non-nested Feed elements
This may happen on news pages, or weblogs with "mini-blogs" on the sidebar.
Feed Title

Not defined in the first iteration of this proposal.

Feed Permalink

Not defined in the first iteration of this proposal.

Entry
  • an Entry element is identified by class="atomentry"
  • an Entry element represents the concept of an atom entry
  • a weblog entry MUST be enclosed in a single Entry element
That's what it's for, after all.
  • Entrys MUST NOT be nested
See #Disambiguation and #Entry_Content for more details.
  • Entrys MUST NOT belong to more than one Feed element
That is, an Entry belongs to 0 or 1 Feeds.
Entry Title
  • an Entry Title element is identified by <h#> in block elements OR non-preferentially by class="title" in inline elements
  • an Entry Title element represents the concept of an atom entry title
  • an Entry MUST have 0 or 1 Entry Titles
We need to add disambiguation rules, since obviously multiple <h#> items could appear. But logically speaking there can be at most one.
Entry Content
  • an Entry Content element is identified by class="content"
  • an Entry Content element represents the concept of an atom content
  • an Entry MAY have 0 or more Entry Content elements
We recognize this varies from the Atom spec: see the next rule.
  • the "logical Entry Content" of an Entry is the concatenation, in order of appearance, of all the Entry Contents within the Entry
Many weblogs split content into multiple sections with a "Read More" link and javascript tricks. This is also needed in cases where Entry Titles are coded inline and are considered part of the content.
  • the "logical Entry Content" MUST be complete; that is, be the entire content of the Entry
Otherwise it should be marked as Entry Summary.
  • XHTML elements within Entry Content are entirely opaque to this microformat
That is, if hAtom elements are within the Entry Content, ignore them. This allows hAtom to be transported within itself. (!)
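The "logical Entry Content" rule can be illustrated with a small collector that concatenates every class="content" block in order of appearance. This is a sketch only: ContentCollector is a hypothetical name, it gathers text rather than the full markup an atom:content would carry, and it assumes every start tag has a matching end tag (no <br>-style void elements).

```python
from html.parser import HTMLParser


class ContentCollector(HTMLParser):
    """Concatenates the text of every class="content" block, in document order."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._depth = 0  # > 0 while inside an Entry Content element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a content block, every nested element deepens it;
        # nested class names are deliberately not inspected (content is opaque).
        if self._depth or "content" in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.parts.append(data)


# Markup shaped like the second entry of Transformation 2.
entry = (
    '<div class="atomentry">'
    '<strong class="title content">I really, truly </strong>'
    '<span class="content">didn\'t go ... view.</span>'
    '<span class="byline">posted by Natalie</span>'
    '</div>'
)
collector = ContentCollector()
collector.feed(entry)
collector.close()
print("".join(collector.parts))  # I really, truly didn't go ... view.
```

The inline title contributes to the logical content because it carries class="content" as well as class="title"; the byline does not.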
Entry Summary
  • an Entry Summary element is identified by class="summary"
  • an Entry Summary element represents the concept of an atom summary
  • an Entry MAY have 0 or more Entry Summary elements
We recognize this varies from the Atom spec: see the next rule.
  • the "logical Entry Summary" of an Entry is the concatenation, in order of appearance, of all the Entry Summarys within the Entry
  • the "logical Entry Summary" may differ in different copies of the Entry
This is the major difference from Entry Content. We can summarize an Entry in different ways in different places with no requirement for consistency. There may be issues with this for modelers: if so, take it up in hatom-issues.
Entry Permalink
  • an Entry Permalink element is identified by rel="bookmark"
We recognize that we have broken from Atom terminology at this point. See hatom-issues for discussion.
  • an Entry Permalink element represents the concept of an atom link in an entry
  • Entry Permalinks MUST be absolute URIs
  • Entry Permalinks MUST be the same as the atom:link (or rss:link) used in syndication feeds
The intention of the previous two rules is to gently force people to use strings that can be byte compared for equivalence. In general, the canonical URI should be the link used in an Atom entry.
Is there a problem with FeedBurner?
  • an Entry SHOULD have an Entry Permalink
There are circumstances (such as media pages) where this won't happen.
  • if an Entry has multiple Entry Permalinks, they MUST have exactly the same URI
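The Entry Permalink rules above lend themselves to a simple check. The sketch below uses urllib.parse from the Python standard library; check_permalinks is a hypothetical helper, and treating "absolute" as "has both a scheme and a host" is an assumption that suits http-style permalinks.

```python
from urllib.parse import urlsplit


def check_permalinks(hrefs):
    """Return a list of problems for the rel="bookmark" hrefs of one Entry."""
    problems = []
    for href in hrefs:
        parts = urlsplit(href)
        if not (parts.scheme and parts.netloc):
            problems.append(f"not an absolute URI: {href}")
    # Plain string comparison, in the spirit of "byte compared for equivalence".
    if len(set(hrefs)) > 1:
        problems.append("Entry Permalinks differ within one Entry")
    return problems


relative = "2005_10_16_nataliesolent_archive.html#112993192128302715"
absolute = "http://forums.punbb.org/viewtopic.php?pid=54390#p54390"

print(check_permalinks([relative]))            # one "not an absolute URI" problem
print(check_permalinks([absolute, absolute]))  # []
```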
Entry Published

IN PROGRESS OF BEING WRITTEN

  • an XHTML Entry Published element is identified by class="posted"
  • an Entry Published element represents the concept of an atom published
Entry Author

IN PROGRESS OF BEING WRITTEN

  • an Entry Author element is identified by class="xxx"
  • an Entry Author element represents the concept of an atom author

XMDP Profile

Parsing Details

Examples

This section is informative.

Transformation 1

A well behaved weblog.

Original:

<body>
 <div id="wrap">
  <div id="content">
   ...
   <div class="entry">
    <h3 id="post-60">
     <a href="http://www.microformats.org/blog/..." rel="bookmark" title="...">Wiki Attack</a>
    </h3>
    <p>We had a bit of trouble with ...</p>
    <p>We’ve restored the wiki and ...</p>
    <p>If anyone is working to combat said spammers ...</p>

    <h4 class="tags">Technorati Tags:</h4>
    <ul class="tags">
     <li><a href="http://technorati.com/tag/mediawiki" rel="tag">mediawiki</a></li>
     <li><a href="http://technorati.com/tag/microformats" rel="tag">microformats</a></li>
     <li><a href="http://technorati.com/tag/spam" rel="tag">spam</a></li>
    </ul>

    <ul class="post-info">
     <li>
      <a href="http://www.microformats.org/blog/..." rel="bookmark" title="...">October 10th, 2005</a>
     </li>
     <li>
      <address class="vcard"><a class="url fn" href="http://theryanking.com">Ryan King</a></address>
     </li>
     <li>
      <a href="http://www.microformats.org/blog/...">4 Comments</a>
     </li>
    </ul>
   </div>
   
   <div class="entry">
   ....
   </div>
   ...
  </div>
 </div>
</body>

Transformed to hAtom compliant (changes shown in UPPER CASE for visibility only):

<body>
 <div id="wrap">
  <div id="content" CLASS="ATOMFEED">
   ...
   <div class="ATOMENTRY entry" ID="post-60">
    <h3>
     <a href="http://www.microformats.org/blog/..." rel="bookmark" title="...">Wiki Attack</a>
    </h3>
    <DIV CLASS="CONTENT">
     <p>We had a bit of trouble with ...</p>
     <p>We’ve restored the wiki and ...</p>
     <p>If anyone is working to combat said spammers ...</p>
    </DIV>

    <h4 class="tags">Technorati Tags:</h4>
    <ul class="tags">
     <li><a href="http://technorati.com/tag/mediawiki" rel="tag">mediawiki</a></li>
     <li><a href="http://technorati.com/tag/microformats" rel="tag">microformats</a></li>
     <li><a href="http://technorati.com/tag/spam" rel="tag">spam</a></li>
    </ul>

    <ul class="post-info">
     <li>
      <a href="http://www.microformats.org/blog/..." rel="bookmark" 
        title="..."><ABBR CLASS="PUBLISHED" TITLE="20051010T14:07:00-0700">October 10th, 2005</ABBR></a>
     </li>
     <li>
      <address class="vcard"><a class="url fn" href="http://theryanking.com">Ryan King</a></address>
     </li>
     <li>
      <a href="http://www.microformats.org/blog/...">4 Comments</a>
     </li>
    </ul>
   </div>
   
   <div class="ATOMENTRY entry" ID="post-59">
   ....
   </div>
   ...
  </div>
 </div>
</body>

Changes:

  • Added class="atomfeed" to Feed
  • Added class="atomentry" to each Entry
  • Moved id="###" from <h3> to Entry
  • Added <div class="content">...</div> around the Entry Content
  • Added <abbr class="PUBLISHED" title="YYYYMMDDThh:mm:ss+ZZZZ">...</abbr> around the Entry Datetime

Also note:

  • We did not need to add a <address> element
  • We did not need to add a <h#> element
  • We did not need to add rel="bookmark" to Entry Permalinks
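The title values in these examples can be turned back into the datetime form Atom uses for atom:published. A minimal sketch with Python's standard datetime module follows; to_atom_datetime is a hypothetical helper, and the two accepted layouts are simply the ones that appear in this draft's examples.

```python
from datetime import datetime

# The two layouts seen in the examples: with and without colons in the time.
FORMATS = ("%Y%m%dT%H:%M:%S%z", "%Y%m%dT%H%M%S%z")


def to_atom_datetime(title):
    """Convert an <abbr> title value into an RFC 3339 style string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(title, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised datetime: {title!r}")


print(to_atom_datetime("20051010T14:07:00-0700"))  # 2005-10-10T14:07:00-07:00
print(to_atom_datetime("20051016T103624-0500"))    # 2005-10-16T10:36:24-05:00
```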

Transformation 2

A not-so well behaved weblog (an older blogspot weblog)

Original:

<body bgcolor="...">

 <div class="posts">
  <a name="112993192128302715"> </a><br>
  <div style="clear:both;"></div><strong>Nelson's final prayer</strong> 
  written on the night before Trafalgar:<blockquote>May the Great God, ... heart.
  <div style="clear:both; padding-bottom: 0.25em;"></div>
  <br>
  <span class="byline">
   posted by Natalie at 
   <a href="2005_10_16_nataliesolent_archive.html#112993192128302715">9:49 PM</a>
  </span>
 </div>

 <div class="posts">
  <a name="112993022840118939"> </a>
  <br>
  <div style="clear:both;"></div><strong>I really, truly </strong>didn't go ... view.
  <div style="clear:both; padding-bottom: 0.25em;"></div>
  <br>
  <span class="byline">
   posted by Natalie at 
   <a href="2005_10_16_nataliesolent_archive.html#112993022840118939">9:28 PM</a>
  </span>
 </div>
 ...

</body>

Transformed to hAtom compliant (changes shown in UPPER CASE for visibility only):

<body bgcolor="...">

 <DIV CLASS="ATOMFEED">
  <div class="ATOMENTRY posts" ID="112993192128302715">
   <strong CLASS="TITLE CONTENT">
    Nelson's final prayer
   </strong> 
   <SPAN CLASS="CONTENT">
    written on the night before Trafalgar:<blockquote>May the Great God, ... heart.
   </SPAN>
   <DIV>
    <span class="byline">posted by <address>Natalie</address> at 
     <a REL="BOOKMARK" href="HTTP://NATALIESOLENT.BLOGSPOT.COM/2005_10_16_nataliesolent_archive.html#112993192128302715">
     <ABBR CLASS="POSTED" TITLE="20051024T214900-0000">9:49 PM</ABBR></a>
    </span>
   </DIV>
  </div>

  <div class="ATOMENTRY posts" ID="112993022840118939">
   <strong CLASS="TITLE CONTENT">I really, truly </strong>
   <SPAN CLASS="CONTENT">
    didn't go ... view.
   </SPAN>
   <DIV>
    <span class="byline">
     posted by <address>Natalie</address> at 
     <a REL="BOOKMARK" href="HTTP://NATALIESOLENT.BLOGSPOT.COM/2005_10_16_nataliesolent_archive.html#112993022840118939">
     <ABBR CLASS="POSTED" TITLE="20051024T212800-0000">9:28 PM</ABBR></a>
    </span>
   </DIV>
  </div>
 ...
 </DIV>

</body>

Changes:

  • Added class="atomfeed" to Feed
  • Added class="atomentry" to each Entry
  • Moved id="###" up to the Entry (and deleted the empty anchor block)
  • Added rel="bookmark" to the Entry Permalinks
  • Made the Entry Permalink non-relative
  • Added class="title" to the <strong> around the Entry Title
  • Added class="content" to the same <strong> around the Entry Title (!)
  • Added <span class="content">...</span> around the Entry Content
  • Added <abbr class="posted" title="YYYYMMDDThh:mm:ss+ZZZZ">...</abbr> around the Entry Datetime
  • Added <address> to the poster's name

Also note:

  • there are multiple content blocks, because Natalie Solent embeds the title in the content
  • cleaned up lots of crap HTML presentation stuff, with the assumption it would be fixed in the stylesheet
  • this is one of the uglier transformations you're likely to see

Transformation 3

A media page.

Original:


Transformed to hAtom compliant:


Changes:

Transformation 4

A bulletin board (PunBB)

Original:

<body>
 <div id="punwrap">
  <div id="punviewtopic" class="pun">

   <div id="brdheader" class="block">
    ... header stuff ...
   </div>

   <div id="announce" class="block">
    ... announcement stuff ...
   </div>

   <div class="linkst">
    ... controls for the blog
   </div>

   <div id="p54390" class="blockpost rowodd firstpost">
    <h2>
     <span><span class="conr">#1 </span>
     <a href="viewtopic.php?pid=54390#p54390">2005-10-16 10:36:24</a></span>
    </h2>
    <div class="box">
     <div class="inbox">
      <div class="postleft">
       <dl>
        <dt><strong><a href="profile.php?id=2">Rickard</a></strong></dt>

        <dd class="usertitle"><strong>PunBB Developer</strong></dd>
        <dd class="postavatar"><img src="img/avatars/2.png" width="60" height="60" alt="" /></dd>
        <dd>From: 127.0.0.1</dd>
        <dd>Registered: 2001-11-02</dd>
        <dd>Posts: 7806</dd>
        <dd class="usercontacts"><a href="http://www.punbb.org/">Website</a></dd>

       </dl>
      </div>
      <div class="postright">
       <h3>PunBB 1.2.9</h3>
       <div class="postmsg">
        <p>Just a quick note this time....</p>

       </div>
       <div class="postsignature"><hr />"Programming is like sex: ...</div>
      </div>
      <div class="clearer"></div>
      <div class="postfootleft"><p>Offline</p></div>
      <div class="postfootright"><div> </div></div>
     </div>
    </div>

   </div>

   <div id="p54392" class="blockpost roweven">
    <h2><span><span class="conr">#2 </span><a href="viewtopic.php?pid=54392#p54392">2005-10-16 10:54:41</a></span></h2>
    <div class="box">
     <div class="inbox">
      <div class="postleft">
       <dl>
        <dt><strong><a href="profile.php?id=5298">IdleFire</a></strong></dt>

        <dd class="usertitle"><strong>Member</strong></dd>
        <dd class="postavatar"></dd>
        <dd>Registered: 2005-10-14</dd>
        <dd>Posts: 27</dd>
       </dl>
      </div>
      <div class="postright">

       <h3> Re: PunBB 1.2.9</h3>
       <div class="postmsg">
        <p>...</p>
       </div>
      </div>
      <div class="clearer"></div>
      <div class="postfootleft"><p>Offline</p></div>

      <div class="postfootright"><div> </div></div>
     </div>
    </div>
   </div>
   
   ... more entries ...

   <div id="brdfooter" class="block">
    ... footer stuff ...
   </div>

  </div>
 </div>
</body>

Transformed to hAtom compliant (changes shown in UPPER CASE for visibility only):

<body>
 <div id="punwrap">
  <div id="punviewtopic" class="pun">

   <div id="brdheader" class="block">
    ... header stuff ...
   </div>

   <div id="announce" class="block">
    ... announcement stuff ...
   </div>

   <div class="linkst">
    ... controls for the blog
   </div>

   <div id="p54390" class="ATOMENTRY blockpost rowodd firstpost">
    <h2>
     <span><span class="conr">#1 </span>
     <a REL="BOOKMARK" href="HTTP://FORUMS.PUNBB.ORG/viewtopic.php?pid=54390#p54390">
      <ABBR CLASS="POSTED" TITLE="20051016T103624-0500">2005-10-16 10:36:24</ABBR>
     </a></span>
    </h2>
    <div class="box">
     <div class="inbox">
      <div class="postleft">
       <dl>
        <dt><strong><ADDRESS><a href="profile.php?id=2">Rickard</a></ADDRESS></strong></dt>

        <dd class="usertitle"><strong>PunBB Developer</strong></dd>
        <dd class="postavatar"><img src="img/avatars/2.png" width="60" height="60" alt="" /></dd>
        <dd>From: 127.0.0.1</dd>
        <dd>Registered: 2001-11-02</dd>
        <dd>Posts: 7806</dd>
        <dd class="usercontacts"><a href="http://www.punbb.org/">Website</a></dd>

       </dl>
      </div>
      <div class="postright">
       <h3>PunBB 1.2.9</h3>
       <div class="CONTENT postmsg">
        <p>Just a quick note this time....</p>

       </div>
       <div class="postsignature"><hr />"Programming is like sex: ...</div>
      </div>
      <div class="clearer"></div>
      <div class="postfootleft"><p>Offline</p></div>
      <div class="postfootright"><div> </div></div>
     </div>
    </div>

   </div>

   <div id="p54392" class="ATOMENTRY blockpost roweven">
    <h2>
     <span><span class="conr">#2 </span>
     <a REL="BOOKMARK" href="HTTP://FORUMS.PUNBB.ORG/viewtopic.php?pid=54392#p54392">
      <ABBR CLASS="POSTED" TITLE="20051016T105441-0500">2005-10-16 10:54:41</ABBR>
     </a></span>
    </h2>
    <div class="box">
     <div class="inbox">
      <div class="postleft">
       <dl>
        <dt><strong><ADDRESS><a href="profile.php?id=5298">IdleFire</a></ADDRESS></strong></dt>

        <dd class="usertitle"><strong>Member</strong></dd>
        <dd class="postavatar"></dd>
        <dd>Registered: 2005-10-14</dd>
        <dd>Posts: 27</dd>
       </dl>
      </div>
      <div class="postright">

       <h3> Re: PunBB 1.2.9</h3>
       <div class="CONTENT postmsg">
        <p>...</p>
       </div>
      </div>
      <div class="clearer"></div>
      <div class="postfootleft"><p>Offline</p></div>

      <div class="postfootright"><div> </div></div>
     </div>
    </div>
   </div>
   
   ... more entries ...

   <div id="brdfooter" class="block">
    ... footer stuff ...
   </div>

  </div>
 </div>
</body>

Changes:

Notes:

Questions:

  • should the address enclose the entire author block?

More Examples

See hatom-examples.

Examples in the wild

This section is informative.

Implementations

This section is informative.

References

Normative References

Informative References

Specifications That Use hAtom

Similar Work

Work in progress

This specification is a work in progress. As additional aspects are discussed, understood, and written, they will be added. There is a separate document where we are keeping our brainstorms and other explorations relating to hAtom:

Discussions

Q&A

  • If you have any questions about hAtom, check the hAtom FAQ, and if you don't find answers, add your questions!

Issues

  • Please add any issues with the specification to the separate hAtom issues document.

See Also