The 'blog-post-microformat' proposal aims to codify how blog posts are identified within weblogs. The intent is for it to be 'expansive': for example, the proposal could also be used on CNN.com to mark up news articles and summary pages.
This section explores the terminology that should be used to discuss a blog post microformat. To make it easier to talk about the different kinds of terminology, we use an XML-like namespace notation so that we can make statements like
atom:entry is roughly equivalent to
atom:feed/atom:link@rel=alternate is roughly equivalent to
atom:author is not equivalent to rss:entry/rss:author (because the RSS 2.0 author element only defines an email address).
Common terminology in weblogs
Reviewing Current Blog Formats, one can see that there is little standardization among tools, or even within an individual tool (such as 'blogger'), for the names of elements of blog posts. There are, however, many common elements, including:
- a container for all posts/entries
- a container for individual posts
- the name of the author
- the posting date (in many many formats)
Although this looks like a bit of a dog's breakfast, there is usually a fair amount of rigour behind the presentation, as the same tools can also produce Atom and/or RSS feeds.
Furthermore, in developing a microformat for weblog posts, we want to be careful not to break any (or many) templates. Note that many weblog templates will have to be updated as they produce somewhat crufty HTML rather than shiny XHTML.
atom:feed - (composite) a collection of entries plus information about them
atom:author - (composite) the author of a feed (may contain atom:email, atom:name, atom:uri)
atom:id - a permanent identifier for a feed
atom:title - the title of an atom:entry or an atom:feed
atom:updated - the last time the feed was updated
atom:link@rel=alternate - the home page of a feed
atom:link@rel=self - the URI of the feed (where it can be downloaded)
atom:entry - (composite) an entry within the feed
atom:content - the content of an entry
atom:summary - a summary of an entry's content
rss2:channel - (composite) a collection of entries plus information about them
rss2:managingEditor - the email address of the person responsible for the channel's editorial content (RSS 2.0 has no composite channel-level author comparable to atom:author)
rss2:link - The URL of the HTML website corresponding to the channel (compare to atom:link@rel=alternate)
rss2:title - the title of an rss2:channel or an rss2:item
rss2:pubDate - The publication date for the content in the channel.
rss2:item - (composite) an entry within the feed
rss2:item/link - The URL of the item. Note that this may not be a permalink for the item; it may be a link to some other page on the Internet that the rss2:item is about
rss2:description - The item synopsis [sic]. There is no special indication whether this is the full content of an entry, a summary, or a precis of what the rss2:item/link is pointing to
rss2:author - email address of the author of the item
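The rough correspondences between the two vocabularies can be written down as a small lookup table. The following is only a sketch read off the two lists above; the names ATOM_TO_RSS2 and rss2_equivalent are hypothetical, and a value of None simply records that no safe one-to-one counterpart was identified.

```python
from typing import Optional

# Rough Atom -> RSS 2.0 correspondences, read off the two term lists.
# None means "no safe one-to-one counterpart" (an assumption of this sketch).
ATOM_TO_RSS2 = {
    "atom:feed": "rss2:channel",
    "atom:entry": "rss2:item",
    "atom:title": "rss2:title",
    "atom:link@rel=alternate": "rss2:link",
    "atom:author": None,   # composite in Atom; rss2:author is only an email address
    "atom:content": None,  # rss2:description may be full content, a summary, or a precis
    "atom:summary": None,  # same ambiguity as atom:content
}


def rss2_equivalent(atom_term: str) -> Optional[str]:
    """Return the rough RSS 2.0 counterpart of an Atom term, or None."""
    return ATOM_TO_RSS2.get(atom_term)
```

For instance, rss2_equivalent("atom:entry") gives "rss2:item", while rss2_equivalent("atom:author") gives None.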
Atom defines syndication feeds and weblog data much more precisely than RSS does. A mechanical transformation from Atom to RSS will always lead to a correct RSS feed; an RSS to Atom translation would have to choose among multiple possible interpretations, and the choice may not always be correct. Examples include the format of markup, the role of an author, and the meaning of a link.
IMPORTANT: we shall talk about things such as 'marking elements atom:feed'; consider this a purely conceptual thing. The text 'atom:feed' will not appear in the XHTML microformat; we may decide later to use the actual phrase 'atom_feed', 'feed', 'items' or 'googlybear'.
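To illustrate the easy direction, here is a minimal sketch of converting one Atom entry into an RSS 2.0 item with Python's standard xml.etree.ElementTree. The function name atom_entry_to_rss_item is hypothetical, the Atom 1.0 namespace is assumed, and only plain-text titles and bodies are handled. Preferring atom:summary over atom:content for the description is a choice made for this sketch, not something either format requires.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # assumes Atom 1.0 input


def atom_entry_to_rss_item(entry: ET.Element) -> ET.Element:
    """Build an RSS 2.0 <item> from an <atom:entry> (plain-text fields only)."""
    item = ET.Element("item")

    title = entry.find(ATOM_NS + "title")
    if title is not None:
        ET.SubElement(item, "title").text = title.text

    # In Atom, a link without a rel attribute is treated as rel="alternate".
    for link in entry.findall(ATOM_NS + "link"):
        if link.get("rel", "alternate") == "alternate":
            ET.SubElement(item, "link").text = link.get("href")
            break

    # Sketch-level choice: use the summary if present, else the content.
    body = entry.find(ATOM_NS + "summary")
    if body is None:
        body = entry.find(ATOM_NS + "content")
    if body is not None:
        ET.SubElement(item, "description").text = body.text

    return item
```

Going the other way, a converter reading rss2:description cannot tell whether to emit atom:summary or atom:content, which is the kind of decision described above.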
This section explores the information discovered from Current Blog Formats using the terminology discussed above. We will focus only on the major elements of weblog posts:
- the entry container
- the individual entry
- the content
- the permalink
For now, the codification of the following major elements is deferred, as there is (or may be) overlap with other microformats that should be explored further:
- the poster/author
- the posting date
- the modified date
Further input from the community would be appreciated here.
Roughly speaking, this corresponds to 'atom:feed' or 'rss2:channel' (in particular, the items within those elements).
Forms seen in the wild
- entries are within a container; that is, all entries are within an enclosing 'div'. This is common on weblog home pages (example) or archives with multiple entries.
- entries are not within a container; that is, there are multiple entries on a single page but there is no explicit container element (example). This is also a common use case for weblogs and archives.
- there may be multiple groups of entries on a single page that are tenuously connected (example-1 example-2).
- there is only a single entry on a page. This is common with weblogs that archive on a per entry basis (example).
Recommendation for blog-post-format microformat
- weblog pages (including home pages, archives, category pages, tag pages and so forth) that may contain multiple entries MUST enclose the entries in a
- weblog pages MAY have multiple atom:feed elements enclosing different groups of entries
- weblog pages that have exactly one entry MAY use the
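A consumer of such markup would need to tell enclosed entries apart from entries that sit outside any container. The sketch below does this with Python's standard html.parser. The class names "feed" and "entry" are placeholders only (no names have been chosen yet), FeedScanner is a hypothetical helper, and the input is assumed to be well-formed markup in which feed containers are not nested inside each other.

```python
from html.parser import HTMLParser

FEED_CLASS = "feed"    # hypothetical class name for the entry container
ENTRY_CLASS = "entry"  # hypothetical class name for an individual entry

# Elements that never have an end tag in HTML; skipped so the stack stays balanced.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "param", "source", "track", "wbr"}


class FeedScanner(HTMLParser):
    """Count entries per container, plus entries outside any container."""

    def __init__(self):
        super().__init__()
        self.stack = []              # one marker per open element
        self.entries_per_feed = []   # entry count for each container seen
        self.loose_entries = 0       # entries not enclosed by a container

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if FEED_CLASS in classes:
            marker = "feed"
            self.entries_per_feed.append(0)
        elif ENTRY_CLASS in classes:
            marker = "entry"
            if "feed" in self.stack:
                self.entries_per_feed[-1] += 1
            else:
                self.loose_entries += 1
        self.stack.append(marker)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack:
            self.stack.pop()
```

After feeding a page to the scanner, entries_per_feed holds one count per container, and a non-zero loose_entries on a multi-entry page would indicate that the MUST above is not met.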
The 'content' problem
The most inconsistent element of blog posts is the content of the posts themselves. For example, one web page may have only a summary of the post, while another may contain the first part of the content, with a "More" button to see the rest. These inconsistencies may make it difficult to rationally define (or clarify) a set of microformat elements to achieve blog-post-feed-equivalence.
Header Tag for Entry Title?
--Bryan 14:55, 14 Aug 2005 (PDT)
Many weblog CMSes allow for concurrent publishing of entries in the following ways:
- multiple entries on a page (an "Index," monthly archive, category archive, etc.; see Example)
- one entry on a page (see Example)
Early attempts at Current Blog Formats have set the title of the blog post to use the h3 tag.
At least where individual entry pages are concerned (and possibly including indexes and archives), I recommend using h1 for the entry title, given that the entry is by far the most important chunk of information on the page, and it's what we'd want search engines to recognize as such. In the case where the h1 was used for the site title, fears about "losing" this information should be allayed by simply including the site name in the title tag, after the title of the article / entry / post.
- Whether an h3 or h1 is used is irrelevant; the semantics will be applied with classnames. This is a non-issue. --RyanKing 22:35, 18 Aug 2005 (PDT)
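The class name point can be illustrated with a small sketch: a parser that finds entry titles by class name returns the same result whether the template uses h1 or h3. The class name "entry-title" and the helper names are hypothetical, and the sketch assumes a title element does not contain another element with the same tag name.

```python
from html.parser import HTMLParser

TITLE_CLASS = "entry-title"  # hypothetical class name, for illustration only


class TitleFinder(HTMLParser):
    """Collect the text of elements carrying TITLE_CLASS, whatever their tag."""

    def __init__(self):
        super().__init__()
        self.titles = []   # collected title strings
        self._tag = None   # tag name of the title element currently open
        self._parts = []   # text fragments of the current title

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._tag is None and TITLE_CLASS in classes:
            self._tag = tag
            self._parts = []

    def handle_data(self, data):
        if self._tag is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.titles.append("".join(self._parts).strip())
            self._tag = None


def entry_titles(html: str) -> list:
    """Return the titles found in an HTML string."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.titles
```

Under these assumptions, the heading level stays a presentation and search-engine concern for the template author, while a parser relies only on the class name.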
This section describes possible applications for a blog post microformat.