news-brainstorming: Difference between revisions

From Microformats Wiki
(Clarifying the hAtom dependency in a news format. A news-formatted story MUST always be parseable as just hAtom.)
(Moved issues to issues page)
 
(20 intermediate revisions by 5 users not shown)




Please add new issues to the bottom of the [[news-brainstorming#Open_Issues|Open Issues]] section by copying and pasting the [[news-brainstorming#Issues_Template|Template]]. Please follow up on resolved or rejected issues with new information rather than resubmitting them. Duplicate issue additions will be reverted.
Please see [[hnews-issues|hNews issues]].


=== Issues Template ===
See also [[hatom-issues|hAtom Issues]] for questions about [[hAtom]].


{{issues-format}}
=== Open Issues ===
<div class="hentry">
{{OpenIssue}} <span class="entry-summary author vcard"><span class="published">18:32, 24 August 2009 (UTC)</span> raised by <span class="fn">[[User:Kevin Marks|Kevin Marks]]</span></span>
<div class="entry-content discussion issues">
* <strong class="entry-title">hCalendar instead of dateline?</strong> Would an [[hCalendar]] event (which can contain an hCard location) make sense for a dateline, or is the 'date' part more often omitted?
** Confusingly, the journalistic term "dateline" has nothing to do with a date or time. It is the location from which a report is filed, and is usually the main location associated with a story. A dateline typically consists of a city (e.g. "Rome"), but it could be the name of a ship at sea or even a space station. [[User:Stuart Myles|Stuart Myles]] 21:12, 24 August 2009 (UTC)
</div>
</div>
<div class="hentry">
{{OpenIssue}} <span class="entry-summary author vcard"><span class="published">18:32, 24 August 2009 (UTC)</span> raised by <span class="fn">[[User:Kevin Marks|Kevin Marks]]</span></span>
<div class="entry-content discussion issues">
* <strong class="entry-title">hCard instead of geo?</strong> Is geo really in use here, or would an hCard (which can contain a geo) be a better way to represent locations referred to in the story, since it is more human-readable?
** The reason for geo being highlighted (as an optional field) is to promote at least one location identifier in the story--preferably the most appropriate single location on a map for that particular story.  Geo does not have to be related to dateline, but in some [http://labs.ap.org/wiki/hNews examples] we've worked on, we show the two collapsed into a single field. --[[User:JonathanMalek|JonathanMalek]] 23:53, 24 August 2009 (UTC)
** For locations referred to in the story, I agree--publishers should be using [[hCard]] with the contained geo to mark up the locations themselves. One of the concepts I've struggled with is drawing an admittedly arbitrary line between the metadata ''about'' a story and the metadata ''within'' a story. For the former, we've focused on simplicity and minimalism, primarily as a means to encourage adoption. That has meant preferring [[rel-tag]] over in-line entity extraction and markup using compound microformats. For the latter, we feel that the field is open: use whatever microformat fits your purpose, however you can--the more, the better. This lets publishers with minimal technology capabilities at least get started by tweaking a few templates in their CMS, while the simplicity of the format does not restrict more technically inclined publishers to a paucity of data. --[[User:JonathanMalek|JonathanMalek]] 23:53, 24 August 2009 (UTC)
</div>
</div>
<div class="hentry">
{{OpenIssue}} <span class="entry-summary author vcard"><span class="published">18:32, 24 August 2009 (UTC)</span> raised by <span class="fn">[[User:Kevin Marks|Kevin Marks]]</span></span>
<div class="entry-content discussion issues">
* <strong class="entry-title">What is item-license?</strong>  Using [[rel-license]] presumably?
** We're working off the [[licensing-brainstorming#item_as_container|licensing-brainstorming]] discussions for this.  Our concern with [[rel-license]] was its definition as applying to an entire page, rather than an item within a page.  The current discussions around licensing definitely address that. --[[User:JonathanMalek|JonathanMalek]] 00:02, 25 August 2009 (UTC)
*** +1 using item-license for news-brainstorming makes sense. [[User:Tantek|Tantek]] 22:32, 27 August 2009 (UTC)
</div>
</div>
=== Closed Issues ===


== Naming ==


= Proposal =  
hNews is a data format (similar to a microformat) for news content. hNews extends [http://microformats.org/wiki/hatom hAtom], introducing a number of fields that more completely describe a journalistic work. hNews also introduces another data format, [http://newscredit.org/development/newscredit-specification/rel-principles-specification/ rel-principles], which describes the journalistic principles upheld by the journalist or news organization that published the news item. hNews will be one of several open standards.
See the [[hnews]] draft.
 
== Introduction ==
hNews is a format (similar to a [[microformat]]) for identifying semantic information in news stories. It builds on [[hAtom]], while adding a number of fields that more completely define a journalistic work. hNews can be thought of as inheriting from [[hAtom]], since parsers and tools that do not understand the hNews extensions can still parse the [[hAtom]] content. However, those parsers and applications that understand hNews can enable a richer set of semantic actions on news stories.
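To illustrate why hAtom-only tools keep working, here is a minimal Python sketch (standard library only) of a consumer that reacts only to class names it knows. It is not a real hAtom parser: the vocabulary sets are a simplified subset, and `ClassCollector`, `recognised` and the sample story (including the publisher "Example News") are hypothetical.

```python
from html.parser import HTMLParser

# A simplified subset of hAtom class names (not the complete hAtom vocabulary).
HATOM_CLASSES = {"hentry", "entry-title", "entry-content", "entry-summary",
                 "published", "updated", "author"}
# Class-based hNews property names from the schema on this page.
HNEWS_CLASSES = {"source-org", "dateline", "geo", "item-license"}


class ClassCollector(HTMLParser):
    """Records every class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def recognised(markup, vocabulary):
    """Return the class names in `markup` that a consumer knowing `vocabulary` would use."""
    collector = ClassCollector()
    collector.feed(markup)
    collector.close()
    return collector.classes & vocabulary


# Hypothetical hNews story markup.
STORY = """
<div class="hentry">
  <h1 class="entry-title">Example headline</h1>
  <span class="source-org vcard"><span class="fn org">Example News</span></span>
  <span class="dateline">Rome</span>
  <div class="entry-content"><p>Story text.</p></div>
</div>
"""

hatom_view = recognised(STORY, HATOM_CLASSES)
hnews_view = recognised(STORY, HATOM_CLASSES | HNEWS_CLASSES)
print(sorted(hatom_view))               # ['entry-content', 'entry-title', 'hentry']
print(sorted(hnews_view - hatom_view))  # ['dateline', 'source-org']
```

The hAtom-only consumer still finds the entry, its title and its content; the hNews extensions are simply extra class names it never looks at.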
 
{{rfc-2119-intro}}
 
== Format ==
=== In General ===
hNews extends hAtom. As the hAtom draft notes, "Atom provides a lot more functionality than we need for a 'blog post' microformat, so we've taken the minimal number of elements needed." News stories typically carry more fields (for instance, the publishing organization) than the current 0.1 draft of hAtom defines, and those fields are very important when reading or evaluating a news story. We focus on the fields that enable the development of semantic actions around news: license, principles, dateline (geo) and source organization.
 
=== Schema ===
The hNews schema consists of the following:
 
* (root) ('''<code>root</code>''') and '''<code>hentry</code>'''.  required. Using [[hAtom]].
** '''<code>source-org</code>'''. required. Using [[hCard]].[*]
** '''<code>dateline</code>'''. optional. Using text or [[hCard]].
** '''<code>geo</code>'''. optional. Using [[geo]].[*]
** '''<code>item-license</code>'''. required. Using a [http://microformats.org/wiki/licensing-brainstorming#item_as_container license brainstorm proposal].
** '''<code>principles</code>'''. required. Using the draft microformat [http://newscredit.org/development/newscredit-specification/rel-principles-specification/ rel-principles].
 
[*] Some required elements have defaults if missing, see below.
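A minimal Python sketch of a presence check against the schema above. Assumptions: the required list is transcribed from the schema, with `hentry` standing in for the root; `principles` is matched as either a class name or a `rel` value, since this page describes it both ways; the check scans the whole fragment rather than one hentry and does not apply the defaults from the footnote (such as the nearest-in-parent lookup for `source-org`). `PropertyCollector` and `missing_required` are hypothetical names.

```python
from html.parser import HTMLParser

# Required hNews properties, transcribed from the schema ("hentry" stands in for the root).
REQUIRED = ("hentry", "source-org", "item-license", "principles")


class PropertyCollector(HTMLParser):
    """Collects class names and rel values from every start tag."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("class", "rel") and value:
                self.found.update(value.split())


def missing_required(markup):
    """List the required hNews properties that appear nowhere in `markup`."""
    collector = PropertyCollector()
    collector.feed(markup)
    collector.close()
    return [prop for prop in REQUIRED if prop not in collector.found]


# Hypothetical story that lacks an item-license.
story = (
    '<div class="hentry">'
    '<span class="source-org vcard"><span class="fn org">Example News</span></span>'
    '<a rel="principles" href="http://news.example.com/principles">Principles</a>'
    '</div>'
)
print(missing_required(story))  # ['item-license']
```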
 
=== Field and Element Details ===
 
===== Source Organization =====
* a Source Organization element is identified by the class name <code>source-org</code>.
* Source Organization represents the originating organization for the news story.
* a Source Organization {{must}} be encoded in an [[hCard]].
* if the Source Organization is missing
** find the [[algorithm-nearest-in-parent]] element(s) with class name <code>source-org</code> that are valid [[hCard]]s
** otherwise the entry is invalid hNews
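The Source Organization fallback above can be sketched in Python as follows. This is a simplified reading of the nearest-in-parent rule, not a normative implementation: it assumes a toy element tree (the hypothetical `Node` class) instead of a real DOM, and a deliberately loose hCard check (a `vcard` element with an `fn` descendant). See [[algorithm-nearest-in-parent]] for the actual steps.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity
class Node:
    """Hypothetical minimal element: a set of class names plus tree links."""
    classes: set
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def descendants(node: Node) -> Iterator[Node]:
    for child in node.children:
        yield child
        yield from descendants(child)


def is_hcard(node: Node) -> bool:
    # Loose check: a vcard root with at least one fn inside it.
    return "vcard" in node.classes and any("fn" in d.classes for d in descendants(node))


def source_orgs(entry: Node) -> List[Node]:
    """Return the source-org hCard(s) for an entry; [] means the entry is invalid hNews."""
    def matches(n: Node) -> bool:
        return "source-org" in n.classes and is_hcard(n)

    # 1. Prefer a source-org inside the entry itself.
    found = [n for n in descendants(entry) if matches(n)]
    # 2. Otherwise walk up the ancestors, searching each one's subtree.
    node = entry.parent
    while not found and node is not None:
        found = [n for n in descendants(node) if matches(n)]
        node = node.parent
    return found


# Hypothetical page: the source-org hCard sits in a masthead outside the entry.
page = Node({"hfeed"})
masthead = page.add(Node({"source-org", "vcard"}))
masthead.add(Node({"fn", "org"}))
entry = page.add(Node({"hentry"}))
entry.add(Node({"entry-title"}))
print(source_orgs(entry) == [masthead])  # True
```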
 
===== Dateline =====
* a dateline element is identified by the class name <code>dateline</code>.
* dateline represents the location where the news story was written or filed (see [http://en.wikipedia.org/wiki/Dateline dateline] for more details).
* a dateline element {{may}} be encoded in an [[hCard]].
* a news story {{should}} have a dateline element.
* dateline sometimes also includes the publish date of the news story. In such cases, use the [[datetime-design-pattern]] to encode the date.
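A minimal Python sketch of reading a dateline that carries a date. This page does not say which class name the embedded date uses, so the sketch assumes an `abbr` with class `published` (hAtom's published property) following the datetime design pattern; `DatelineParser`, `parse_dateline` and the sample markup are hypothetical. The parser tracks element depth, so void elements written without a closing slash (e.g. `<br>`) inside the dateline would confuse it.

```python
from datetime import date
from html.parser import HTMLParser
from typing import Optional, Tuple


class DatelineParser(HTMLParser):
    """Collects the text of the first class="dateline" element and the title of a
    nested <abbr class="published"> (datetime design pattern), if any."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside the dateline element
        self.done = False     # set once the first dateline has closed
        self.text = []
        self.iso_date = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth:
            self.depth += 1
            if tag == "abbr" and "published" in classes:
                self.iso_date = attrs.get("title")
        elif not self.done and "dateline" in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def parse_dateline(markup: str) -> Tuple[str, Optional[date]]:
    parser = DatelineParser()
    parser.feed(markup)
    parser.close()
    text = " ".join("".join(parser.text).split())  # normalise whitespace
    # Keep only the YYYY-MM-DD part so full datetimes such as
    # "2009-08-24T18:32:00Z" also parse.
    when = date.fromisoformat(parser.iso_date[:10]) if parser.iso_date else None
    return text, when


print(parse_dateline(
    '<p><span class="dateline">ROME, <abbr class="published" '
    'title="2009-08-24">Aug. 24</abbr></span> -- Story text.</p>'
))
# ('ROME, Aug. 24', datetime.date(2009, 8, 24))
```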
 
===== Geo =====
* a geo element is identified by the class name <code>geo</code>.
* geo represents the geographic coordinates of relevant locations in the news story.
* a geo element {{should}} be encoded in a [[geo]].
* in those cases where the latitude and longitude represent the dateline, a variant of [[geo]] {{should}} be used (see [http://microformats.org/wiki/geo-brainstorming#Geo_improvements Geo Improvements] for an example).
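The [[geo]] microformat allows an abbreviated form, an element with class `geo` whose `title` holds `latitude;longitude`. The Python sketch below reads only that form (not the `latitude`/`longitude` child-element form) and rejects out-of-range values; `parse_geo_title`, `GeoTitleFinder` and the sample coordinates are illustrative assumptions.

```python
from html.parser import HTMLParser
from typing import List, Tuple


def parse_geo_title(title: str) -> Tuple[float, float]:
    """Parse the 'latitude;longitude' shorthand used with class="geo"."""
    lat_text, lon_text = title.split(";")
    lat, lon = float(lat_text), float(lon_text)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {title!r}")
    return lat, lon


class GeoTitleFinder(HTMLParser):
    """Collects coordinates from elements such as <abbr class="geo" title="lat;long">."""

    def __init__(self):
        super().__init__()
        self.points: List[Tuple[float, float]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "geo" in (attrs.get("class") or "").split() and attrs.get("title"):
            self.points.append(parse_geo_title(attrs["title"]))


finder = GeoTitleFinder()
finder.feed('<p><span class="dateline">'
            '<abbr class="geo" title="41.9028;12.4964">Rome</abbr></span></p>')
print(finder.points)  # [(41.9028, 12.4964)]
```

Reusing the dateline element as the carrier of the geo, as in this sample, mirrors the collapsed dateline/geo case mentioned above; a separate geo element for another location in the story would be picked up the same way.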
 
===== License =====
* a license element is identified by the class name <code>item-license</code>.
* a license element {{must}} be encoded as described in this [http://microformats.org/wiki/licensing-brainstorming#item_as_container license brainstorm proposal].
 
===== Principles =====
* a principles element is identified by <code>rel-principles</code>.
* principles represents the statement of principles and ethics used by the news organization that produced the news story.
* a principles element {{must}} be encoded in [http://newscredit.org/development/newscredit-specification/rel-principles-specification/ rel-principles].
* principles {{should}} be linked to using the icons [[image:principles-button-blue.png]] or [[image:principles-book-blue.png]].
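A minimal Python sketch of finding a story's principles link. It assumes the rel-principles draft is expressed as `rel="principles"` on an `a` or `link` element (treat this as an assumption and check the linked specification), matches `rel` values case-insensitively, and resolves relative `href`s against the page URL with `urljoin`. `PrinciplesLinkFinder` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PrinciplesLinkFinder(HTMLParser):
    """Collects absolute URLs of links whose rel attribute contains 'principles'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()  # rel is a space-separated list
        if "principles" in rels and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))


finder = PrinciplesLinkFinder("http://news.example.com/2009/08/story.html")
finder.feed('<a rel="principles" href="/principles">'
            '<img src="principles-button-blue.png" alt="Principles"></a>')
print(finder.links)  # ['http://news.example.com/principles']
```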
 
=== XMDP Profile ===
<pre>
<dl class="profile">
<dt>class</dt>
<dd><p>
 
  <a rel="help" href="http://www.w3.org/TR/html401/struct/global.html#adef-class">
  HTML4 definition of the 'class' attribute.</a>
  This meta data profile defines some 'class' attribute values (class names)
  and their meanings as suggested by a
  <a href="http://www.w3.org/TR/WD-htmllink-970328#profile">
  draft of "Hypertext Links in HTML"</a>.
  <dl>
 
  <dt>root</dt>
  <dd>
    Used to describe semantic information associated with news stories.
  </dd>
 
  <dt>source-org</dt>
  <dd>
    The originating organization for the news story.
  </dd>
 
  <dt>dateline</dt>
  <dd>
    Represents the location where the news story was filed.
  </dd>
 
  <dt>geo</dt>
  <dd>
    Represents geographic coordinates of relevant locations in the story.
  </dd>
 
  <dt>item-license</dt>
  <dd>
    Represents the license for the story.
  </dd>
 
  <dt>principles</dt>
  <dd>
    Represents the statement of principles and ethics used by the news organization that produced the news story.
  </dd>
 
  </dl>
</dd>
</dl>
</pre>

Latest revision as of 23:02, 14 October 2009

News Brainstorming

There have been several efforts to define data formats for news content. Almost all have focused on the interchange of news content between systems and organizations, and so contain dozens (if not hundreds) of fields that are targeted at "news management"--a mix of content management, metadata management, versioning and other operations undertaken by news organizations.

This page documents the brainstorming and ideas that came out of analyzing news examples from real-world sites and systems, with the aim of designing a simple news microformat. - Jonathan Malek

Contributors

  • Jonathan Malek
  • Stuart Myles
  • Martin Moore
  • Mark Ng
  • Todd Martin

See Also

The Problem

While there are dozens of formats used on thousands of news sites, there is no single standardized format for presentation of news on the web. Having a standardized news format for web publishing would significantly benefit readers, aggregators, search engines and researchers alike. With no standard format for news, search engines are forced to parse unstructured data, and errors can be costly (see Wired.com, 2008).

Thoughts on a Microformat for News

We found significant overlap with hAtom, so we pared back an initial effort at a news data format so that it no longer describes any fields already in hAtom (or in its superset, Atom), expecting that future versions of the hAtom draft would approach feature parity with Atom. Instead, we focused on the news fields that hAtom lacks.

In much the same way that one extends Atom, we are looking to extend hAtom with the most vital news-specific fields.

The fields we've selected are a combination of the common fields from many of the news formats currently in use, and the introduction of one new field, principles.

Common News Fields

  • hAtom fields: first and foremost, a news story is an hentry. If the news story cannot be parsed by an hAtom parser, it is not a valid news format.
  • source-org: the source organization for this particular news story--should be considered different from the atom:source element because it does not represent the source feed, but rather the source organization (and so should use hCard). We're using source-org to avoid a name conflict with hAtom should the draft decide to include the atom:source element.
  • dateline: using text or hCard, not to be confused with date (see dateline for more information).
  • geo: using geo, a simple way of providing the information needed to build reader services around local news content. This field should be inherited from hAtom, but since it is not yet part of that format, we're including it here. See the hAtom and Geo discussion.
  • item-license: to express the license that applies to the individual item
  • principles: using the draft format rel-principles

Issues

Please see hNews issues.

See also hAtom Issues for questions about hAtom.


Naming

Here are candidate names for a news microformat:

  • hNews

Proposal

See the hnews draft.