news-brainstorming


Revision as of 14:29, 7 October 2009

News Brainstorming

There have been several efforts to define data formats for news content. Almost all have focused on the interchange of news content between systems and organizations, and so contain dozens (if not hundreds) of fields targeted at "news management": a mix of content management, metadata management, versioning and other operations undertaken by news organizations.

This page documents brainstorming and ideas for the design of a simple news microformat, based on analysis of news examples from real-world sites and systems. - Jonathan Malek

Contributors

See Also

The Problem

While there are dozens of formats used on thousands of news sites, there is no single standardized format for presentation of news on the web. Having a standardized news format for web publishing would significantly benefit readers, aggregators, search engines and researchers alike. With no standard format for news, search engines are forced to parse unstructured data, and errors can be costly (see Wired.com, 2008).

Thoughts on a Microformat for News

We found significant overlap with hAtom, so we pared an initial news data format down to exclude any fields already described in hAtom or in its superset, Atom, expecting that future versions of the hAtom draft specification would approach feature parity with Atom. Instead, we focused on the news fields that hAtom lacks.

In much the same way that one extends Atom, we are looking to extend hAtom with the most vital news-specific fields.

The fields we've selected combine fields common to many of the news formats currently in use with one new field, principles.

Common News Fields

Issues

Please add new issues to the bottom of the Open Issues section by copying and pasting the Template. Please follow up on resolved/rejected issues with new information rather than resubmitting them. Duplicate issue additions will be reverted.

See also hAtom Issues

Issues Template

Consider using this format (copy and paste this to the end of the list to add your issues; replace ~~~ with an external link if preferred) to report issues or feedback, so that issues can show up in hAtom subscriptions of this issues page. If open issues lack this markup, please add it.

Please post one issue per entry, to make them easier to manage. Avoid combining multiple issues into single reports, as this can confuse or muddle feedback, and puts the burden of separating the discrete issues onto someone else who (1) may not have the time and (2) may not understand the issue in the same way as the original reporter.

<div class="hentry">
{{OpenIssue}} 
<span class="entry-summary author vcard">
 <span class="published">2011-MM-DD</span> 
 raised by <span class="fn">~~~</span>
</span>
<div class="entry-content discussion issues">
* <strong class="entry-title">«Short title of issue»</strong>. «Description of Issue»
** Follow-up comment #1
** Follow-up comment #2
</div>
</div>

Open Issues

open issue! 2009-09-28 raised by Miles De Feyter

  • Principles as a requirement. Working for a publishing company that owns and operates a large number of different organizations, I'd love to incorporate hNews within our publishing system. The hNews requirement for a principles statement could pose a problem, though, or at least make rolling out hNews a more involved process than it would be otherwise. The issue is that I would now have to go to each product owner and ask them to provide this principles statement to link to. So my concern is that, rather than just making a change to the publishing system to support hNews, there is this requirement for some supporting content. And due to the nature of the content, I can only assume our legal department would need to sign off as well, further complicating the adoption of hNews.
    • +1 I agree that the "principles" property (and probably all the others) should be optional. Tantek 18:29, 29 September 2009 (UTC)
      • I think it's important to explain why principles is a requirement. hNews is essentially a specialization of hAtom. Its purpose is to distinguish news on the web. Hence the description of source organisation, license and principles. Of these, principles is the only one which consistently distinguishes news on the web from other content (e.g. commercial, government). In the future it should be distinguished further by making the principles themselves machine readable (but that is for a later date). Most professional news organisations adhere to a Statement of Principles (e.g. see http://en.wikipedia.org/wiki/Journalism_ethics_and_standards and http://www.journalism.org/resources/ethics_codes). If a site wants to mark up its content but does not want to distinguish it as news, then wouldn't it be easiest to use hAtom? Martin Moore 9:00, 20 September 2009 (UTC)
      • Having discussed this issue at length outside this brainstorming, we understand some of the concerns of the microformat community regarding 'must', but are still convinced of the criticality of principles to hNews - therefore recommend downgrading from 'must' to 'should'. Martin Moore 14:00, 7 October 2009 (UTC)
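The practical difference between "must" and "should" (interpreted per RFC 2119) shows up in consuming tools. The following Python sketch is hypothetical: the `Level` enum, the `check_entry` function and the requirements table are illustrative names, not part of any hNews tooling. It shows one way a consumer might treat a missing MUST property (the entry is rejected as hNews, though it could still be read as hAtom) versus a missing SHOULD property (the entry is accepted with a warning).

```python
from enum import Enum


class Level(Enum):
    MUST = "MUST"
    SHOULD = "SHOULD"


def check_entry(properties, requirements):
    """Classify a parsed entry against a table of RFC 2119 requirement levels.

    properties   -- set of property names found on the entry
    requirements -- dict mapping property name to Level (hypothetical table)
    Returns (is_hnews, messages).
    """
    warnings = []
    for name, level in requirements.items():
        if name in properties:
            continue
        if level is Level.MUST:
            # A missing MUST property disqualifies the entry as hNews;
            # a consumer could still fall back to treating it as plain hAtom.
            return False, [f"missing required property '{name}'"]
        # A missing SHOULD property is tolerated but worth reporting.
        warnings.append(f"missing recommended property '{name}'")
    return True, warnings


entry = {"source-org", "dateline"}  # hypothetical parse result without principles
print(check_entry(entry, {"principles": Level.MUST}))
# (False, ["missing required property 'principles'"])
print(check_entry(entry, {"principles": Level.SHOULD}))
# (True, ["missing recommended property 'principles'"])
```

Under the proposed downgrade, a publisher without a principles statement would still produce valid hNews; tools would simply be able to flag the omission.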


closed issue 18:32, 24 August 2009 (UTC) raised by Kevin Marks

  • hCalendar instead of dateline? Would an hCalendar event (which can contain an hCard location) make sense for a dateline, or is the 'date' part more often omitted?
    • Confusingly, the journalistic term "dateline" isn't anything to do with a date or time. It is the location from which a report is filed and is generally the main location associated with a story. Generally, a dateline consists of a city (e.g. "Rome") but could be the name of a ship at sea or even a space station. Stuart Myles 21:12, 24 August 2009 (UTC)

closed issue 18:32, 24 August 2009 (UTC) raised by Kevin Marks

  • hCard instead of geo? Is geo really in use here, or would using an hCard (which can contain geo) be a better way of representing locations referred to in the story, as it is more human-readable?
    • The reason for geo being highlighted (as an optional field) is to promote at least one location identifier in the story--preferably the most appropriate single location on a map for that particular story. Geo does not have to be related to dateline, but in some examples we've worked on, we show the two collapsed into a single field. --JonathanMalek 23:53, 24 August 2009 (UTC)
    • For locations referred to in the story, I agree--publishers should be using hCard with the contained geo to markup the locations themselves. One of the concepts I've struggled with is drawing an admittedly arbitrary line between the metadata about a story and the metadata within a story. For the former, we've focused on simplicity and minimalism, primarily as a means to encourage adoption. That has meant preferring rel-tag over in-line entity extraction and markup using compound microformats. For the latter, we feel that the field is open: use whatever microformat fits your purpose, however you can--the more, the better. This lets publishers with minimal technology capabilities at least get started by tweaking a few templates in their CMS, while those more technically inclined aren't limited by the simplicity of the format to a paucity of data. --JonathanMalek 23:53, 24 August 2009 (UTC)
    • Also, dateline can be text or hCard, as noted in the Common News Fields section. --JonathanMalek 18:17, 24 September 2009 (UTC)

closed issue 18:32, 24 August 2009 (UTC) raised by Kevin Marks

  • What is item-license? Using rel-license presumably?
    • We're working off the licensing-brainstorming discussions for this. Our concern with rel-license was its definition as applying to an entire page, rather than an item within a page. The current discussions around licensing definitely address that. --JonathanMalek 00:02, 25 August 2009 (UTC)
      • +1 using item-license for news-brainstorming makes sense. Tantek 22:32, 27 August 2009 (UTC)
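The scoping concern (rel-license covering a whole page, item-license covering one item) can be illustrated with a small Python sketch built on the standard library's `html.parser`. It assumes item licenses are marked up as `rel="item-license"` links inside an element with class `hentry`, and page licenses as `rel="license"`; the `LicenseScanner` class is hypothetical and does nothing beyond recording which entry, if any, encloses each license link.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LicenseScanner(HTMLParser):
    """Separate page-wide rel="license" links from per-entry rel="item-license" links."""

    def __init__(self):
        super().__init__()
        self.stack = []          # (tag, entry index or None) for each open element
        self.page_licenses = []  # rel="license": applies to the whole page
        self.entries = []        # one list of item-license URLs per hentry

    def _current_entry(self):
        # Index of the innermost open hentry, if any.
        for _, idx in reversed(self.stack):
            if idx is not None:
                return idx
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        rels = (attrs.get("rel") or "").split()
        entry_idx = None
        if "hentry" in classes:
            self.entries.append([])
            entry_idx = len(self.entries) - 1
        href = attrs.get("href")
        if href and "license" in rels:
            self.page_licenses.append(href)
        current = entry_idx if entry_idx is not None else self._current_entry()
        if href and "item-license" in rels and current is not None:
            self.entries[current].append(href)
        if tag not in VOID:
            self.stack.append((tag, entry_idx))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children like <p>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


scanner = LicenseScanner()
scanner.feed("""
<link rel="license" href="http://example.org/site-terms">
<div class="hentry">
  <a rel="item-license" href="http://example.org/cc-by">CC BY</a>
</div>
<div class="hentry"><p>No item license here.</div>
""")
print(scanner.page_licenses)  # ['http://example.org/site-terms']
print(scanner.entries)        # [['http://example.org/cc-by'], []]
```

The example.org URLs are placeholders. A page-level license says nothing about the second entry here, which is exactly the ambiguity an item-scoped license avoids.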

Closed Issues

Naming

Here are candidate names for a news microformat:

Proposal

hNews is a data format (similar to a microformat) for news content. hNews extends hAtom, introducing a number of fields that more completely describe a journalistic work. hNews also introduces another data format, rel-principles, a format that describes the journalistic principles upheld by the journalist or news organization that has published the news item. hNews will be one of several open standards.

Introduction

hNews is a format (similar to a microformat) for identifying semantic information in news stories. It builds on hAtom, while adding a number of fields that more completely define a journalistic work. hNews can be thought of as inheriting from hAtom, since parsers and tools that do not understand the hNews extensions can still parse the hAtom content. However, those parsers and applications that understand hNews can enable a richer set of semantic actions on news stories.
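To illustrate that fallback, here is a minimal Python sketch. It assumes hNews properties appear as `class` or `rel` tokens (the XMDP profile below lists them as class names, while the proposal refers to rel-principles, so both attributes are checked). `TokenCollector` and `properties` are hypothetical names; `HATOM` lists the hAtom 0.1 entry properties, and for brevity the sketch does not scope tokens to individual `hentry` elements.

```python
from html.parser import HTMLParser

# hAtom 0.1 entry properties (class names, plus rel="bookmark" / rel="tag").
HATOM = {"entry-title", "entry-content", "entry-summary",
         "published", "updated", "author", "bookmark", "tag"}
# hNews extensions, as listed in the XMDP profile on this page.
HNEWS = {"source-org", "dateline", "geo", "item-license", "principles"}


class TokenCollector(HTMLParser):
    """Collect every class and rel token used in a document."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("class", "rel") and value:
                self.tokens.update(value.split())


def properties(html, hnews_aware):
    """Return the properties a parser would recognise, given its vocabulary."""
    collector = TokenCollector()
    collector.feed(html)
    vocabulary = (HATOM | HNEWS) if hnews_aware else HATOM
    return collector.tokens & vocabulary


story = """<div class="hentry">
  <h1 class="entry-title">Storm hits coast</h1>
  <span class="dateline">Rome</span>
  <span class="source-org vcard"><span class="fn org">Example News</span></span>
  <div class="entry-content">...</div>
  <a rel="principles" href="http://example.org/principles">Our principles</a>
</div>"""

print(sorted(properties(story, hnews_aware=False)))
# ['entry-content', 'entry-title']
print(sorted(properties(story, hnews_aware=True)))
# ['dateline', 'entry-content', 'entry-title', 'principles', 'source-org']
```

The hAtom-only parser silently ignores the unfamiliar tokens, so the same markup serves both kinds of consumer.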

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Format

In General

hNews extends hAtom. As the hAtom draft format notes, "Atom provides a lot more functionality than we need for a 'blog post' microformat, so we've taken the minimal number of elements needed." News stories typically introduce more fields (for instance, the publishing organization) than the current 0.1 draft of hAtom, and those fields are very important when reading or evaluating a news story. We focus on those fields that enable the development of semantic actions around news: license, principles, dateline (geo) and source organization.
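One way to picture "hNews extends hAtom" is as a subclass that adds the news fields to an hAtom entry. This Python sketch is illustrative only: the class and attribute names are hypothetical, every hNews field is left optional because the required set is still under discussion, and modelling the source organization as an hCard is an assumption. Dateline is typed as either text or an hCard, following the discussion in the issues above.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class HCard:
    fn: str                     # formatted name, e.g. an organization or a city
    url: Optional[str] = None


@dataclass
class Geo:
    latitude: float
    longitude: float


@dataclass
class HAtomEntry:
    entry_title: str
    entry_content: str
    published: Optional[str] = None
    authors: list = field(default_factory=list)


@dataclass
class HNewsEntry(HAtomEntry):
    # News-specific fields layered on top of the inherited hAtom ones.
    source_org: Optional[HCard] = None
    dateline: Union[str, HCard, None] = None   # plain text or an hCard
    geo: Optional[Geo] = None
    item_license: Optional[str] = None         # URL of the item's license
    principles: Optional[str] = None           # URL of the statement of principles


story = HNewsEntry("Storm hits coast", "...", dateline="Rome",
                   source_org=HCard("Example News"))
print(isinstance(story, HAtomEntry))  # True
```

Because `HNewsEntry` is an `HAtomEntry`, code written only for hAtom entries can accept it unchanged, mirroring how hAtom-only parsers can still read hNews markup.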

Schema

The hNews schema consists of the following:

[*] Some required elements have defaults if missing; see below.

Field and Element Details

Source Organization
Dateline
Geo
License
Principles

XMDP Profile

<dl class="profile">
 <dt>class</dt>
 <dd><p>

  <a rel="help" href="http://www.w3.org/TR/html401/struct/global.html#adef-class">
   HTML4 definition of the 'class' attribute.</a>
  This meta data profile defines some 'class' attribute values (class names) 
  and their meanings as suggested by a 
  <a href="http://www.w3.org/TR/WD-htmllink-970328#profile">
   draft of "Hypertext Links in HTML"</a>.
  <dl>

   <dt>root</dt>
   <dd>
    Used to describe semantic information associated with news stories.
   </dd>

   <dt>source-org</dt>
   <dd>
    The originating organization for the news story.
   </dd>

   <dt>dateline</dt>
   <dd>
    Represents the location where the news story was filed.
   </dd>

   <dt>geo</dt>
   <dd>
    Represents geographic coordinates of relevant locations in the story.
   </dd>

   <dt>item-license</dt>
   <dd>
    Represents the license for the story.
   </dd>

   <dt>principles</dt>
   <dd>
    Represents the statement of principles and ethics used by the news organization that produced the news story.
   </dd>

  </dl>
 </dd>
</dl>
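Because an XMDP profile is ordinary HTML, a tool can read the defined class names directly from it. The Python sketch below (the `XMDPReader` name is hypothetical) assumes the layout used in this profile: class names appear as `<dt>`/`<dd>` pairs in a `<dl>` nested one level inside the `profile` list, and everything at the outer level is skipped.

```python
from html.parser import HTMLParser


class XMDPReader(HTMLParser):
    """Map each class name defined in a nested XMDP <dl> to its description."""

    def __init__(self):
        super().__init__()
        self.dl_depth = 0   # number of <dl> elements currently open
        self.field = None   # "dt" or "dd" while collecting text, else None
        self.buffer = []
        self.term = None    # most recent <dt> text awaiting its <dd>
        self.terms = {}

    def handle_starttag(self, tag, attrs):
        if tag == "dl":
            self.dl_depth += 1
        elif tag in ("dt", "dd") and self.dl_depth == 2:
            self.field, self.buffer = tag, []

    def handle_endtag(self, tag):
        if tag == "dl":
            self.dl_depth -= 1
        elif tag == self.field and self.dl_depth == 2:
            # Collapse runs of whitespace and newlines into single spaces.
            text = " ".join("".join(self.buffer).split())
            if tag == "dt":
                self.term = text
            elif self.term is not None:
                self.terms[self.term] = text
                self.term = None
            self.field = None

    def handle_data(self, data):
        if self.field:
            self.buffer.append(data)


reader = XMDPReader()
reader.feed("""<dl class="profile"><dt>class</dt><dd><p>...
  <dl>
   <dt>principles</dt>
   <dd>Represents the statement of principles and ethics used by the news
   organization that produced the news story.</dd>
  </dl>
 </dd>
</dl>""")
print(list(reader.terms))  # ['principles']
```

Fed the full profile above, the reader would map each listed class name to its description, which a parser could use as its hNews vocabulary instead of hard-coding it.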