Difference between revisions of "xoxo-opml-issues"

From Microformats Wiki
(added another XSLT reference)
(linking OPML)
 
(8 intermediate revisions by 6 users not shown)
Line 1: Line 1:
I [[User:Brian]] have been working on a web service to convert XOXO lists into OPML. The service can be found at [http://suda.co.uk/projects/microformats/xoxo/ http://suda.co.uk/projects/microformats/xoxo/] another can be found at [http://xoxotools.ning.com/outlineconvert.php http://xoxotools.ning.com/outlineconvert.php] and another here [http://decafbad.com/2005/11/gopher-ng/http://decafbad.com/2005/11/gopher-ng/] for generating OPML to be used with Hyperscope.
+
This page is used to document any issues in XSLT that converts XOXO to [[OPML]], any shortcomings in XOXO that need to be addressed because they are REQUIRED in OPML, and any issues with news readers when they import OPML files that have been generated from XOXO. The original request was to the [http://microformats.org/discuss/mail/microformats-discuss/2006-September/005835.html mailing list on September 25th].
  
This page is used to document any issues in XSLT that converts XOXO to OPML, any short comings that need to be addressed in XOXO that are REQUIRED in OPML, and any issues with News Readers when they import OPML files that have been generated from XOXO. The original request was to the [http://microformats.org/discuss/mail/microformats-discuss/2006-September/005835.html mailing list on September 25th]
+
== Draft conversion principles ==
 +
OPML specifies limitations in a loose way, using the '''type''' attribute. There is some formal canonicalisation (in the [http://opml.org/spec 1.0 spec] and the [http://opml.org/spec2 2.0 spec]) of what individual type attributes do. Type attribute values extend the standard attribute set of the outline node. So, for instance, the "rss" type value tells the processor to look for feed-specific values.
 +
 
 +
=== MIME type ===
 +
OPML is usually served with a large variety of MIME types including:
 +
* text/html
 +
* application/xml
 +
* text/xml
 +
* text/x-opml
 +
 
 +
There have been suggestions as to whether or not to start serving OPML as:
 +
* application/xml+opml
 +
 
 +
One should not infer that something is or is not OPML based on the MIME type, because that's not reliable. But all OPML-to-XOXO tools ought to point to OPML files (includes, etc.) in a consistent way. That way, JavaScript could be laid over the top of XOXO outlines to allow them to include OPML outlines or link to them in a way that would proxy them back in to XOXO (etc.).
 +
 
 +
=== Text attribute ===
 +
There is some confusion over the difference between the '''text''' and '''title''' attributes. Both are reasonably well-defined by the OPML specification, and serve different purposes. Some implementations of OPML break from the specification in providing a title attribute but not a text attribute. If a text attribute is present but not a title attribute, one should infer that the text attribute is equal to the title attribute. One '''should not''' infer that the title attribute is equivalent to the text attribute (see [http://opml.org/spec2 OPML 2.0 Spec]).
 +
 
 +
The text attribute can and does often contain escaped HTML markup (which is really a bad practice, and has led to a lot of criticism of OPML). This is standard behavior from the OPML Editor. An OPML-to-XOXO parser should ideally take data from the text attribute and put it into a XOXO outline in a standard way.
 +
=== Known type attribute values ===
 +
 
 +
==== Blank ====
 +
A blank type attribute usually implies that it is a text node in an outline, using the '''text''' and '''created''' nodes. This is usually the behavior of most outliners and is the default behavior of the OPML Editor.
 +
 
 +
==== RSS Feed ====
 +
The '''type''' attribute is set to the string '''rss''', which implies the following attributes:
 +
* '''text''' - usually, but not always, the title of the feed - is user-modifiable, so should not be used as feed title by applications
 +
* '''xmlUrl''' - the URL of an XML feed
 +
* '''htmlUrl''' - the URL of the HTML representation of the feed (optional)
 +
** Implementation note - Bloglines emits invalid OPML by sometimes including this attribute but leaving it blank [http://tommorris.org/blog/2007/09/01#When:09:07:29].
 +
* '''language''' - the language of the feed (optional)
 +
* '''version''' - the particular type of XML feed:
 +
** '''RSS''' - The [http://opml.org/spec2 spec] lists this as being used for RSS 0.9x and 2.0 feeds
 +
*** MIME type: application/rss+xml
 +
** '''RSS1''' - The [http://opml.org/spec2 spec] lists this as being used for RSS 1.0 (RDF) feeds
 +
*** MIME type: application/rdf+xml
 +
*** Despite RSS 1.0 being "just RDF", plenty of people expect it to be RDF/XML, so no Turtle or JSON or embedded RDF-in-HTML.
 +
** '''scriptingNews''' - The [http://opml.org/spec2 spec] lists this as being used for Scripting News format feeds
 +
*** This is an edge case. I'm outputting application/rss+xml as MIME type unless anyone has any good argument to the contrary. This is because it should trigger most people's "RSS" mode in their browser, and then their RSS reader should be able to unpick it using whatever parsing magic is contained within.
 +
*** An example scriptingNews feed [http://essaysfromexodus.scripting.com/xml/scriptingnews2.xml], and a [http://www.manton.org/2003/03/scriptingnews_format.html Python parser].
 +
** '''atom''' - In unofficial usage, this is used for Atom feeds of all types
 +
*** MIME type: application/atom+xml
 +
*** [http://www.opml.org/stories/storyReader$5199#randyMorinsComments Dave Winer]: "I don't know what the valid values are for the Atom version attribute. If someone who is an expert on Atom would provide them, and show me that there's some agreement about this from Atom experts, I would be happy to say something about this in the OPML 2.0 spec."
 +
** '''RSS2''' - In unofficial usage, this is used to represent RSS 2.0 feeds only (although, according to the spec, they ought to use '''RSS''').
 +
* '''description''' - The description field from the linked feed (optional)
 +
* '''title''' - The title of the linked feed (eg. "Engadget") (optional)
 +
** Best behavior with title is to grab the RSS/Atom feed and infer it from that, rather than relying on the OPML file to give it to you.
 +
 
 +
What should be done with an RSS feed node? Since it is almost the primary use of OPML, it would probably be advisable to optimize any conversion effort to deal efficiently with feeds.
 +
 
 +
The '''text''' attribute may list something different from the title of the linked feed, so that ought to be the value of the hyperlink - one may link to the blog "Epeus' Epigone" and set the text field as "Kevin Marks".
 +
 
 +
If the version attribute is present, it should be used to drop in the relevant MIME-type on the link to the feed.
 +
 
 +
Ideally, an OPML-to-XOXO converter would also locate the HTML versions of feeds if the '''htmlUrl''' attribute is not there, and vice versa.
 +
 
 +
Another implementation note for the version attribute: it's a good idea to check for both upper and lower case versions (eg. the standard "RSS" and the lower-case "rss"). Although the values are enumerated in the specification, I'm betting there are probably misuses out there.
 +
 
 +
==== Include ====
 +
The '''type''' attribute in OPML 2.0 is set to '''include'''. Otherwise, the include mode is inferred if the '''type''' is set to '''link''' and the '''url''' attribute ends in ".opml".
 +
 
 +
Ideally, if the '''include''' mode is triggered, the HTML should represent it as a hyperlink to the OPML document, perhaps as follows:
 +
<pre><nowiki>
 +
<li><a href="[url]" type="application/xml+opml">[text]</a></li>
 +
</nowiki></pre>
 +
 
 +
==== Date-time stamp ====
 +
OPML contains a "created" time-stamp, which is generally used in outliners but not in feed readers. The created attribute uses [http://asg.andrew.cmu.edu/rfc/rfc822.html RFC 822] date format. The [[datetime-design-pattern]] could be used to represent it, perhaps with a classname of '''created'''.
 +
 
 +
=== Mapping proprietary extensions ===
 +
OPML is extensible through the use of namespaced elements and attributes.
 +
 
 +
There are some proprietary extensions which it would not be appropriate to map to XOXO. The GrazrScript extension is probably one of those. There is no value in mapping it to XOXO, as it will not serve any purpose. Converters should ignore it.
 +
 
 +
Currently, the following proprietary/non-canonical extensions to OPML can be mapped to XOXO:
 +
* '''grazr:name''' should map to the '''id''' attribute of the containing list element.
 +
* '''bb:rating''' (namespace: <nowiki>http://blogbridge.com/ns/2006/opml</nowiki>) is used by BlogBridge to provide an 'attention rating'.
 +
* '''bb:tag''' should be mapped to a [[rel-tag]], in the same way(?) as the category attribute.
 +
 
 +
=== Preliminary Mapping ===
 +
Currently, I am trying to work on a preliminary mapping to XOXO from OPML and the internal Frontier outline format (on which OPML is strongly based, and with which it is compatible). This is unreleased so far, but I have put the mappings up so that people can suggest improvements to the semantics.
 +
 
 +
Each of the 'li' elements can, of course, use the compact attribute.
 +
 
 +
The text type:
 +
<pre><nowiki><li>[text()]</li></nowiki></pre>
 +
 
 +
The feed type:
 +
<pre><nowiki><li><a href="[@htmlUrl]" class="feed">[text()]</a> <a href="[@xmlUrl]" class="feed-xml" type="application/xml">RSS</a></li></nowiki></pre>
 +
(The text label "RSS" can, of course, be changed to 'feed' or 'Atom' or any other text label that you wish. The 'feed' class is to enable reverse XOXO-to-OPML)
 +
 
 +
The include type:
 +
<pre><nowiki><li><a href="[@include]" class="opml-include" type="[MIME]">[text()]</a></li></nowiki></pre>
 +
* MIME type still to be determined. I will be asking questions on the OPML mailing list to see if we can get some consensus on what MIME type to use. This may be a bit like opening a can of worms, but we'll see how it goes. --[[User:TomMorris|TomMorris]] 20:08, 10 Aug 2007 (PDT)
 +
** Consensus still doesn't exist.
 +
 
 +
=== Test Cases ===
 +
I have started maintaining a list of [http://tommorris.org/pages/opml-test-cases OPML Test Cases] and [http://tommorris.org/pages/opml-tools Tools] for OPML-XOXO work. I have also put up an [http://github.com/tommorris/opml-schema/tree/master unofficial test suite and schema]. --[[User:TomMorris|TomMorris]] 16:48, 7 Mar 2008 (PST)
 +
 
 +
=== Notes ===
 +
* [http://rbach.priv.at/Microformats/IRC/2007-08-10#T154821 IRC Log for 2007-08-10]
 +
* [http://copia.ogbuji.net/blog/2005-11-15/I_must_be_#1132141337.5 l.m. orchard's comment on Uche Ogbuji's blog post about XOXO and OPML]
 +
 
 +
== Converters ==
 +
* I [[User:Brian]] have been working on a web service to convert XOXO lists into OPML. The service can be found at [http://suda.co.uk/projects/microformats/xoxo/ http://suda.co.uk/projects/microformats/xoxo/] another can be found at [http://xoxotools.ning.com/outlineconvert.php http://xoxotools.ning.com/outlineconvert.php] and another here [http://decafbad.com/2005/11/gopher-ng/ http://decafbad.com/2005/11/gopher-ng/] for generating OPML to be used with Hyperscope.
  
 
== XSLT Issues ==
 
== XSLT Issues ==
Line 21: Line 125:
 
* notes: No issues importing, imports and tried to fetch non-rss/atom files.
 
* notes: No issues importing, imports and tried to fetch non-rss/atom files.
 
* autodiscovery: ???
 
* autodiscovery: ???
 +
 +
=== [http://boxtheweb.4x2.net/ BoxtheWeb] ===
 +
* Platform: online
 +
* notes: seems to correctly import the OPML, and will also import raw XOXO
 +
 +
== See also ==
 +
* [[xoxo-brainstorming]]

Latest revision as of 16:41, 23 July 2014

This page is used to document any issues in XSLT that converts XOXO to OPML, any shortcomings in XOXO that need to be addressed because they are REQUIRED in OPML, and any issues with news readers when they import OPML files that have been generated from XOXO. The original request was to the mailing list on September 25th.

Draft conversion principles

OPML specifies limitations in a loose way, using the type attribute. There is some formal canonicalisation (in the 1.0 spec and the 2.0 spec) of what individual type attributes do. Type attribute values extend the standard attribute set of the outline node. So, for instance, the "rss" type value tells the processor to look for feed-specific values.

MIME type

OPML is usually served with a large variety of MIME types including:

  • text/html
  • application/xml
  • text/xml
  • text/x-opml

It has been suggested that OPML should instead be served as:

  • application/xml+opml

One should not infer that something is or is not OPML based on the MIME type, because that's not reliable. But all OPML-to-XOXO tools ought to point to OPML files (includes, etc.) in a consistent way. That way, JavaScript could be laid over the top of XOXO outlines to allow them to include OPML outlines or link to them in a way that would proxy them back in to XOXO (etc.).
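
Since the MIME type is not a reliable signal, a converter can inspect the document itself instead. The following is a minimal sketch, assuming that "is OPML" simply means "parses as XML with an opml root element"; the function name looks_like_opml is hypothetical and not taken from any existing tool.

```python
import xml.etree.ElementTree as ET


def looks_like_opml(document):
    """Return True if the document parses as XML with an <opml> root.

    The served MIME type is deliberately ignored, because OPML turns up
    as text/html, application/xml, text/xml, text/x-opml and others.
    """
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be OPML.
        return False
    return root.tag == "opml"
```

A caller would fetch the resource regardless of its declared type and pass the body to looks_like_opml before deciding whether to treat a link as an OPML include.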

Text attribute

There is some confusion over the difference between the text and title attributes. Both are reasonably well defined by the OPML specification, and they serve different purposes. Some implementations of OPML break from the specification by providing a title attribute but not a text attribute. If a text attribute is present but a title attribute is not, one should infer that the title is equal to the text. One should not infer the reverse, that a title attribute on its own is equivalent to the text attribute (see OPML 2.0 Spec).

The text attribute can and does often contain escaped HTML markup (which is really a bad practice, and has led to a lot of criticism of OPML). This is standard behavior from the OPML Editor. An OPML-to-XOXO parser should ideally take data from the text attribute and put it into a XOXO outline in a standard way.
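
The two rules above (fall back from text to title but not the other way round, and cope with HTML markup inside text) can be sketched as follows. This is only an illustration: the page does not define the "standard way" of placing text into XOXO, so reducing the markup to plain text is an assumption, and the names plain_text and outline_labels are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data and discards any tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(value):
    """Reduce a text attribute that carries HTML markup to plain text."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()
    return "".join(parser.parts)


def outline_labels(attrs):
    """Return (text, title) for an outline node's attribute dict.

    A missing title is inferred from text; a missing text is NOT
    inferred from title, following the rule described above.
    """
    text = attrs.get("text")
    title = attrs.get("title")
    if text is not None and title is None:
        title = text
    return text, title
```

Note that an XML parser has already unescaped the attribute once, so plain_text receives real markup such as `<b>Hello</b>` rather than `&lt;b&gt;Hello&lt;/b&gt;`.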

Known type attribute values

Blank

A blank type attribute usually implies that the node is a text node in an outline, using the text and created attributes. This is the behavior of most outliners and is the default behavior of the OPML Editor.

RSS Feed

The type attribute is set to the string rss, which implies the following attributes:

  • text - usually, but not always, the title of the feed - is user-modifiable, so should not be used as feed title by applications
  • xmlUrl - the URL of an XML feed
  • htmlUrl - the URL of the HTML representation of the feed (optional)
    • Implementation note - Bloglines emits invalid OPML by sometimes including this attribute but leaving it blank (see http://tommorris.org/blog/2007/09/01#When:09:07:29).
  • language - the language of the feed (optional)
  • version - the particular type of XML feed:
    • RSS - The spec lists this as being used for RSS 0.9x and 2.0 feeds
      • MIME type: application/rss+xml
    • RSS1 - The spec lists this as being used for RSS 1.0 (RDF) feeds
      • MIME type: application/rdf+xml
      • Despite RSS 1.0 being "just RDF", plenty of people expect it to be RDF/XML, so no Turtle or JSON or embedded RDF-in-HTML.
    • scriptingNews - The spec lists this as being used for Scripting News format feeds
      • This is an edge case. I'm outputting application/rss+xml as MIME type unless anyone has any good argument to the contrary. This is because it should trigger most people's "RSS" mode in their browser, and then their RSS reader should be able to unpick it using whatever parsing magic is contained within.
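      • An example scriptingNews feed (http://essaysfromexodus.scripting.com/xml/scriptingnews2.xml), and a Python parser (http://www.manton.org/2003/03/scriptingnews_format.html).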
    • atom - In unofficial usage, this is used for Atom feeds of all types
      • MIME type: application/atom+xml
      • Dave Winer: "I don't know what the valid values are for the Atom version attribute. If someone who is an expert on Atom would provide them, and show me that there's some agreement about this from Atom experts, I would be happy to say something about this in the OPML 2.0 spec."
    • RSS2 - In unofficial usage, this is used to represent RSS 2.0 feeds only (although, according to the spec, they ought to use RSS).
  • description - The description field from the linked feed (optional)
  • title - The title of the linked feed (eg. "Engadget") (optional)
    • Best behavior with title is to grab the RSS/Atom feed and infer it from that, rather than relying on the OPML file to give it to you.

What should be done with an RSS feed node? Since it is almost the primary use of OPML, it would probably be advisable to optimize any conversion effort to deal efficiently with feeds.

The text attribute may list something different from the title of the linked feed, so the text attribute ought to supply the link text of the hyperlink: one may link to the blog "Epeus' Epigone" and set the text field to "Kevin Marks".

If the version attribute is present, it should be used to drop in the relevant MIME-type on the link to the feed.

Ideally, an OPML-to-XOXO converter would also locate the HTML versions of feeds if the htmlUrl attribute is not there, and vice versa.

Another implementation note for the version attribute: it's a good idea to check for both upper and lower case versions (eg. the standard "RSS" and the lower-case "rss"). Although the values are enumerated in the specification, I'm betting there are probably misuses out there.
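
The version handling described above can be sketched as a case-insensitive lookup. The MIME types for RSS, RSS1, scriptingNews and atom are the ones listed on this page; mapping the unofficial RSS2 value to application/rss+xml and falling back to application/xml are assumptions, and the names FEED_MIME_TYPES and feed_mime_type are hypothetical.

```python
# Keys are lower-cased so that "RSS", "rss" and "Rss" all match.
FEED_MIME_TYPES = {
    "rss": "application/rss+xml",
    "rss1": "application/rdf+xml",
    "scriptingnews": "application/rss+xml",
    "atom": "application/atom+xml",
    # Assumption: the unofficial RSS2 value is treated like RSS.
    "rss2": "application/rss+xml",
}


def feed_mime_type(version, default="application/xml"):
    """Map an OPML version attribute to a MIME type for the feed link.

    Returns the default when the attribute is missing, empty or
    unrecognised.
    """
    if not version:
        return default
    return FEED_MIME_TYPES.get(version.strip().lower(), default)
```

For example, feed_mime_type("Atom") and feed_mime_type("atom") both yield application/atom+xml, while a missing version leaves the generic XML type in place.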

Include

The type attribute in OPML 2.0 is set to include. Otherwise, the include mode is inferred if the type is set to link and the url attribute ends in ".opml".

Ideally, if the include mode is triggered, the HTML should represent it as a hyperlink to the OPML document, perhaps as follows:

<li><a href="[url]" type="application/xml+opml">[text]</a></li>
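
The test for include mode described above might look like the sketch below. Comparing the type value and the ".opml" suffix case-insensitively is an assumption made for robustness, and the name is_include is hypothetical.

```python
def is_include(attrs):
    """Decide whether an outline node's attributes trigger include mode.

    Include mode applies when type is "include" (OPML 2.0), or when
    type is "link" and the url attribute ends in ".opml".
    """
    node_type = (attrs.get("type") or "").lower()
    if node_type == "include":
        return True
    url = attrs.get("url") or ""
    return node_type == "link" and url.lower().endswith(".opml")
```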

Date-time stamp

OPML contains a "created" time-stamp, which is generally used in outliners but not in feed readers. The created attribute uses RFC 822 date format. The Datetime Design Pattern could be used to represent it, perhaps with a classname of created.
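
As a rough sketch of that suggestion, the RFC 822 value can be parsed with Python's standard library and emitted as an abbr element whose title carries the ISO 8601 form. The exact markup is an assumption based on the datetime design pattern and the suggested class name; the function name created_to_abbr is hypothetical.

```python
from email.utils import parsedate_to_datetime
from html import escape


def created_to_abbr(created):
    """Render an OPML created attribute using the datetime design pattern.

    The machine-readable ISO 8601 value goes in the title attribute and
    the original RFC 822 string is kept as the visible text.
    """
    moment = parsedate_to_datetime(created)
    return '<abbr class="created" title="{}">{}</abbr>'.format(
        moment.isoformat(), escape(created)
    )
```

Calling created_to_abbr("Fri, 10 Aug 2007 20:08:00 -0700") produces an abbr element whose title is 2007-08-10T20:08:00-07:00.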

Mapping proprietary extensions

OPML is extensible through the use of namespaced elements and attributes.

There are some proprietary extensions which it would not be appropriate to map to XOXO. The GrazrScript extension is probably one of those. There is no value in mapping it to XOXO, as it will not serve any purpose. Converters should ignore it.

Currently, the following proprietary/non-canonical extensions to OPML can be mapped to XOXO:

  • grazr:name should map to the id attribute of the containing list element.
  • bb:rating (namespace: http://blogbridge.com/ns/2006/opml) is used by BlogBridge to provide an 'attention rating'.
  • bb:tag should be mapped to a rel="tag", in the same way(?) as the category attribute.

Preliminary Mapping

Currently, I am trying to work on a preliminary mapping to XOXO from OPML and the internal Frontier outline format (on which OPML is strongly based, and with which it is compatible). This is unreleased so far, but I have put the mappings up so that people can suggest improvements to the semantics.

Each of the 'li' elements can, of course, use the compact attribute.

The text type:

<li>[text()]</li>

The feed type:

<li><a href="[@htmlUrl]" class="feed">[text()]</a> <a href="[@xmlUrl]" class="feed-xml" type="application/xml">RSS</a></li>

(The text label "RSS" can, of course, be changed to 'feed' or 'Atom' or any other text label that you wish. The 'feed' class is to enable reverse XOXO-to-OPML)

The include type:

<li><a href="[@include]" class="opml-include" type="[MIME]">[text()]</a></li>
  • MIME type still to be determined. I will be asking questions on the OPML mailing list to see if we can get some consensus on what MIME type to use. This may be a bit like opening a can of worms, but we'll see how it goes. --TomMorris 20:08, 10 Aug 2007 (PDT)
    • Consensus still doesn't exist.
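
The three templates above can be combined into one small function, sketched here for a single outline node given as a dict of its attributes. This is not the unreleased converter mentioned above. Two assumptions are made: the include link takes its target from the url attribute (as in the Include section), and the type attribute is left off the include link because no MIME type has been agreed. The name outline_to_li is hypothetical.

```python
from html import escape


def outline_to_li(attrs):
    """Render one OPML outline node as a XOXO list item."""
    text = escape(attrs.get("text", ""))
    node_type = (attrs.get("type") or "").lower()

    if node_type == "rss":
        # Feed type: link to the HTML page, then to the feed itself.
        return (
            '<li><a href="{}" class="feed">{}</a> '
            '<a href="{}" class="feed-xml" type="application/xml">RSS</a></li>'
        ).format(
            escape(attrs.get("htmlUrl", "")),
            text,
            escape(attrs.get("xmlUrl", "")),
        )

    if node_type == "include":
        # Include type: MIME type omitted, since none has been agreed.
        return '<li><a href="{}" class="opml-include">{}</a></li>'.format(
            escape(attrs.get("url", "")), text
        )

    # Text type (blank or unrecognised type attribute).
    return "<li>{}</li>".format(text)
```

Nested outlines are not handled here; a fuller converter would recurse into child nodes and wrap them in a nested list inside the li.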

Test Cases

I have started maintaining a list of OPML Test Cases and Tools for OPML-XOXO work. I have also put up an unofficial test suite and schema. --TomMorris 16:48, 7 Mar 2008 (PST)

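Notes

  • IRC Log for 2007-08-10 (http://rbach.priv.at/Microformats/IRC/2007-08-10#T154821)
  • l.m. orchard's comment on Uche Ogbuji's blog post about XOXO and OPML (http://copia.ogbuji.net/blog/2005-11-15/I_must_be_#1132141337.5)

Converters

  • I (User:Brian) have been working on a web service to convert XOXO lists into OPML. The service can be found at http://suda.co.uk/projects/microformats/xoxo/; another can be found at http://xoxotools.ning.com/outlineconvert.php, and another at http://decafbad.com/2005/11/gopher-ng/ for generating OPML to be used with Hyperscope.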

XSLT Issues

Currently the output from the XOXO to OPML web service does not validate against the beta OPML validator at [3]. The errors and warnings are vague, and they are not really issues because the files are still imported correctly into several applications.

I have no idea whether there is an enumerated list of possible type values in OPML. At the moment I am using the type attribute as it is used in HTML, where it holds a MIME type.

News Readers

Please follow the format and add any issues with your News Reader

OmniOutliner

  • Platform: OSX
  • Version: 3.5 (v134.2)
  • notes: This correctly imports the OPML file and added a column for each attribute.

Vienna

  • Platform: OSX
  • Version: 2.0.4.2034
  • notes: No issues importing; it imported the file and tried to fetch non-RSS/Atom files.
  • autodiscovery: ???

BoxtheWeb

  • Platform: online
  • notes: seems to correctly import the OPML, and will also import raw XOXO

See also

  • xoxo-brainstorming