citation-issues: Difference between revisions

(added andy's comment about citation format conversion.)
(added Brian Suda's outstanding issues from brainstorming page.)
Revision as of 00:30, 9 April 2007

BenWest will start this by reorganizing material from http://microformats.org/wiki?title=citation-brainstorming&diff=0&oldid=15286.

Issues

  • Generally, use cases are used to flesh out requirements, but I don't see any on this page, so I've added a new section for this. Here are some suggested requirements. ThomasBreuel
  • I've made these into issues. [[BenWest 16:56, 8 Apr 2007 (PDT)]]
  • open issue!

Lossless Round-Trip Conversions

Should citations support round-trip conversions? Which formats should be supported? One of the primary uses for a citation format is to let people put individual citations or entire bibliographies on the web. For that purpose, it's important that if someone puts my bibliography on the web and someone else downloads it, they get the citations back correctly and don't have to spend time fixing them up manually. Therefore, I suggest the following requirement.

If X is one of the common citation formats (BibTeX, EndNote, etc.), then conversion of the form X -> hCitation -> X must not lose information and must not require manual fixing up of the result.

Note that this has multiple components. First, for a format like BibTeX, it's important that the field names be preserved. Second, in general, markup (italics, math, chemical formulas, spacing, special characters) needs to be preserved.
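
To make the requirement concrete, here is a minimal sketch of a round-trip check, written in Python. It is not a proposed encoding: the helper names (to_html, from_html), the choice of one span per field with the field name as the class, and the sample record are all assumptions made for illustration. Markup inside a value is carried as escaped literal text, which is only one possible way to keep it intact.

```python
from html import escape
from html.parser import HTMLParser


def to_html(fields):
    """Encode a record; each field name is kept verbatim as a class name (assumption)."""
    spans = "".join(
        '<span class="{}">{}</span>'.format(escape(name, quote=True), escape(value))
        for name, value in fields.items()
    )
    return '<div class="hcite">{}</div>'.format(spans)


class FieldParser(HTMLParser):
    """Read back the spans written by to_html into a dict of field name -> text."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._name = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._name = dict(attrs).get("class")
            if self._name is not None:
                self.fields[self._name] = ""

    def handle_data(self, data):
        if self._name is not None:
            self.fields[self._name] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._name = None


def from_html(html):
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up sample record with markup and a special character in the title.
record = {"author": "Doe, J.", "title": "H<sub>2</sub>O & more", "year": "2007"}
assert from_html(to_html(record)) == record  # nothing lost, nothing to fix by hand
```

A check of this shape could be run per source format; it only passes when both the field names and the field contents survive unchanged.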

  • open issue!

Citation Markup

Should citations preserve presentation? Citations may contain markup, such as italics, subscripts, superscripts, special characters, and chemical formulas. For a correct presentation of the citation format to the user, the format must permit even fairly complex markup. Note that this markup cannot easily be converted automatically between different bibliographic processors.

  • open issue!

Encapsulation of Non-Textual Content

Should citations support non-textual content? Systems like document image processors need to be able to represent semantic roles of parts of pages without actually giving a usable textual representation. For example, a system might segment citations into authors, titles, volumes, and years, but represent the actual content of those fields using image tokens rather than characters. Furthermore, no text to put into an ABBR tag may be available.

  • open issue!

No New Semantics

Should citations avoid introducing new semantics? The proposals for a citation microformat, as they now stand, suggest creating a new format that differs from existing formats not just syntactically, but semantically (different choices of field types than other formats, different handling of proper names, different handling of publications that are part of collections, etc.). This has some serious consequences; in particular, it means that translation into any existing format is not just a simple syntactic transformation: any tool that deals with the citation microformat needs to be updated to handle new semantics, in addition to new syntax. An alternative is to define one or more microformats that are strictly a syntactic transformation of existing formats (e.g., encapsulated BibTeX, encapsulated EndNote).

So, a possible requirement to consider is that citation microformats introduce no new semantics, but are a strict syntactic encapsulation of existing citation formats.


  • open issue!

Convert citation formats

Should a user agent provide retransmission of a citation in a new format? Which ones? A user agent should be capable of reading a citation from a web page, in a given format, and converting it into a second format, for use elsewhere. For a list of such formats, and examples, see Wikipedia, Citation styles (http://en.wikipedia.org/wiki/Citation#Format_styles). Andy Mabbett 11:05, 30 Mar 2007 (PDT)


Outstanding Issues

Moved from citation-brainstorming by BenWest 17:30, 8 Apr 2007 (PDT); the original is available at http://microformats.org/wiki?title=citation-brainstorming&diff=0&oldid=15305


The 3 main points I (Brian) came across so far are:
1) IDENTIFIERS
2) FORMAT TYPES
3) NESTING

  • open issue! How should different kinds of non-globally unique identifiers be represented?

1) In hCard/hCalendar there is a UID field. Combined with URL it makes for a great unique identifier. There are loads of other identifiers besides URL: ISBN, LOC call number, SKU, ISSN, etc. Many of these are unique in their domain, but not globally unique. So how do they get marked up? Much like the hCard TEL/ADR properties, we can use something like:


<div class="uid"><span class="type">ISBN</span>: <span class="value">123456</span></div>

This makes the encoding the most extensible: if we start using class="isbn" then it is an enumerated list, whereas with class="type" it is open-ended.
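
As a rough illustration of why the open-ended form is easy on consumers, here is a sketch in Python that pulls (type, value) pairs out of any class="uid" element without knowing the identifier schemes in advance. The class UidParser and the function extract_uids are hypothetical names, and the sketch assumes well-formed markup in which every opened element is closed.

```python
from html.parser import HTMLParser


class UidParser(HTMLParser):
    """Collect (type, value) pairs from elements marked class="uid"."""

    def __init__(self):
        super().__init__()
        self.uids = []        # finished (type, value) pairs
        self._stack = []      # one class list per open element
        self._current = None  # dict being filled while inside a uid element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)
        if "uid" in classes and self._current is None:
            self._current = {"type": "", "value": ""}

    def handle_endtag(self, tag):
        if not self._stack:
            return
        classes = self._stack.pop()
        if "uid" in classes and self._current is not None:
            self.uids.append((self._current["type"].strip(),
                              self._current["value"].strip()))
            self._current = None

    def handle_data(self, data):
        if self._current is None or not self._stack:
            return
        innermost = self._stack[-1]  # classes of the element holding this text
        for key in ("type", "value"):
            if key in innermost:
                self._current[key] += data


def extract_uids(html):
    parser = UidParser()
    parser.feed(html)
    parser.close()
    return parser.uids


markup = ('<div class="uid"><span class="type">ISBN</span>: '
          '<span class="value">123456</span></div>')
print(extract_uids(markup))
```

The parser never mentions ISBN, ISSN, or any other scheme by name, so a new identifier type needs no change to the code; with class="isbn" style markup, each new scheme would need its own entry.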


  • open issue! What vocabulary should be used to distinguish between medium (aka format) and type of work? How do we resolve ambiguity between type of work and the publishing medium?

2) I keep misusing "format": format is the medium (hardback, softback). The TYPE (there probably is a better word; container?) is book, article, conference, manifesto, etc. Much like the identifiers, we can make an enumerated list of values, class="book", class="article", but that boxes us in, whereas something like:

<span class="type">article</span>

leaves things more open.


  • open issue! Should citations support nesting?

3) Nesting citation data in a citation. The ability to nest the same microformat inside itself is something that other microformats don't explicitly handle.

The two options are:

i) Using class="book"


<div class="hcite">
 <div class="book">
  <span class="fn">Book Title</span>
  <div class="chapter">
     <span class="fn">Chapter Title</span>
  </div>
 </div>
</div>

This makes things easy to nest and to figure out exactly what is associated with what, but the downside is that we have enumerated lists of values for the class properties.
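
A sketch of how a consumer might read option i), in Python. NestedCiteParser and nested_titles are hypothetical names; the sketch assumes well-formed markup. Note that the parser has to be handed the list of container classes up front, which is exactly the enumerated-list downside described above.

```python
from html.parser import HTMLParser


class NestedCiteParser(HTMLParser):
    """Report each class="fn" together with its enclosing container classes."""

    def __init__(self, containers=("book", "chapter")):
        super().__init__()
        self.containers = set(containers)  # the enumerated list (assumption)
        self.titles = []                   # (container path, fn text)
        self._stack = []                   # one class list per open element
        self._fn_depth = None              # stack depth of the open fn element
        self._text = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)
        if "fn" in classes and self._fn_depth is None:
            self._fn_depth = len(self._stack)
            self._text = ""

    def handle_data(self, data):
        if self._fn_depth is not None:
            self._text += data

    def handle_endtag(self, tag):
        if not self._stack:
            return
        if self._fn_depth == len(self._stack):
            # Ancestors of the fn element, filtered to known container classes.
            path = tuple(c for classes in self._stack[:-1]
                         for c in classes if c in self.containers)
            self.titles.append((path, self._text.strip()))
            self._fn_depth = None
        self._stack.pop()


def nested_titles(html):
    parser = NestedCiteParser()
    parser.feed(html)
    parser.close()
    return parser.titles


markup = ('<div class="hcite"><div class="book">'
          '<span class="fn">Book Title</span>'
          '<div class="chapter"><span class="fn">Chapter Title</span></div>'
          '</div></div>')
print(nested_titles(markup))
```

Each title comes back with its full container path, so the chapter is unambiguously inside the book.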

ii) Using the TYPE for book


<div class="hcite">
 <div class="type">book</div>
 <span class="fn">Book Title</span>
 <div class="type">chapter</div>
 <span class="fn">Chapter Title</span>
</div>

Now the class="fn" elements are not nested inside a book or chapter container, so there would have to be some other mechanism to associate the data with the type.
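
One conceivable mechanism, offered only as an assumption and not something proposed on this page, is document order: each class="fn" is paired with the closest preceding class="type". A sketch in Python, with FlatCiteParser and flat_pairs as hypothetical names, assuming well-formed markup:

```python
from html.parser import HTMLParser


class FlatCiteParser(HTMLParser):
    """Pair each class="fn" with the closest preceding class="type"."""

    def __init__(self):
        super().__init__()
        self.pairs = []         # (type, fn) in document order
        self._stack = []        # one class list per open element
        self._last_type = None  # text of the most recent type element
        self._buffer = None     # text being collected for a type or fn

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)
        if "type" in classes or "fn" in classes:
            self._buffer = ""

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer += data

    def handle_endtag(self, tag):
        if not self._stack:
            return
        classes = self._stack.pop()
        if self._buffer is None:
            return
        text = self._buffer.strip()
        if "type" in classes:
            self._last_type = text
            self._buffer = None
        elif "fn" in classes:
            self.pairs.append((self._last_type, text))
            self._buffer = None


def flat_pairs(html):
    parser = FlatCiteParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


markup = ('<div class="hcite">'
          '<div class="type">book</div><span class="fn">Book Title</span>'
          '<div class="type">chapter</div><span class="fn">Chapter Title</span>'
          '</div>')
print(flat_pairs(markup))
```

This recovers which title belongs to which type, but it says nothing about the chapter being part of the book, so the containment that option i) gives for free would still need to be expressed some other way.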