citation-strawman-01

From Microformats Wiki

Revision as of 13:40, 13 August 2008

Citation Strawman: "h3988"

This is a sketch for a citation microformat. Please raise any problems with this draft on the issues page, and suggest any expansions on the brainstorming page.

The final name of the microformat is not decided — "h3988" should be considered a working title to differentiate it from other proposed citation microformats. The root class name may eventually change to something more author-friendly.

Contributors

Editor: TobyInk

The editor acknowledges the input of contributors to the citation-formats, citation-examples and citation-brainstorming wiki pages.

Design Methodology

I have been through the citation-examples page, looking at which pieces of information are commonly used in citations on the Internet. Using that knowledge, and guided by the naming-principles (do not make up names from thin air, do not ignore earlier work), I have taken a subset of the terms from OpenURL (Z39.88) which correspond to the pieces of metadata used by citations in the wild.

I have then mapped these Z39.88 terms to semantic HTML, re-using existing microformats such as hCard for author and publisher information, and re-using existing design patterns such as the class-design-pattern and the datetime-pattern for dates.

Schema

Separate but largely overlapping schemas are provided for citations of books, journals, patents and dissertations. Websites may usually be cited as if they were journals, with the site name as the jtitle and the page title as the atitle.

As the schemas all use the same root class name, a method is needed to differentiate between different types of citation. This method is:

  1. If a btitle property exists within the citation, then it is a book citation;
  2. Else if stitle or jtitle exists, then it is a journal/website citation;
  3. Else if number exists, then it is a patent citation;
  4. Else if atitle exists, it is a dissertation citation.
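The four rules above can be sketched as a small function. This is only an illustration: the function name and the idea of passing in a set of property names are assumptions, and the draft does not say what a parser should do when none of the rules match, so the sketch returns None in that case.

```python
def citation_type(properties):
    """Classify an h3988 citation from the property names found inside it.

    `properties` is any iterable of class names already extracted from the
    citation (a hypothetical input format, not defined by the draft).
    """
    names = set(properties)
    if "btitle" in names:
        return "book"
    if "stitle" in names or "jtitle" in names:
        return "journal"  # also covers websites cited as journals
    if "number" in names:
        return "patent"
    if "atitle" in names:
        return "dissertation"
    return None  # assumption: the draft leaves this case undefined


print(citation_type({"atitle", "jtitle", "au"}))  # journal
print(citation_type({"atitle", "au", "degree"}))  # dissertation
```

Because the rules are checked in order, a citation carrying both btitle and jtitle is treated as a book.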

Key

Based on Perl's standard quantifiers:

bold {1} MUST be present exactly once
italic* OPTIONAL, and MAY occur more than once
+ MUST be present, and MAY occur more than once
? OPTIONAL, but MUST NOT occur more than once
[square brackets] list of common values
(parentheses) data format
# comment
! awaiting documentation
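As a rough illustration, the four quantifiers in the key can be expressed as minimum and maximum occurrence counts. The table and function names below are hypothetical and not part of the draft.

```python
# Quantifier symbol -> (minimum, maximum) occurrences; None means no upper limit.
QUANTIFIERS = {
    "{1}": (1, 1),    # MUST be present exactly once
    "*": (0, None),   # OPTIONAL, MAY occur more than once
    "+": (1, None),   # MUST be present, MAY occur more than once
    "?": (0, 1),      # OPTIONAL, MUST NOT occur more than once
}


def occurrences_allowed(quantifier, count):
    """Return True if `count` occurrences satisfy the given quantifier."""
    minimum, maximum = QUANTIFIERS[quantifier]
    return count >= minimum and (maximum is None or count <= maximum)


print(occurrences_allowed("?", 2))  # False
```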

Book Citations

  • h3988 {1}
    • btitle {1} — book title
    • edition ?
    • pub ? (hCard|text) — publisher
    • series ? — title of the series in which the book was published
    • Common properties

Journal Citations

At least one of jtitle or stitle MUST be present.

  • h3988 {1}
    • issue ? — issue of the journal, often numeric
    • jtitle ? — journal title
    • stitle ? — abbreviated (short) title. (e.g. "JMLA")
    • volume ? — volume of journal, often numeric
    • Common properties

Patent Citations

  • h3988 {1}

Dissertation Citations

  • h3988 {1}
    • advisor ? (hCard|text)
    • degree ? (text)
    • Common properties
      • common property atitle is REQUIRED.

Common Properties

    • atitle ? — article/chapter/section title, if citing a particular article within the publication
    • au * (hCard|text) — author of article, if article is being cited; or (rarely) editor if whole issue is being cited.
    • date ? (ISO 8601) — date published
    • description ? — a description of the item being cited, or the reason it is being cited
    • format ? — the type of resource being cited. (e.g. 'book', 'article', 'discussion')
    • identifier * — an identifier for the item being cited
    • language ? (ISO 639-1 or ISO 639-2) — the language of the item being cited. (Not to be confused with the language of the citation itself, which should be indicated using standard markup — lang or xml:lang)
    • pages ? — pages being cited, if only part of the publication is being cited. Has optional subproperties.
      • spage ? — start page
      • epage ? — end page
    • url * — URL of the item.

Footnotes

Additionally, this strawman microformat uses the existing (see RDFa, XHTML 2.0, distributed conversation brainstorming, etc) rel value of "cite" for linking from article text to an h3988 citation elsewhere on the page (e.g. in footnotes or a bibliography section).

Parsers MAY follow rel=cite links to other pages, but are not required to. Authors should be aware that parsers might not follow off-page footnote links. On-page references are generally preferred.

When linking to a citation by ID attribute, the ID attribute should be on the h3988 citation itself, and not on a parent or ancestor node. e.g.:

PREFERRED: <cite class="h3988" id="ref01">...</cite>
ALLOWED:   <li class="h3988" id="ref01"><cite>...</cite></li>
NO:        <li id="ref01"><cite class="h3988">...</cite></li>
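An authoring tool could check this rule with something like the sketch below, which uses Python's standard html.parser module. The class name and the unresolved() helper are hypothetical. The sketch assumes footnote links point at on-page fragments through href="#...", and it only recognises IDs placed directly on an element with class h3988.

```python
from html.parser import HTMLParser


class CiteLinkChecker(HTMLParser):
    """Collect rel="cite" fragment links and the IDs of h3988 elements."""

    def __init__(self):
        super().__init__()
        self.targets = []          # fragment ids referenced by rel="cite" links
        self.citation_ids = set()  # ids sitting directly on class="h3988" elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href") or ""
        if tag == "a" and "cite" in rels and href.startswith("#"):
            self.targets.append(href[1:])
        classes = (attrs.get("class") or "").split()
        if "h3988" in classes and attrs.get("id"):
            self.citation_ids.add(attrs["id"])

    def unresolved(self):
        """Return link targets that do not land on an h3988 element."""
        return [t for t in self.targets if t not in self.citation_ids]


checker = CiteLinkChecker()
checker.feed(
    '<p><a rel="cite" href="#ref01">1</a> <a rel="cite" href="#ref02">2</a></p>'
    '<ul><li><cite class="h3988" id="ref01">...</cite></li>'
    '<li id="ref02"><cite class="h3988">...</cite></li></ul>'
)
checker.close()
print(checker.unresolved())  # ['ref02']
```

Here ref02 is reported because its ID sits on the parent li rather than on the citation itself, which is the "NO" case above.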

Properties

Hopefully most of the properties are self-explanatory, but some deserve a fuller explanation.

h3988

This microformat is derived from the Z39.88 standard, so, like hCard, it uses its ancestor's name as the root class name. The punctuation is dropped and the "Z" replaced with an "h", partly to follow a common pattern in microformat naming, but also to avoid clashes with COinS (another Z39.88+HTML-based standard).

The root element SHOULD be a <cite> element. It is often useful to give the citation an ID attribute. For example:

<cite class="h3988" id="ref01">...</cite>

au

The author of the item being cited. This can be expressed as either plain text:

<span class="au">Elizabeth David</span>

Or as an embedded hCard:

<span class="au vcard">
  <span class="fn n">
    <span class="given-name">Elizabeth</span>
    <span class="family-name">David</span>
  </span>
</span>

When possible, an embedded hCard is preferable, but in some cases (e.g. automatic conversion from another citation format which only treats an author as a string) this may not be possible. When more detail is required, hCard's role property SHOULD be used to indicate the person's role in the publication (e.g. 'editor', 'primary author', 'contributor', etc). Corporate authors SHOULD be indicated using an organisational hCard (i.e. class="fn org"). For dissertations, the org property SHOULD be used to indicate the author's institution. For example:

<span class="au vcard">
  <span class="fn n">
    <span class="given-name">Joe</span>
    <span class="family-name">Bloggs</span>
  </span>,
  <span class="org">Poppleton University</span>,
  <span class="adr">
    <span class="region">Yorkshire</span>
    <abbr class="country-name" title="United Kingdom">UK</abbr>
  </span>
</span>

identifier and url

There are a number of identifying codes, such as ISBN, ISSN, DOI, PubMed ID and so forth, which are sometimes used when citing resources, especially in certain specialist fields. Rather than have separate properties for each of these, an identifier property is provided which follows rules similar to tel and email in hCard, in that it has a type+value structure. For example:

<span class="identifier">
  <span class="type">ISBN</span>
  <span class="value">978-1-56619-909-4</span>
</span>

The following (case-insensitive) types are defined:

The identifier property may be used with a null type to indicate an author-defined identifier or an identifier from a scheme not included in this specification. Lastly, if a journal citation has an stitle property and the identifier type matches the citation's short title, then the identifier is assumed to be from a scheme specific to that journal. (If the value is also numeric, then this is equivalent in meaning to OpenURL's "artnum".) For example:

<cite class="h3988">
  <span class="jtitle">Journal of Rat Studies</span>:
  <span class="atitle">Rats and their Brethren</span>.
  [<span class="identifier">
    <span class="stitle type">JRat</span>
    <span class="value">12345</span>
  </span>]
</cite>
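The interpretation rules for a type+value identifier might be sketched as follows. The function name and the returned labels ("unspecified", "journal-specific", "artnum") are hypothetical. Comparing the type with the short title case-insensitively, and treating "numeric" as "decimal digits only", are assumptions the draft does not spell out.

```python
def identifier_scheme(id_type, value, stitle=None):
    """Describe which scheme an identifier belongs to.

    id_type and value are the text of the 'type' and 'value' subproperties;
    stitle is the citation's short title, if it has one.
    """
    if not id_type:
        # Null type: author-defined, or a scheme not listed in the draft.
        return "unspecified"
    if stitle and id_type.strip().lower() == stitle.strip().lower():
        # Scheme specific to the journal; a numeric value matches OpenURL's "artnum".
        return "artnum" if value.strip().isdecimal() else "journal-specific"
    # Types are case-insensitive, so normalise to upper case.
    return id_type.strip().upper()


print(identifier_scheme("JRat", "12345", stitle="JRat"))  # artnum
print(identifier_scheme("isbn", "978-1-56619-909-4"))     # ISBN
```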

As an optimisation, the type and value can both be implied when the identifier is a link matching one of the following patterns:

  • http://dx.doi.org/X → type='DOI', value='X'
  • doi:X → type='DOI', value='X' (Unapproved URI scheme, but parsers should support this for future proofing.)
  • urn:doi:X → type='DOI', value='X' (Unapproved URN scheme, but parsers should support this for future proofing.)
  • urn:isbn:X → type='ISBN', value='X'
  • urn:issn:X → type='ISSN', value='X'
  • http://lccn.loc.gov/X → type='LCCN', value='X'
  • http://www.ncbi.nlm.nih.gov/pubmed/X → type='PMID', value='X'

For example:

<a class="atitle identifier" href="http://www.ncbi.nlm.nih.gov/pubmed/1"
>Formate assay in body fluids: application in methanol poisoning.</a>

For any identifier link which does not match one of the recognised patterns above, parsers MUST NOT attempt to extract a type and value. (As this is a draft specification, the list above may expand.) Instead, the type should be taken as null, and the value consists of the entire link URI.
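A parser could implement the link optimisation as a simple prefix table, as in the sketch below. The names are hypothetical. Matching prefixes exactly and case-sensitively, and falling back to a null type when nothing follows the prefix, are assumptions rather than requirements of the draft.

```python
# (URI prefix, implied type), taken from the patterns listed in the draft.
LINK_PATTERNS = [
    ("http://dx.doi.org/", "DOI"),
    ("doi:", "DOI"),
    ("urn:doi:", "DOI"),
    ("urn:isbn:", "ISBN"),
    ("urn:issn:", "ISSN"),
    ("http://lccn.loc.gov/", "LCCN"),
    ("http://www.ncbi.nlm.nih.gov/pubmed/", "PMID"),
]


def identifier_from_link(href):
    """Return (type, value) for an identifier expressed as a link."""
    for prefix, id_type in LINK_PATTERNS:
        if href.startswith(prefix) and len(href) > len(prefix):
            return id_type, href[len(prefix):]
    # Unrecognised link: null type, and the value is the entire URI.
    return None, href


print(identifier_from_link("http://www.ncbi.nlm.nih.gov/pubmed/1"))  # ('PMID', '1')
print(identifier_from_link("http://example.com/paper"))  # (None, 'http://example.com/paper')
```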

The url property is different from an identifier. Rather than helping readers identify the cited resource, it helps them find the resource. The url MAY be a link to, say, an Amazon page to buy the book.

When citing a web page (rather than, say, a book or journal), you SHOULD mark up the page's URL as class="url identifier" as the URL both identifies the resource and allows people to find it.

url is unique in that it is the only property not directly derived from Z39.88. It has however been shown to be useful in hCard and various other microformats, and it is very common to include a URL when citing an online resource.

pages

If only part of a work is being cited, it is often useful to include the page number(s). You may do this as a simple string:

<span class="pages">206, 208&ndash;209</span>

In the case where a contiguous section is referenced, then additional spage and epage subproperties are available to mark up the start and end pages:

<span class="pages">
  <span class="spage">46</span> to <span class="epage">48</span>
</span>
<span class="pages">
  <span class="spage">146</span>&ndash;<abbr class="epage" title="148">8</abbr>
</span>

Parsers MUST NOT assume that pages will be numerically labelled. For instance, many books contain sections with pages numbered in roman numerals. If the subproperties have not been explicitly used, parsers SHOULD NOT attempt to extract a start and end page from the pages property — it SHOULD be treated as unstructured text.
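One way a parser might honour these rules is sketched below with Python's standard html.parser module. The class and attribute names are hypothetical. The sketch keeps the visible text of pages as an unstructured string, fills spage and epage only from explicitly marked-up subproperties, and applies the abbr-design-pattern (title attribute wins) to them; that last behaviour is inferred from the second example above.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class PagesParser(HTMLParser):
    """Parse a single class="pages" element and its optional subproperties."""

    def __init__(self):
        super().__init__()
        self.pages = ""    # visible text, never split into a page range
        self.spage = None  # stays None unless class="spage" is present
        self.epage = None  # stays None unless class="epage" is present
        self._open = []    # per open element: subproperty collecting text, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        own = next((c for c in ("spage", "epage") if c in classes), None)
        if own and tag == "abbr" and attrs.get("title"):
            setattr(self, own, attrs["title"])  # abbr-design-pattern
            collecting = None                   # ignore the visible text
        elif own:
            setattr(self, own, "")
            collecting = own
        else:
            # Plain child elements keep feeding the enclosing subproperty.
            collecting = self._open[-1] if self._open else None
        self._open.append(collecting)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        self.pages += data
        if self._open and self._open[-1]:
            name = self._open[-1]
            setattr(self, name, getattr(self, name) + data)


parser = PagesParser()
parser.feed('<span class="pages"><span class="spage">xiv</span> to '
            '<span class="epage">xvi</span></span>')
parser.close()
print(parser.spage, parser.epage)  # xiv xvi
```

Note that the values are kept as strings throughout, so roman numerals and other non-numeric page labels pass through untouched.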

pub

An hCard or plain text string for the book's publisher. As with the au property, an hCard is preferred. The place of publication SHOULD be encoded within this hCard using adr and appropriate subproperties, or the label property.

<span class="pub">Penguin Books, London</span>
<span class="pub vcard">
  <span class="fn org">Penguin Books</span>
  <span class="label">London</span>
</span>
<span class="pub vcard">
  <span class="fn org">Penguin Books</span>
  <span class="adr"><span class="locality">London</span></span>
</span>

stitle

This property is for marking up the short title of a journal. Most academic journals have well-established short titles which are often used to reference them. Note that this property should be considered "immune" to the abbr-design-pattern. That is:

<abbr class="stitle" title="Foo">Bar</abbr>

should be parsed as "Bar", not "Foo". This allows a convenient pattern to be used:

<abbr class="jtitle stitle" title="British Medical Journal">BMJ</abbr>
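The effect of this exemption on a parser can be sketched as below. The function and the ABBR_IMMUNE set are hypothetical names. For simplicity the sketch treats every class name on the element as a property, and assumes the caller has already extracted the tag name, attributes and text content.

```python
# Properties that ignore the abbr-design-pattern (only stitle in this draft).
ABBR_IMMUNE = {"stitle"}


def property_values(tag, attrs, text):
    """Return {property name: value} for one element.

    For <abbr> elements with a title attribute, the title is the value,
    except for properties listed in ABBR_IMMUNE, which use the visible text.
    """
    classes = (attrs.get("class") or "").split()
    title = attrs.get("title") if tag == "abbr" else None
    values = {}
    for name in classes:
        if title and name not in ABBR_IMMUNE:
            values[name] = title
        else:
            values[name] = text
    return values


print(property_values("abbr", {"class": "stitle", "title": "Foo"}, "Bar"))
# {'stitle': 'Bar'}
print(property_values(
    "abbr", {"class": "jtitle stitle", "title": "Journal of Rat Studies"}, "JRat"))
# {'jtitle': 'Journal of Rat Studies', 'stitle': 'JRat'}
```

A single abbr element therefore yields both the full journal title and its short form.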

Examples

Citation in Running Text

<p>
  According to
  <cite class="h3988">
    <i class="btitle">French Provincial Cooking</i> by
    <span class="au vcard">
      <span class="fn n">
        <abbr title="Elizabeth" class="given-name">E</abbr>
        <span class="family-name">David</span>
      </span>
    </span>
  </cite>
  to cook Filets de Maquereaux a la Tomate, you need to first coat the mackerel with flour.
</p>

A Book in a Bibliography

<h2>Bibliography</h2>
<ul>
  <li>
    <cite class="h3988" id="ref-fpc">
      [<span class="identifier">FPC</span>]:
      <i class="btitle">French Provincial Cooking</i>,
      <span class="au vcard">
        <span class="fn n">
          <abbr title="Elizabeth" class="given-name">E</abbr>
          <span class="family-name">David</span>
        </span>
      </span>,
      <span class="pub vcard">
        <span class="fn org">Penguin Books</span>,
        <span class="adr"><span class="locality">London</span></span>
      </span>,
      <span class="date">1960</span>.
    </cite>
  </li>
  <!-- ... -->
</ul>

Which might be cited in running text like:

<p>
  To cook Filets de Maquereaux a la Tomate, you need to first coat the mackerel
  with flour [<a rel="cite" href="#ref-fpc" rev="vote-for">FPC</a>].
</p>
