citation-brainstorming: Difference between revisions

From Microformats Wiki

Revision as of 20:48, 8 April 2007

Citation Brainstorming

See also


  • ...
  • ... (a bunch of good folks!)
  • Tantek Çelik
  • Tim White
  • Michael McCracken
  • Brian Suda
  • Andy Mabbett

Use Cases

To focus the discussion, please add use cases below that will help show what problems the citation microformat will be solving.

I've included two, focusing on consuming information - I've assumed that use cases for generating microformatted content would just involve the desire to enable your content to be consumed better, but I'm interested to see if there's something I'm missing here -Mike

Acquiring reference information from the web

A user either finds an author's papers page, or is viewing the results of a search and would like to import the information about the displayed papers into their local reference database, for the purposes of cataloging things they've read, adding notes, and using the information to generate later citations, potentially in other forms, such as BibTeX or Docbook, for inclusion in a publication of their own.

Notes: In this case, it isn't important to the user what format the citation takes as displayed on the page where they find it. What *is* important is that it contains enough information to allow generation of the format they will ultimately re-publish it in. This implies that it may be worthwhile to err a little on the side of verbosity.

Also, links to downloadable full representations of the cited work are very important - e.g. a link to the PDF of a journal article, or to a music file.

Subscribing to reading lists, periodicals, etc

I would like to be able to leverage my news aggregator with hAtom to subscribe to a remote source for citation information, for example:

  • a reading list for a seminar
  • The publication list for a conference (e.g., subscribe to SIGGRAPH and see the updated conference proceedings every year)
  • the issues of a journal
  • a particular research group or researcher's publications
  • Not just research: a popular author's publications (e.g., Malcolm Gladwell's Archive)

Aggregating reading lists and reviews

A citation microformat-specific aggregator could provide a decentralized version of CiteULike. Libraries, authors, research groups, and publishers could mark up their collections, while other people on weblogs or review sites could add tags and reviews.

At the least, a well-adopted microformat would make tools like CiteULike much easier to write, since CiteULike currently relies in some cases on screen-scraping publisher websites.

Cut & Paste from web pages

Capturing/copying HTML from web pages for use in other applications (especially when those apps present HTML as output), such as pasting into Word, or a specialized application like Google Notebook, Onfolio or Kaboodle. When such captures are made, it makes sense to keep track of the full citation data, including the date it was accessed, which may or may not be the date it was published.

Blogs quoting other resources, including blogs

Any blog that cites online content, whether a blog or news article, could use an hCitation to properly link to the cited reference. Such citations could include the access date when the blogger made the citation, because resources on the other side of those links can change without notice.

Instead, today we have simple formatting with a link to the permaURL. The citation data is completely lacking. See Doc Searls' blog for a style of referencing that could benefit from a proper citation uF.

Fascinating... after I added the last two use cases, I realized they focus on potentially marginal cases. The first because it is missing the "output" part of the cut & paste, where the uF would actually be used as part of the paste. The latter because bloggers have a working citation mechanism that is just a link to the URL (hopefully a permaURL). One could argue they wouldn't want a full hCitation. And in fact, until a tool exists that makes it easy, they probably won't. However, a tool that cuts & pastes from anywhere on the web into a blog with a full citation seems like a nice tool. But again, I'm not really paving the cowpaths with these ideas. -Joe Andrieu

Finding in Library

Find a copy of the cited work in a nearby library (as with OpenCOinS). AndyMabbett 04:55, 4 Nov 2006 (PST)

Buy a copy

Find the cited work on, for example, Amazon or ABE; or subscribe to a journal via its own website. AndyMabbett 04:55, 4 Nov 2006 (PST)

Find reviews

Find third-party reviews of the cited work. AndyMabbett 04:55, 4 Nov 2006 (PST)

Give citation data for the page being visited

Adding a class of, say, "self" to an attribute of the proposed strawman would allow users (or user agents) to extract the data required to cite the page being visited, when referring to it elsewhere. There would be the added advantage of allowing the citation to be ignored by any parser which might be building a "tree" of citations, and preventing the setting up of an infinite loop.

For evidence of published "self citation" data (albeit on a secondary page) see the "cite this article" link on any Wikipedia entry, e.g. [1] from [2].

See also Proposal to include on-page citation data in Wikipedia

Andy Mabbett 13:47, 20 Mar 2007 (PDT)

Convert citation formats

A user agent should be capable of reading a citation from a web page, in a given format, and converting it into a second format, for use elsewhere. For a list of such formats, and examples, see Wikipedia, Citation styles. Andy Mabbett 11:05, 30 Mar 2007 (PDT)
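As a rough illustration of such a conversion, here is a minimal Python sketch that renders already-extracted citation properties as a BibTeX entry. The mapping table `HCITE_TO_BIBTEX`, the function `to_bibtex`, and the cite key used below are all hypothetical; the class names on the left are taken from the working straw schema further down this page, and nothing here is an agreed part of any proposal.

```python
# Hypothetical mapping from straw-schema class names to BibTeX field names.
HCITE_TO_BIBTEX = {
    "creator": "author",
    "title": "title",
    "volume": "volume",
    "issue": "number",
    "pages": "pages",
}


def to_bibtex(key, entry_type, properties):
    """Render extracted properties (class name -> list of strings) as BibTeX."""
    lines = [f"@{entry_type}{{{key},"]
    for class_name, field in HCITE_TO_BIBTEX.items():
        values = properties.get(class_name)
        if not values:
            continue
        # BibTeX separates multiple authors with " and "; other fields
        # are assumed to be single-valued in this sketch.
        value = " and ".join(values) if field == "author" else values[0]
        lines.append(f"  {field} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


burton = {
    "creator": ["R R Burton", "S D Leverett", "E D Michaelson"],
    "title": ["Man at high sustained +Gz acceleration: a review."],
    "volume": ["45"],
    "issue": ["10"],
    "pages": ["1115-36"],
}
print(to_bibtex("burton1974", "article", burton))
```

The sketch deliberately ignores markup inside field values and nested containers (such as the journal), both of which a real converter would have to handle.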


Generally, use cases are used to flesh out requirements, but I don't see any on this page, so I've added a new section for this. Here are some suggested requirements. ThomasBreuel

Possible Requirement: Lossless Round-Trip Conversions

One of the primary uses for a citation format is to permit people to put individual citations or entire bibliographies on the web. For that purpose, it's important that if someone puts up my bibliography on the web and someone else downloads it, they actually get back the citations correctly, and don't have to spend time fixing up the citations manually. Therefore, I suggest the following requirement.

If X is one of the common citation formats (BibTeX, EndNote, etc.), then conversion of the form X -> hCitation -> X must not lose information and must not require manual fixing up of the result.

Note that this has multiple components. First, for a format like BibTeX, it's important that the field names be preserved. Second, in general, markup (italics, math, chemical formulas, spacing, special characters) needs to be preserved.
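The round-trip requirement can be phrased as a simple check. The following Python sketch assumes a deliberately naive, hypothetical encoding in which every source field name is reused verbatim as a class name on a span; `to_hcite`, `from_hcite` and `round_trips` are illustrative names, not part of any proposal. It only shows what "must not lose information" means for field names and special characters, not how rich markup would survive.

```python
from html import escape
from html.parser import HTMLParser


def to_hcite(entry):
    """Encode a flat record as spans, using each field name as the class name."""
    spans = "".join(
        f'<span class="{escape(name)}">{escape(value)}</span>'
        for name, value in entry.items()
    )
    return f'<cite class="hcite">{spans}</cite>'


class _FieldCollector(HTMLParser):
    """Collect the text of every span, keyed by its class attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("class")
            if self._current is not None:
                self.fields[self._current] = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def from_hcite(markup):
    """Decode the markup produced by to_hcite back into a flat record."""
    collector = _FieldCollector()
    collector.feed(markup)
    collector.close()
    return collector.fields


def round_trips(entry):
    """True if X -> hCitation -> X returns exactly the original record."""
    return from_hcite(to_hcite(entry)) == entry
```

For example, `round_trips({"author": "Doe, John", "title": "Acids & Bases", "year": "2006"})` holds under this encoding; a field whose value carries italics or a chemical formula would need the markup requirement below to be met as well.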

Possible Requirement: Citation Markup

Citations may contain markup, such as italics, subscripts, superscripts, special characters, and chemical formulas. For a correct presentation of the citation format to the user, the format must permit even fairly complex markup. Note that this markup cannot easily be converted automatically between different bibliographic processors.

Possible Requirement: Encapsulation of Non-Textual Content

Systems like document image processors need to be able to represent semantic roles of parts of pages without actually giving a usable textual representation. For example, a system might segment citations into authors, titles, volumes, and years, but represent the actual content of those fields using image tokens rather than characters. Furthermore, no text to put into an ABBR tag may be available.

Possible Requirement: No Additional Converters


  • (from a mailing list):

if you want to cite a [biomedical journal] journal article on Wikipedia [...] you can export a correctly-formatted citation for Wikipedia from HubMed using unAPI...

  • Zotero, a Firefox extension to help collect, manage, and cite research sources.
Andy Mabbett 09:13, 21 Mar 2007 (PDT)

Original hBib Discussion

During the WWW2005 Developer's Day microformats track, Rohit Khare gave a presentation where he discussed the microformats process, and then did a quick demonstration wherein a bunch of us got on a shared Subethaedit document and brainstormed some thoughts on what an "hBib" bibliography citation microformat would look like. Rohit placed the document on his Commercenet site.

An attempt to summarize and inline the linked document follows. -Mike

Two major goals were outlined by the group:

  • Avoid re-keying references
  • Adapt to new journal styles by changing CSS

The fundamental problem was discussed in terms of display: the ability to transform XHTML+hBib into the many journal-specific formats. For example, how to display "et al." when all authors are present in the source, and how to re-order the elements if a style defines a set order of elements that conflicts with the ordering in the source. Using hCard for authors was agreed on, and the beginnings of an example were shown.

XHTML Structure

My experience working on X2V and hCa* has taught me which elements are easy to find and which are not. Since the Citation microformat is very new, it is possible to avoid repeating the same errors and to make it easier for extracting applications to find and imply certain properties.

  • There should be some sort of 'root node' that implies all child elements are for the hCitation microformat.
  • Since most people will have multiple citations, there should be a way to represent each hCitation object as a unique block independent of the others. This is to keep the parser from finding 'author' and applying that to all citations. Each citation should be in a container (class="hcite") that is separated from others.
  • Perhaps class="hcite" with <cite> recommended as the root element. E.g. <cite class="hcite">

Note: This section was the original content of the document. Since then, class='hcite' has been agreed on as the root class name. See explanation.
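To illustrate why a root node matters for extraction, here is a minimal Python sketch of a parser that scopes every property to the nearest enclosing class="hcite" block, so that a 'creator' in one citation is never applied to another. The class `HciteParser` and its data layout are hypothetical. It assumes well-formed markup (every non-void element is closed), records every class name it sees as a property, and files a nested hcite block (for example 'container hcite') under its parent using its other class names.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HciteParser(HTMLParser):
    """Group class-named properties under their enclosing hcite root."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.citations = []  # top-level citations, in document order
        self._elements = []  # open elements: (classes, text parts, citation)
        self._roots = []     # open hcite blocks, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        citation = None
        if "hcite" in classes:
            citation = {}
            if self._roots:
                # Nested citation: attach it to the parent citation.
                for name in classes:
                    if name != "hcite":
                        self._roots[-1].setdefault(name, []).append(citation)
            else:
                self.citations.append(citation)
            self._roots.append(citation)
        self._elements.append((classes, [], citation))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts, _ in self._elements:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._elements:
            return
        classes, parts, citation = self._elements.pop()
        if citation is not None:
            self._roots.pop()
        elif self._roots:
            text = " ".join("".join(parts).split())
            for name in classes:
                self._roots[-1].setdefault(name, []).append(text)


def parse_citations(markup):
    parser = HciteParser()
    parser.feed(markup)
    parser.close()
    return parser.citations
```

Given two sibling `<li class="hcite">` items, `parse_citations` returns two separate dictionaries, each holding only its own 'creator' and 'title' values.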

Citation vs. Media Info

What distinguishes a cite from say Media Info (e.g. media-info-examples) is that a cite is a reference to something explicitly external to the current piece of content or document, whereas Media Info describes information about content embedded or inline in the current document.

Semantic Meaning

One of the guiding principles of Microformats is to use the most semantically rich element to describe each node (Point 2 of Semantic XHTML Design Principles: Use the most accurately precise semantic XHTML building block for each object etc). Since we are dealing with HTML and citations, several elements are candidates to be used to enrich the semantic meaning: CITE, BLOCKQUOTE, Q, A (are there more?)

The Citation Brainstorming Page has a few developments and ideas about how to give another person credit for a link. Some of the semantic ideas behind their choices of tags can be applied to a full bibliographic type reference. Does this sentence make sense only historically? -Mike

OCLC's WorldCat for titles

Question: what about using something like OCLC's WorldCat for linking titles? - Tim White

This and That

After reading through a lot of different citation encoding formats, I noticed that each format was being used in one of two ways. It was either being used to describe the current page (THIS.PAGE) or to encode references that point to external resources (THAT.PAGE).

The information being encoded was identical for both resources (author, date, name, etc.); they just reference different things. For this microformat, I'm not sure if we want to try to solve both problems, or just one. The meta tags in the head element would be the ideal place for information about THIS.PAGE, but that does not follow the ideals of microformats, where information is human-readable. The THAT.PAGE idea, where a list of references is at the end of a document in the form of a bibliography, is more in line with the ideals of a microformat where the data is human-readable. That doesn't mean that data about the current document shouldn't be human-readable, so some of the same properties used to reference external resources can be used for the current document (THIS.PAGE). To do this, a different root item could be used, and transforming applications could extract either the citation data about the current page or information about this page's references.

This is open for discussion, but either way, I believe that the properties used to describe a page will be the same for both THIS and THAT. brian suda

More on This and That

Citation microformats are being explored as a possibility for citing genealogical information at Dan Lawyer's blog.

This is a case where frequently the citation would refer to (THIS.PAGE), but would have nested within it a reference to (THAT.PAGE), possibly a few levels deep. For instance, a web page might contain data extracted from a microfilm of a census. The citation would need to include information about the web page, information about the microfilm, and information about the census. Genealogical citations are expected to include the repository (where can this book or microfilm be found? Is this the same as venue?). So, at each level the information should contain the repository of the referenced item. A nesting (recursive) mechanism for citation microformats would be useful in this case. Is this the function of the "container" element in the Straw Format?

Date Formatting

Since microformats are all about re-use and the accepted way to encode Date-Time has been pretty much settled, then this is a good place to start when dealing with all the different date citation types.

These are all the different fields from various citation formats that are of temporal nature:

* Date (available | created | dateAccepted | dateCopyrighted | dateSubmitted | issued | modified | valid)
* originInfo/dateIssued
* originInfo/dateCreated
* originInfo/dateCaptured
* originInfo/dateOther
* month
* year
* Copyright Year
* Date - Generic
* Date of Conference
* Date of Publication
* Date of update/revision/issuance of database record
* Former Date
* Entry Date for Database Record
* Database Update
* Year of Publication

Several properties are common across citation domains and will certainly be in the citation microformat; the unique instances will need further consideration, otherwise there could be no end to the possibilities.

There are also several properties (year, month, Year of Publication) that can be derived from another, more specific one. If you encode only a more specific property such as Date of Publication, you can extract the 'year of publication' from it. Since the date-time format we are modeling after is the ISO date-time format, the year portion alone is an acceptable date. So if you ONLY know the year of publication, then you can still form a valid 'Date of Publication' as a microformat (which in turn is a valid 'year of publication'). Your mileage may vary when it comes to importing into citation applications.
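The derivation described above can be sketched in a few lines of Python. The helper names `date_parts` and `year_of_publication` are hypothetical, and the pattern only covers the leading calendar-date part of ISO 8601 style values such as those used in the title attributes on this page ("2006", "2006-05", "20060101", "20051126T0000-0800"); it does not validate month or day ranges.

```python
import re

# Year, optionally followed by month and day, with or without hyphens.
ISO_DATE = re.compile(r"^(\d{4})(?:-?(\d{2})(?:-?(\d{2}))?)?")


def date_parts(value):
    """Return (year, month, day); parts missing from the value are None."""
    match = ISO_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not an ISO-style date: {value!r}")
    year, month, day = match.groups()
    return (
        int(year),
        int(month) if month else None,
        int(day) if day else None,
    )


def year_of_publication(date_published):
    """Derive the year from a more specific 'date published' value."""
    return date_parts(date_published)[0]
```

A year-only value is accepted as a date in its own right, so `year_of_publication("2006")` and `year_of_publication("20060101")` both yield 2006.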


It seems to me that these can be collapsed to maybe one or two different date properties. As far as the specific human readable formatting of the date, that can be chosen per whatever the presentation style guide says, and the Datetime Design Pattern used to simplify the markup. - Tantek

Important: Sometimes we need a date range and not simply a date (e.g. 4-6 May 2006). See Conference Citation examples later on this page. - Discoleo

Seasons: Some journals have seasonal issues (e.g. "Summer 2006 edition") instead of, or as well as, editions labelled by month or other calendar-date. AndyMabbett 05:05, 4 Nov 2006 (PST)


Some of the citation formats have a place for 'keywords' or 'generic tags', etc. This might be a good place to re-use the RelTag microformat. The downside would be that they are then forced to be links, which might be the correct way to mark up these terms.

MARC / MODS / Dublin Core

The MODS (example) and Dublin Core (example) transformations of MARC21 may contain some useful ideas.

Here's a first attempt at rewriting the linked examples in XHTML (written in response to a mailing list query about encoding book information with microformats):

<div class="book" lang="en">
  <h3 class="fn">Arithmetic /</h3>
  <p>By <span class="creator"><span class="fn">Sandburg, Carl</span>,
     <span class="date">1878-1967</span></span>,
     and <span class="illustrator">Rand, Ted</span></p>
  <p>Publisher: <span class="publisher"><span class="fn">Harcourt Brace Jovanovich</span>,
     <span class="locality">San Diego</span></span></p>
  <p>Published: <span class="issued">1993</span></p>
  <p class="description">A poem about numbers and their characteristics. Features
     anamorphic, or distorted, drawings which can be restored to normal by viewing
     from a particular angle or by viewing the image's reflection in the provided
     Mylar cone.</p>
  <p class="note">One Mylar sheet included in pocket.</p>
  <ul>
    <li class="subject">Arithmetic</li>
    <li class="subject">Children's poetry, American.</li>
    <li class="subject">Arithmetic</li>
    <li class="subject">American poetry</li>
    <li class="subject">Visual perception</li>
  </ul>
</div>

Basic Citation Structures

There are basic structures to any citation; this is an overview of some of the types.

Concerns not addressed by existing formats

There are some aspects NOT adequately covered by existing formats. I have addressed this issue on the wiki page, too. [For an extended discussion, see the paragraph on Reference Types.]

These issues pertain mainly to Errata, Comments and Authors Reply and Article Retractions.

  • a bidirectional link could be necessary to implement these features (original article <=> erratum, reply, retraction letter)
  • IMPORTANT: Errata
    • Errata: one or more Corrections might be posted in various issues of the journal
    • this is usually cited as: Original Article Citation Data (Correction available in Journal, Issue Nr, Year, Pages) (repeat for more than one correction)
    • it is possibly never cited alone
    • there should be a link to the original article, while the original article should contain a link to this Errata
  • IMPORTANT: Commentary and Author Reply
    • similar to Errata, there might be one or more Comments and Author Replies; this should be stored, too
    • however, it is usually not included in the original citation
    • it might be used however in a citation, but I do not know exactly how to cite it optimally (original article should be provided as well)
  • IMPORTANT: Article Retraction
    • an article may be retracted because of plagiarism or some other flaw
    • this should not be used any further in the research
    • however, it might be used e.g. for an article on plagiarism or flawed research
    • there should be therefore one field storing this information, too, and a link to:
    • the published withdrawal letter (which explains why the article was retracted)
  • this issue may need a time-controlled event
  • IMPORTANT: electronic publishing ahead of print (EPUB)
    • more and more articles are initially posted online, before the published article gets actually printed
    • How should this be used/cited?
    • Is this changed, after the print version becomes available?

Outstanding Issues

The 3 main points I (Brian) came across so far are: 1) IDENTIFIERS 2) FORMAT TYPES 3) NESTING

1) In hCard/hCalendar there is a UID field. Combined with URL it makes for a great unique identifier. There are loads of other identifiers besides URL: ISBN, LOC call number, SKU, ISSN, etc. Many of these are unique in their domain, but not globally unique. So how do they get marked up? Much like the hCard TEL/ADR properties, we can use something like:

<div class="uid"><span class="type">ISBN</span>: <span

This makes the encoding the most extensible: if we start using class="isbn" then it is an enumerated list, whereas with class="type" it is open ended.

2) I keep mis-using "format"; format is the medium (hardback, softback). The TYPE (there probably is a better word - container?) is book, article, conference, manifesto, etc. Much like the identifiers, we can make an enumerated list of values, class="book", class="article", but that boxes us in, whereas something like:

<span class="type">article</span>

leaves things more open.

3) Nesting citation data in a citation. The ability to nest the same microformat inside itself is something that other microformats don't explicitly handle.

The two options are: i) Using class="book"

<div class="hcite">
 <div class="book">
  <span class="fn">Book Title</span>
  <div class="chapter">
     <span class="fn">Chapter Title</span>
  </div>
 </div>
</div>

This makes things easy to nest and to figure out exactly what is associated with what, but the downside is that we have enumerated lists of values for the class properties.

ii) using the TYPE for book

<div class="hcite">
 <div class="type">book</div>
 <span class="fn">Book Title</span>
 <div class="type">chapter</div>
 <span class="fn">Chapter Title</span>
</div>

Now the class="fn" is not nested inside the class="book" or class="chapter", so there would have to be some other mechanism to associate the data with the type.

Brian's straw format

implied schema (examples)

+ publisher
+ language
+ description
+ title
+ creator
+ journal
+ volume
+ issue
+ page 
+ edition
+ identifier
+ tags
+ format
+ date published
+ copyright
- audience

implied schema (formats)

+ publisher
+ language
+ description
+ title
+ creator
+ volume
+ pages
+ edition
+ issue
+ identifier
+ tags
+ format
+ date published
+ date copyrighted
- subtitle
- image 
- excerpt
- index terms
- series title
- publication
- journal
- part (1 of X)

UNION of the two schemas

+ (PLUS) means common properties
- (MINUS) means unique to the schema

Working straw schema

This list records discussion about the common schema from above. The format is descriptive-name (optional-recommended-element 'class-name') (link to explanation).

If there is no explanation link, that field should be considered either obvious or up for debate. If you're not sure which, it's up for debate.

  • root element ('hcite') (explanation)
    • title ('title')
    • Author / Editor etc. ('creator')
    • Pages ('pages')
      • note: this can be any value
    • container ('container hcite')
    • Volume Number ('volume')
    • Edition ('edition')
    • Issue number ('issue')
    • Tags (href rel='tag')
    • Format ('format')
      • Note - this is unclear at present - does format mean 'type', as in 'book' vs. 'article'? --Mike 22:53, 16 Jan 2007 (PST)
    • date published ('date-published') (explanation)
    • date accessed ('date-accessed') (explanation)
    • publisher
    • language
    • Abstract / description ('description')
    • URI (href class='uri') (explanation)
    • identifier
      • an (not necessarily globally unique) identifier, such as a cite-key, pubmed ID number, or simply the reference number or string within a publication ([1] or [CLRS2001])

Notes about missing / changed fields in the schema

This section lists fields that are intentionally not included in the straw schema, or are not represented directly, and links to discussion about each.


Markup examples using the above format:


This is Brian's original example

<ul class="bibliography">
	<li class="hcite" xml:lang="en-gb">
		<!-- publisher data as hCard -->
		<div class="publisher vcard">
			<span class="fn org">ABC Publishing Co.</span>
			<span class="country-name">United Kingdom</span>
		</div>
		<!-- author(s) data as hCard -->
		<div class="creator vcard">
			<span class="fn n"><span class="given-name">John</span> <span class="family-name">Doe</span></span>
		</div>

		<!-- location data -->
		<span class="fn">Foobar!</span>
		<span class="description">World Class Book about foobar</span>
		<span class="volume">1</span>
		<span class="issue">1</span>
		<span class="edition">1</span>
		<span class="pages">1-10</span>
		<span class="format">article</span>
		<!-- deferred to the UID debate -->
		<span class="identifier">12345678</span>
		<!-- keywords -->
		<a class="keyword" rel="tag" href="/tags/foo">foo</a>
		<span class="keyword">bar</span>
		<!-- date properties -->
		Published <abbr class="date-published" title="20060101">January 1st 2006</abbr>
		Copyright <abbr class="copyright" title="20060101">2006</abbr>
	</li>
</ul>

<p class="hcite">Have you read <span class="title"><abbr title="book" class="format">Foo Bar</abbr></span>? 
It was written by <span class="author vcard"><span class="fn">John Doe</span></span>. 
It only came out a <abbr class="dtpublished" title="20060101">few months ago</abbr></p>

Note: the "format" property above is incorrect. Format would refer more to the physical characteristics of an item, rather than its type or genre (e.g. "article", "book", etc.). I'd rather have the main class for the li be "article" in this context, than the fairly meaningless "citation." Of course, one could have both, which would be fine too. -- bruce

Note: Could we use ROLE from hCard to identify editors, translators, authors, etc? This was discussed on the mailing list and the idea was dropped [3].

Comments : singpolyma 08:03, 16 Jun 2006 (PDT) : keywords should be rel="tag", and probably also XOXO 1.0: Extensible Open XHTML Outlines (the same way the citation list is)

RCanine 11:55, 18 Dec 2006 (EST) :

  • Is there a reason not to re-use "published" from hAtom instead of inventing a new, basically equivalent term in "dtpublished"?
    • note - date-published was decided on for the field, example changed to reflect it --Mike 10:12, 30 Mar 2007 (PDT)
  • Missing a URL/URI/IRI/UID etc. field example (ISBN for Book).
  • Does the "copyright" class conflict with WHATWG's definition?
  • WRT Bruce's comment, I'm currently using class="article citation" for my writing, as it has the most flexibility with CSS styles for titles (e.g. Book titles .citation>.fn must be italicized, while article titles must not, their container should).
  • Speaking of containers, we need an "in" or "collection" field for journal articles or articles-in-books, or is that covered by "publisher"?

Citing Private Communication

Needs an example.

Citing Legal Cases

Needs an example. see Wikipedia example for inspiration.

Citing a Book

needs an example

Citing a journal article

From an old entry in PubMed - Aerosp Med. link

<span class="hcite">
  <span class="creator vcard"><span class="fn">R R Burton</span></span>,
  <span class="creator vcard"><span class="fn">S D Leverett</span></span>, and
  <span class="creator vcard"><span class="fn">E D Michaelson</span></span>

  <span class="title">Man at high sustained +Gz acceleration: a review.</span>
  In  <span class="container hcite">
    <abbr class="type" title="Journal">J.</abbr><abbr class="title" title="Aerospace medicine">Aerosp. Med.</abbr>
    <span class="uri uid">urn:issn:0001-9402</span>
    <span class="volume">45</span>
    <span class="issue">10</span>
    <abbr class="date-published" title="1974-10">Oct, 1974</abbr>
  </span>, pages <span class="pages">1115-36</span>.
</span>


Note, I'm not entirely sure about the issn urn here.

Citing a magazine article

needs an example

Citing a Patent

Drawn from this example from Wikipedia:

<li class="hcite"><a href=",405,829" class="url">
<span class="format">U.S. Patent</span> <span class="identifier">4,405,829</span></a>:
    <span class="description">The <a href="/wiki/RSA" title="RSA">RSA</a> patent, a famous software patent on the ground-breaking 
    and highly unobvious algorithm for public key encryption, widely used for secure communications 
    in many industries nowadays</span>
</li>

Citing a conference publication

Based on the conference publication reference example.

Changed Oct 06 to conform with Brian's format. --Mike 18:09, 12 Oct 2006 (PDT) (everything but the url class should be in line with that proposal)

L. Hochstein, J. Carver, F. Shull, S. Asgari, V. Basili, J. K. Hollingsworth, and M. Zelkowitz, “Hpc programmer productivity: A case study of novice hpc programmers,” in Proceedings of ACM/IEEE Supercomputing Conference, 2005.

<span class="hcite">
  <span class="creator vcard"><span class="fn">Lorin Hochstein</span>
  <span class="org"> University of Maryland, College Park </span></span>,
  <span class="creator vcard"><span class="fn"> Jeff Carver </span> 
  <span class="org"> Mississippi State University </span> </span>,
  <span class="creator vcard"><span class="fn"> Forrest Shull </span> 
  <span class="org"> Fraunhofer Center Maryland </span> </span>,
  <span class="creator vcard"><span class="fn"> Sima Asgari</span> 
  <span class="org"> University of Maryland, College Park </span> </span>,
  <span class="creator vcard"><span class="fn"> Victor Basili</span> 
  <span class="org"> Fraunhofer Center Maryland </span> </span>,
  <span class="creator vcard"><span class="fn"> Jeffrey K. Hollingsworth</span> 
  <span class="org"> University of Maryland, College Park </span> </span>, and 
  <span class="creator vcard"><span class="fn"> Marv Zelkowitz</span> 
  <span class="org"> University of Maryland, College Park </span> </span>,
  <a class="title url" href="">HPC Programmer Productivity: A Case Study of Novice HPC Programmers</a>. 
  (<span class="format">conference publication</span>)
  <span class="container hcite">
    <a class="title url" href="...">Proceedings of ACM/IEEE Supercomputing Conference</a>
    <abbr class="date-published" title="20051126T0000-0800">2005</abbr>
  </span>
  page <span class="pages">35</span>
  <span class="publisher vcard">
    <span class="fn">IEEE Computer Society</span>
    <span class="adr">
      <span class="locality">Washington</span>,
      <span class="region">DC</span>
    </span>
  </span>
  <a class="url eprint" href="">PDF of full text from ACM</a>
  DOI: <a class="url uid" href="">10.1109/SC.2005.53</a>
  <a class="keyword" rel="tag" href="results.cfm?query=genterm%3A%22Design%22 ...">Design</a>,
  <a class="keyword" rel="tag" href="results.cfm?query=genterm%3A%22Experimentation%22 ....">Experimentation</a>,
  <a class="keyword" rel="tag" href="results.cfm?query=genterm%3A%22Measurement%22...">Measurement</a>,
  <a class="keyword" rel="tag" href="results.cfm?query=genterm%3A%22Performance%22 ...">Performance</a>

  <span class="description">In developing High-Performance Computing (HPC) software, ....</span>
</span>

Note (From Discoleo, Sept. 06)

  • sometimes, the citation must include Town/Country and Precise Date/Date Range, e.g.
    • Gillespie SH, Dickens A. Variation in mutation rate of quinolone resistance in Streptococcus pneumoniae [abstract P06-17A]. In: Abstracts of the 3rd International Symposium on Pneumococci and Pneumococcal Disease (Anchorage, 5-9 May 2002).Washington, DC: American Society of Microbiology, 2002.
    • Bassetti, M.; Righi, E.; Rebesco, B.; Molinari, MP.; Costa, A.; Fasce, R.; Cruciani, M.; Bassetti, D.; Bobbio Pallavicini, F. 44th Annual Interscience Conference on Antimicrobial Agents and Chemotherapy (ICAAC). Washington, DC; 2004. Epidemiological trends in nosocomial candidemia in ICU: A five-year Italian perspective.
    • Peacock JE, Wade JC, Lazarus HM, et al. Ciprofloxacin/piperacillin vs. tobramycin/piperacillin as empiric therapy for fever in neutropenic cancer patients, a randomized, double-blind trial [abstract 373]. In: Program and abstracts of the 37th Interscience Conference on Antimicrob Agents and Chemotherapy (Toronto). Washington, DC: American Society for Microbiology, 1997.

Citing an external website

This is based on a formal citation of a website in the references section of a research paper, but the same markup could also be used for inline links that carry added information. Here's the original:

[25] David Stern, "eprint Moderator Model", (version dated Jan 25, 1999)

<cite class="hcite">
<a class="fn url" href="">eprint Moderator Model</a>
<span class="author vcard">
<a href="" class="url fn">David Stern</a>
</span>
<abbr class="dtpublished" title="19990125T0000-0500">
    Jan 25, 1999
</abbr>
</cite>

Discussion of Straw Format elements

This section is to provide explanations for posterity about the elements of the straw format, linking to discussions on the list and elsewhere if possible.

'hcite' as Root Element name

This discussion took place in January of 2007, with voting occurring on the mailing list.

It was decided to use 'hcite' as the root element's class-name for uniqueness and to reflect a trend in using 'h' to start microformat names.

The URI Element

It was decided to use URI for both http links to available copies and URNs. This encompasses URLs that link directly to online copies as well as identifiers that go through resolvers, using URIs such as urn:isbn:0521890012

See the discussion from November and December.
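
As a rough illustration of what this decision means for a consumer, the sketch below sorts a citation's URI value into a directly linkable URL or a URN that would need a resolver. It is only a sketch: the function names classify_uri and isbn_from_urn are made up for this example and are not part of any proposal, and it relies only on Python's standard urllib.parse module.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_uri(uri: str) -> str:
    """Return 'url', 'urn', or 'unknown' for a citation's URI value.

    Hypothetical helper: the names and the three categories are
    assumptions made for illustration only.
    """
    scheme = urlparse(uri.strip()).scheme.lower()
    if scheme in ("http", "https"):
        return "url"   # links directly to an online copy
    if scheme == "urn":
        return "urn"   # needs a resolver to locate a copy
    return "unknown"


def isbn_from_urn(uri: str) -> Optional[str]:
    """Extract the ISBN from a urn:isbn: URI, or return None."""
    prefix = "urn:isbn:"
    value = uri.strip()
    if value.lower().startswith(prefix):
        # Tolerate stray whitespace after the prefix.
        return value[len(prefix):].strip()
    return None
```

A consumer could call classify_uri first and only attempt isbn_from_urn (or some other resolver step) when the result is 'urn'.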

Date Fields

Brian's original straw format had three date fields, "accessed", "copyrighted", and "published". After examining the examples of usage on the web, it was clear that 'copyrighted' was essentially unused in the examples we have. It appeared only once, without a corresponding 'published' field (OCLC WorldCat), and in that case it seems to be used as an equivalent to 'published'.

I updated the straw citation to include only 'accessed' and 'published' on January 31. --Mike 00:26, 31 Jan 2007 (PST)

I've mentioned more than once that "date-published" is misleadingly specific; too much for real world citations. Consider that many books are published in the year preceding their copyright date, which is in fact the date used for citation. I'd prefer just "date" and "date-accessed" as a first cut. --Bruce 3 Feb 2007

See the discussion from the 'dates' thread on the list.
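
Whatever the fields end up being called, the examples on this page carry the machine-readable date in the title attribute of an abbr element (e.g. 20051126T0000-0800), while the visible text is often just a year. The sketch below shows one way a consumer might read such a value and remember how precise it really was. The function name parse_citation_date, the precision labels, and the set of accepted formats are assumptions for this example, not part of the straw format.

```python
from datetime import datetime
from typing import Tuple

# Formats tried in order, most specific first. This set is an assumption
# based on the values seen in the examples on this page.
_FORMATS = [
    ("%Y%m%dT%H%M%z", "minute"),  # e.g. 20051126T0000-0800
    ("%Y%m%d", "day"),            # e.g. 20051126
    ("%Y", "year"),               # e.g. 2005
]


def parse_citation_date(value: str) -> Tuple[datetime, str]:
    """Parse a date value and report the precision that was given.

    Hypothetical helper: returns (datetime, precision), where precision
    is 'minute', 'day', or 'year'. Missing parts default to January 1st
    and midnight, so callers should consult the precision label before
    displaying or comparing the result.
    """
    text = value.strip()
    for fmt, precision in _FORMATS:
        try:
            return datetime.strptime(text, fmt), precision
        except ValueError:
            continue
    raise ValueError("unrecognised citation date: %r" % value)
```

Keeping the precision alongside the parsed value is one way to avoid presenting a year-only citation date as if it were a specific day.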


Discussion about how to represent containing relationships is on the thread 'nesting container elements'

Old straw format discussion

Saved here so that I'm not just deleting people's comments.

Mike straw format suggestion (Deprecated)

In the interests of starting debate and having something concrete to fix, I suggest the following structure for a format. It is probably very incomplete and I claim no microformat expertise. I'm just trying to follow existing patterns. Comments and ridicule are both solicited. -Mike

NOTE: This format is here for historical reference. Because it was not based on existing examples, I've deprecated it and contributed examples to Brian's format. If you feel that any missing elements in here should be in the final format, find examples for them and contribute to Brian's schema. Thanks! --Mike 18:22, 12 Oct 2006 (PDT)

In General

The citation format is based on a set of fields common to many bibliographic data formats, which are often implied by standard citation display styles but not explicitly marked up in practice on the web.


The citation schema consists of the following:

  • cite
    • title: required, text (class = fn)
    • subtitle: optional, text
    • authors: optional, use hCard
    • publication date: optional
    • link(s) to instantiations, optional, url or use rel-enclosure? (class=url)
    • UID, optional (for ISBN, DOI - use existing uid class) | permalink
    • series (aka volume/issuenum), optional (not as sure how to handle these - suggestions?)
    • pages: startpage & endpage, optional, text
    • venue, optional (hCard)
    • publisher, optional (hCard)
    • container: optional (nested hCite)
    • abstract, optional (blockquote + class="abstract" ?)
    • notes, optional (blockquote + class="notes" ?)
    • keywords, optional (rel-tag)
    • image, optional (for inclusion inline, unlike the url)
    • copyright, optional (rel-license)
    • what else am I missing?
      • language, optional

Looks good, but I question the use of hCard for names. Due to ambiguity issues, requiring hCard would lead to extra markup in order to apply just a name, hence the need for a root element. We should extract the N optimization of hCard like we did with adr, in order to ease this problem. --Ryan Cannon

Perhaps a Retrieved Date or Access Date would be appropriate for citing online resources. For example, you see citations like this:

Chief Academic Officers of the Big 12 Universities (2000). Big 12 Faculty Fellowship Program. Retrieved December 20, 2000 from the World Wide Web:

--Joe Andrieu

Discussion about citing legal cases

Here's some info I found about citing law:

I'm not a lawyer, so I'm relying on the published "blue book" standard, at least the only part of it I can get without paying $25. I'd be happy to hear improvements from experts in the field - how do lawyers mark up references to case law in HTML now?

From and, I find mostly just links to PDFs with the name of the case as the link text. Or just this, from EFF:

<h1>The Betamax Case</h1>
<h2>Sony Corp. of America v. Universal City Studios, 464 U.S. 417 (1984)</h2>

From an example in the sample bluepages, a case citation has 5 basic components:

  • 1 name of the case (citation title)
  • 2 published source in which case may be found (citation containing publication?)
  • 3 a parenthetical indicating the court and year of decision (citation venue?)
  • 4 other parenthetical information, if any (citation notes?)
  • 5 subsequent history of the case, if any (citation notes?)

Here are two examples from the bluebook. Note that there are very strict rules about abbreviations in that source!

Holland v. Donnelly, 216 F. Supp. 2d 227, 230 (S.D.N.Y. 2002), aff'd, 324 F.3d 99 (2d Cir. 2003).

Green v. Georgia, 442 U.S. 95, 97 (1979) (per curiam) (holding that exclusion of relevant evidence at sentencing hearing constitutes denial of due process).
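
To see how the five components might map onto separate fields, here is a rough sketch that splits citations shaped like the two examples above using a regular expression. It is not a Bluebook parser: the pattern, the function name parse_case_citation, and the field names are assumptions for this example, and it only handles the simple "name, volume reporter page[, pinpoint] (court year)" shape. Everything after the first parenthetical (other parentheticals and subsequent history) is kept as one undivided string.

```python
import re
from typing import Dict, Optional

CASE_CITATION = re.compile(
    r"""
    ^(?P<name>.+?),\s+            # 1. name of the case
    (?P<volume>\d+)\s+            # 2. published source: volume,
    (?P<reporter>.+?)\s+          #    reporter abbreviation,
    (?P<page>\d+)                 #    first page,
    (?:,\s+(?P<pinpoint>\d+))?    #    and optional pinpoint page
    \s+\((?P<court>[^()]*?)\s*    # 3. parenthetical: court (may be empty)
    (?P<year>\d{4})\)             #    and year of decision
    (?P<rest>.*)$                 # 4 and 5. anything that follows
    """,
    re.VERBOSE,
)


def parse_case_citation(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a simple case citation into fields, or return None.

    Hypothetical helper for illustration only; real citations vary far
    more than this pattern allows.
    """
    match = CASE_CITATION.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # An empty court means the reporter implies it (e.g. U.S. Reports).
    fields["court"] = fields["court"] or None
    # Trim separators around the trailing parentheticals and history.
    fields["rest"] = fields["rest"].strip(" .,") or None
    return fields
```

Even this small pattern shows why the "notes" mapping is uncertain: the trailing material mixes free-text parentheticals with a second, nested citation (the subsequent history), which a flat field cannot represent well.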

Examples in the wild

Pages that have started to use the discussion above to create working examples using hcite: (This section could be used as a base for a page like "hcite-examples-in-wild" later).

Please add new examples to the top of this section.

  • Example User Page at the regional computer lab Erlangen, Germany, based on the universal information system UnivIS, marked up with vcard, hcalendar (optional, if the user gives a lecture) and hcite.