citation-irc-notes-2006-04-09
Summary
A citation microformat needs to cover four uses:
- full, bibliographic citations, eg "The Title Of An Article. Smith J. Journal Title (1987). 46:1; 23-35." This is the main citation microformat and should contain all the information necessary to locate the item and create a text citation in all the common formats (MLA, APA, etc).
- minimal, inline citations in text, eg (Smith, 1987). These generally link to an item in the bibliography using a fragment identifier.
- full, inline citations in text, eg following a blockquote.
- description of the current item/page, including title, creator, date etc.
- I disagree with this last requirement (description of the current item/page).  A citation is IMHO a reference to a work *somewhere-else*.  It is a reference to a work from *another* work.  This is quite a different case than a work describing itself.  For info about a work in the work itself, see for example blog-description-examples. -Tantek
- The SELF description was discussed in the meet-up. The idea behind encoding all the attributes for the current page/item was to allow OTHERS to extract citation data from your page for their own use. Why should I re-key all the data about the article if I can simply convert the article to a citation itself? This should require no additional work or fields - there will be no properties unique to the SELF that are not already expressed in the base citation microformat. -brian
- I understand the desire to help automate this, but that means that it would be nice to have a transform from the self-description of a work, to a citation of the work.  That transform doesn't necessarily have to be the identity transform, hence the last requirement shouldn't be a requirement. -Tantek
- Your point that a citation is a reference to an outside source is right Tantek, but that doesn't mean that the description of self should be ANY different than that of an extraneous document, save for the fact that we call it a "citation." I think this may well have been why Ed brought up the idea of an hDC, where we end up thinking of hCite as more-or-less a sort of wrapper for other microformats (hCard, hCal, etc.). -Bruce
- But we are not talking about "should be ANY different".  We are talking about "should be SAME".  My point is that it should NOT be a *requirement* for *citations* per se.  The recognition that people will want to cite works is a good one, and thus when working on any "description" type format for a work describing itself, *that* work should look to the citation microformat and be sure to specify a transform, i.e. how to create a citation from a description. The easiest thing *might* be to simply embed a citation, but that is up to the description format, not up to the citation format, and thus should NOT be a requirement for the citation format.  My guess is that most description formats will be a superset of what a general purpose citation has in it.
- Consider the example of an article that contains within the page something along the lines of "Please cite this article as: The Title Of An Article. Smith J. Journal Title (1987). 46:1; 23-35.". That's both a citation and a description of the current page, and it should be marked up as such. --AlfEaton
- Alf, that's a theoretical example, so we should not consider it as a requirement.  If you can find 80/20 examples in the real world (published on the Web) that contain such a "suggested citation", please add those examples to citation-examples.
- Sigh, this is all sounding a bit patronizing Tantek. It's not a "theoretical example"; just Google for the phrase "please cite this article." But I'll add an example or two to the examples page. -- bruce
 
 
- In addition, naming a document "-recommendation" at this point is quite premature, especially when there is much work to be done in both doing the work and cleaning up of citation-examples, citation-formats, and citation-brainstorming. If this is meant to be a record of notes or a summary of a discussion, then it should be a notes page, e.g. see for example geo-bof-2005-06-30.  Thanks,  -Tantek
- I will write up the notes for the IRC meeting shortly and post them to the appropriate place. In the meantime, the general consensus was that we have closed the exploration phase of the citation microformat. We have solid goals and are moving to implementation. Admittedly the citation pages need some serious clean-up, but there is no need to halt progress in the development/iteration phase for those who are helping to move the format forward. -brian
- Considering any "exploration" "closed" regarding a microformat makes no sense at all when the work to complete the necessary pages per the process has yet to happen. I'm moving this page to a name that better reflects its status within the process: just notes. -Tantek
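The transform discussed in the thread above, from a work's self-description to a citation of that work, could look something like the following. This is only a sketch: the `Description` and `Citation` classes, their field names, and the example URL are assumptions made for illustration and are not part of any proposal on this page. It shows the case Tantek describes, where the description is a superset of the citation and the transform is a projection rather than the identity.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """Hypothetical self-description of a page; a superset of citation data."""
    title: str
    creators: list[str]
    date: str
    url: str
    # Extra fields a description might carry that a citation does not need.
    tags: list[str] = field(default_factory=list)
    license: str = ""


@dataclass
class Citation:
    """Hypothetical minimal citation record."""
    title: str
    creators: list[str]
    date: str
    url: str


def to_citation(desc: Description) -> Citation:
    """Project a self-description onto the citation fields, dropping the rest."""
    return Citation(desc.title, list(desc.creators), desc.date, desc.url)


# Example values; the URL and tag are placeholders.
desc = Description(
    title="The Title Of An Article",
    creators=["Smith J."],
    date="1987",
    url="http://example.org/article",
    tags=["placeholder-tag"],
)
cite = to_citation(desc)
```

If the description format simply embedded a citation, as brian suggests, `to_citation` would reduce to returning that embedded record unchanged.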
 
 
Notes from the Meetup
NOTES for the 2006-04-09 IRC Citation meetup
ATTENDEES:
- briansuda
- bretonslivka
- darcusb
- dchud
- edsu
- fresco
- rsinger
Discussion Topics:
What is a citation?
Before launching into building a citation microformat, everyone should be on the same page as to what we are trying to define. So our working definition of a citation is:
citation: a short description that points to a fuller description elsewhere, either in a note or a reference list
reference item: a fuller description; also called a bibliographic entry or item
bibliography: a collection of citations
see: http://wiki.services.openoffice.org/wiki/Bibliographic_Project%27s_Developer_Page#Terminology
Scope of this microformat
3 Areas discussed:
1. citation information about the current page
2. citation information about a cited reference
3. inline citations, a form of #2
(these are 3 views of the same data)
Reuse of existing microformats, hCard for authors and publishers, hCalendar for dates and times.
Not a flat model.
Formats discussed:
- DC
- Not strong enough by itself
- Most basic properties
 
- DC+DCQ
- can be used for the locators (pages, volumes, etc.).
- The primary things from DCQ are the date properties and the crucial isPartOf relation
 
- OpenURL
- OpenURL uses community profiles, each a standalone document with its own schema; some of them borrow vocabulary from each other
- OpenURL is good for getting granularity. Whether or not OpenURL itself is used isn't all that important, as long as the microformat creates a model that's 'compatible' with OpenURL
 
- OpenDocument
- There has been talk of adding an extensible metadata system to OpenDocument, based around modules, with default ones for DC and DCQ.
 
- PRISM
- One problem with PRISM is that its "number" property is ambiguous. It should distinguish issue numbers from document numbers
 
- MODS
Other ideas:
The core set of properties should account for the following basic structures:
- References
- Collections
- Events (this can be handled by hCalendar)
- Agents (this can be handled by hCard)
Collection includes subclasses like Periodical (and in turn Journal and such) and Series. http://purl.org/net/biblio
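As a rough illustration of the four basic structures listed above, here is one way the non-flat model might be written down as data types. The class and field names (`part_of`, `kind`, and so on) are assumptions for this sketch; only the structure names and the Collection subclasses come from the notes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """A person or organisation; would be expressed with hCard."""
    name: str


@dataclass
class Event:
    """A dated occurrence such as publication; would be expressed with hCalendar."""
    kind: str
    date: str


@dataclass
class Collection:
    title: str


@dataclass
class Periodical(Collection):
    issn: str = ""


@dataclass
class Journal(Periodical):
    pass


@dataclass
class Series(Collection):
    pass


@dataclass
class Reference:
    title: str
    creators: list[Agent] = field(default_factory=list)
    events: list[Event] = field(default_factory=list)
    # Mirrors the isPartOf relation mentioned under DC+DCQ.
    part_of: Optional[Collection] = None


ref = Reference(
    title="The Title Of An Article",
    creators=[Agent("Smith J.")],
    events=[Event("published", "1987")],
    part_of=Journal(title="Journal Title"),
)
```

The nesting is the point: the journal carries its own title and identifier instead of those being flattened into the reference.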
Conclusion:
Start a draft model for everyone to look into, another straw proposal
Current problems
- Bibliographies published in HTML generally just use plain text (a URL is often included), but are sometimes produced from fully marked-up data, which is lost.
- inline citations often link to bibliography items, but use named anchors rather than fragment identifiers.
- There are multiple ways of adding self-descriptive data to web pages, such as meta tags -- with or without Dublin Core -- or embedded RDF.
- meta tags are nearly useless since their content is invisible. -Tantek
- I think there is some agreement that there may be some elements in a citation that are useful and probably should be included in the spec that are best left non-visible. 'Machine readable' is not nearly useless. The problem with metatags isn't that they're useless, it's that they are hard to relate to specific content in a web page. --ross
 
Straw Proposals
These straw thoughts/proposals should be on the citation-brainstorming page, not here, along with some citation (so to speak ;) of the proposer. -Tantek
Microformat for inline citations
<cite> <a href="#ref-1">1</a> </cite>
<cite> <a href="#ref-1">Smith, 2002</a> </cite>
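To show how a consumer might use the inline form above, here is a small sketch that checks whether each inline citation points at a bibliography entry on the same page. It uses only Python's standard `html.parser`. The class name `CitationLinker`, the helper `unresolved_citations`, and the sample page are all made up for illustration, and the sketch assumes bibliography entries carry `class="citation"` and an `id`.

```python
from html.parser import HTMLParser


class CitationLinker(HTMLParser):
    """Collect inline <cite> links and the ids of elements with class "citation"."""

    def __init__(self):
        super().__init__()
        self.cite_depth = 0        # > 0 while inside a <cite> element
        self.inline_targets = []   # fragment ids referenced from inline citations
        self.entry_ids = set()     # ids of bibliography entries

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "cite":
            self.cite_depth += 1
        elif tag == "a" and self.cite_depth and href.startswith("#"):
            self.inline_targets.append(href[1:])
        classes = (attrs.get("class") or "").split()
        if "citation" in classes and attrs.get("id"):
            self.entry_ids.add(attrs["id"])

    def handle_endtag(self, tag):
        if tag == "cite" and self.cite_depth:
            self.cite_depth -= 1


def unresolved_citations(html):
    """Return inline citation targets that have no matching bibliography entry."""
    linker = CitationLinker()
    linker.feed(html)
    return [t for t in linker.inline_targets if t not in linker.entry_ids]


page = """
<p>As argued in <cite> <a href="#ref-1">Smith, 2002</a> </cite> and
<cite> <a href="#ref-2">2</a> </cite>.</p>
<ul><li class="citation" id="ref-1">...</li></ul>
"""
print(unresolved_citations(page))  # prints ['ref-2']
```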
Microformat for a generic bibliography citation
<li class="citation" id="ref-1">
  <span class="title">
    <a class="url" href="http://dx.doi.org/[DOI]">[item title]</a>
  </span>
  <span class="creator vcard">
    <span class="n">
      <span class="family-name">[surname]</span>,
      <abbr title="[given-name]" class="given-name">[initial]</abbr>
    </span>
  </span>
  <span class="creator vcard">
    <span class="n">
      <span class="family-name">[surname]</span>,
      <abbr title="[given-name]" class="given-name">[initial]</abbr>
    </span>
  </span>
  <abbr class="date-published" title="YYYY-MM-DDTHH:MM:SS+ZZ:ZZ">[year]</abbr>
</li>
Note: for a full inline citation, the
<li class="citation" id=""></li>
would be replaced by
<cite></cite>
and there would not be a link to a local fragment.
Note: for a self citation, the
<li class="citation" id=""></li>
would be replaced by
<span|div class="citation self"></span|div>
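To make the generic bibliography markup above more concrete, here is a rough sketch of how a consumer might read the title, creator family names, and date out of one entry. It is not a conforming microformat parser: the `CitationParser` class is hypothetical, it assumes well-formed markup in which every element is closed, and the sample values are taken from the example citation in the Summary.

```python
from html.parser import HTMLParser


class CitationParser(HTMLParser):
    """Extract title, creator family names and date from one citation element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # class lists of the currently open elements
        self.title = ""
        self.families = []
        self.date = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.stack.append(classes)
        if "date-published" in classes:
            # The machine-readable value lives in the abbr title attribute.
            self.date = attrs.get("title")
        if "family-name" in classes:
            self.families.append("")

    def handle_endtag(self, tag):
        # Assumes every start tag has a matching end tag.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        open_classes = {c for classes in self.stack for c in classes}
        if "family-name" in open_classes:
            self.families[-1] += data.strip()
        elif "title" in open_classes and "creator" not in open_classes:
            self.title += data.strip()


sample = """
<li class="citation" id="ref-1">
  <span class="title"><a class="url" href="http://dx.doi.org/[DOI]">The Title Of An Article</a></span>
  <span class="creator vcard"><span class="n">
    <span class="family-name">Smith</span>,
    <abbr title="[given-name]" class="given-name">J</abbr>
  </span></span>
  <abbr class="date-published" title="1987">1987</abbr>
</li>
"""
parser = CitationParser()
parser.feed(sample)
print(parser.title, parser.families, parser.date)
```

Because the parser keys only on class names, the same code would read the `<cite>` and `class="citation self"` variants described in the notes above without changes.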
Additional elements for a journal article citation
class="citation article"
<span class="container">
  <span class="title">
    <a class="url" href="http://dx.doi.org/[doi]">[journal title]</a>
  </span>
  <abbr class="date-published" title="YYYY-MM-DDTHH:MM:SS+ZZ:ZZ">[year]</abbr>
  <span class="volume">[volume no.]</span>
  <span class="issue">[issue no.]</span>
  <abbr class="uri" title="urn:issn:[issn]"/>
</span>
<span class="pages">[start-page]-[end-page]</span>
<abbr class="uri" title="info:pmid/[PMID]"/>
- I changed "number" to "issue". I also think that all of that content ought likely be moved out of the "container" wrapper into the root level. Finally, should not the container include another type class ("periodical" or "journal")? -- bruce
- I thought you wanted the model not to be flat, ie the container should have its own attributes? --alf
- I am not referring to things like titles, but rather only to the locator information (volume, issue, pages), and this is primarily for practical reasons. Those in fact are not characteristics of the container (periodical), but rather of the relationship between the article and issue, and then the issue and the periodical. E.g. to model it "properly" would suggest three levels (root, container/issue, collection/periodical), each with their own respective locators. So I was just thinking for those reasons to in general say these locators ought to be associated with the root. What do you think? -- bruce
- Why not do three levels? -- alf
- It might be useful to define which elements would appear at each level.  If there are "universals" that would appear in any kind of citation (although might have a different label/connotation based on context), what would they be?  Title?  Creator?  What else?  Then we can start figuring out the special elements for each "type" of citation. --ross
- Title is the obvious one that applies across the board: chapters, books, photographs, legal cases, court reporters, webpages, series, etc. all have titles. Likewise, they all have different kinds of contributors (including in many cases creators). But that's it I think. -- bruce
- See the generic format above: title, creator and date are pretty universal. --alf
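The two options in this thread, three levels each with its own locators versus locators attached to the root, can be compared with a small sketch. The class names, the `flatten` helper, and the flat key names such as `container-title` are assumptions for illustration. The sample values read the Summary example "46:1; 23-35" as volume 46, issue 1, pages 23-35, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Periodical:
    title: str
    issn: Optional[str] = None


@dataclass
class Issue:
    periodical: Periodical
    volume: Optional[str] = None   # locates the issue within the periodical
    number: Optional[str] = None


@dataclass
class Article:
    title: str
    issue: Issue
    pages: Optional[str] = None    # locates the article within the issue


def flatten(article):
    """Collapse the three levels into root-level locators (the flat variant)."""
    return {
        "title": article.title,
        "container-title": article.issue.periodical.title,
        "volume": article.issue.volume,
        "issue": article.issue.number,
        "pages": article.pages,
    }


article = Article(
    title="The Title Of An Article",
    issue=Issue(Periodical("Journal Title"), volume="46", number="1"),
    pages="23-35",
)
flat = flatten(article)
```

Going from three levels to the flat form is mechanical, while the reverse requires knowing which locator belongs to which level, which may be an argument for modelling the levels and flattening only in the markup.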
 
 
Additional elements for a book citation
class="citation book"
<span class="container">
  <span class="title">
    <a class="url" href="http://dx.doi.org/[doi]">[book title]</a>
  </span>
  <span class="subtitle">[book subtitle]</span>
  <span class="publisher vcard">[publisher]</span>
  <span class="editor vcard">[editor]</span>
  <abbr class="date-published" title="YYYY-MM-DDTHH:MM:SS+ZZ:ZZ">[year]</abbr>
  <abbr class="uri" title="urn:isbn:[isbn]"/>
</span>
<span class="pages">[start-page]-[end-page]</span>
BDarcus: this looks good. I wonder, though, about two issues.
1. why not just title and abbreviatedTitle instead of title and subtitle?
2. date-published with books in particular is a can of worms. My book, for example, has a copyright date of 2006 (which is what one would include in the citation) but an actual publication date sometime in late-2005. The specificity of date-published is thus misleading. What we're really talking about is a copyright date. I wonder if the above might not be better with two classes: date and copyright? So then these classes of dates: date, copyright, issued (more generic than published). That also fits dc and qualified dc.
- This seems complicated (of course, it's all complicated) because there's an assumption that the citation creator will actually know which of these the date actually is.  It seems like a generic date field that can be qualified with "published" or "copyright" or "actual year the conference happened", etc. would be more user-friendly.
- Ross, yes, that's what I had in mind (but didn't explain very well). A generic "date" class ought to work for most cases, but leave room to add further qualifiers if necessary. -- bruce
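The "generic date plus optional qualifier" idea could be read by a consumer roughly as follows. This sketch assumes the qualifier is written as an extra class name next to `date` (for example `class="date copyright"`), which nobody in the thread has specified, and `classify_date` is a hypothetical helper.

```python
# Qualifier names mentioned in the discussion; the set is open to extension.
QUALIFIERS = {"published", "copyright", "issued"}


def classify_date(class_attr, value):
    """Return the date value and its qualifier, or None if this is not a date."""
    classes = class_attr.split()
    if "date" not in classes:
        return None
    qualifier = next((c for c in classes if c in QUALIFIERS), None)
    return {"value": value, "qualifier": qualifier}


# A citation author who only knows "the date" can omit the qualifier.
plain = classify_date("date", "2006")
qualified = classify_date("date copyright", "2006")
```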