[uf-discuss] hCite elevator pitch and my bibliography generator

Paul Wilkins paul_wilkins at xtra.co.nz
Fri Mar 23 04:22:40 PST 2007


Henri Sivonen wrote:
> On Mar 10, 2007, at 23:10, Paul Wilkins wrote:
>> You are using the BibTeX format, which is covered on the 
>> citation formats page: http://microformats.org/wiki/citation-formats
> 
> Sure, but considering that I share my .bib, should I expect people to  
> want to scrape my (X)HTML-formatted bibliography?

If the .bib is used as the lone source for the XHTML, I suspect it would 
be easier to scrape the .bib file.

>> The citation microformat is a work in progress at this stage, so  it's 
>> not mature enough for programs to extract information from it,
> 
> I guess this means that I shouldn't try to support hCite on the  
> generator side in my thesis considering that the document should go  
> final on the first week of April.

Even though it goes final then, does that prevent you from later on 
adding markup which doesn't affect the text, yet makes it easier for 
tools to scrape through the information?

> Would it be of any use to anyone if I wrapped the name of each author/ 
> editor in a <span class='fn'> if I otherwise leave my markup the way  it 
> is now?

A formatted name is quite a restricted format, and if the formatted name 
doesn't follow a certain prescribed format, it is considered to be 
invalid and isn't used.

Currently the BibTeX is as follows:

@Misc{AXML,
   editor = {Tim Bray and Jean Paoli and C.M. Sperberg-McQueen},
   title = {The Annotated XML 1.0 Specification},
   year = {1998},
   publisher = {O'Reilly Media, Inc.},
   refdate = {2007-03-04},
   url = {http://www.xml.com/pub/a/axml/axmlintro.html}
}

From which you want to create the following kind of output:

[AXML]
     The Annotated XML 1.0 Specification. Tim Bray, Jean Paoli and C.M. 
Sperberg-McQueen, editors. O’Reilly Media, Inc., 1998. 
http://www.xml.com/pub/a/axml/axmlintro.html (referenced: 2007-03-04)

The editor section alone will be interesting to mark up, because the 
citation has to allow multiple editors. In that case both the BibTeX 
and the microformat would have to be generated from a common parent 
source, so that the microformat gets the name information in the format 
it requires, while the same information still flows through to the 
BibTeX file.
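
As a rough illustration of the multiple-editor problem, here is a minimal 
Python sketch that splits the editor field from the entry above and wraps 
each name in the <span class='fn'> that Henri proposed. The function names 
split_names and mark_up_editors are made up for this example, and the split 
is a plain " and " split that ignores BibTeX's brace-protection rules, so 
treat it as a sketch under those assumptions rather than a real BibTeX 
parser.

```python
import html


def split_names(field):
    """Split a BibTeX name list on the ' and ' separator.

    Assumption: no name contains a brace-protected ' and '.
    """
    return [name.strip() for name in field.split(" and ") if name.strip()]


def mark_up_editors(field):
    """Wrap each editor in <span class='fn'> and join them for display."""
    names = split_names(field)
    if not names:
        return ""
    spans = ["<span class='fn'>%s</span>" % html.escape(n) for n in names]
    if len(spans) > 1:
        joined = ", ".join(spans[:-1]) + " and " + spans[-1]
        return joined + ", editors."
    return spans[0] + ", editor."


editors = "Tim Bray and Jean Paoli and C.M. Sperberg-McQueen"
print(mark_up_editors(editors))
```

This only marks up the formatted names; it does not attempt to break each 
name into given and family parts.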

>> The benefits are that people visiting your content with next  
>> generation tools will be able to easily extract and use the  
>> information in more interesting and useful ways.
> 
> So basically, my effort would not be about catering to specific  
> realistic foreseeable use cases. Instead, it would be about putting  
> data out there in case someone figures out a use case later on.

It may be more useful to provide the ISBN for the book. Then the 
problems left to be solved become smaller and easier to handle.

> Somehow, I was under the impression that hCite required bibliography  
> items as <li>s instead of <dt>/<dd> pairs (which is what I use and  what 
> W3C and WHATWG specs use).

I'm sure that design patterns can be created to accommodate such a scheme.

> What I'm trying to say is that I think hCite should allow names to be  
> marked up as formatted names tossing the deformatting problem to the  
> consumer. After all, one of the most popular bibliography data  format, 
> BibTeX, stores formatted names.

Currently the formatted names are accepted in the following formats:

given-name (space) family-name
family-name (comma) given-name
family-name (comma) given-name-first-initial
family-name (space) given-name-first-initial (optional period)
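
To make the four formats concrete, here is a minimal Python sketch of how a 
consumer might recognise them. The name parse_fn, the pattern labels, and 
the choice to treat any run of non-space, non-comma characters as one name 
are my own assumptions for illustration, not something taken from a spec. 
The initial-based patterns are tried first, because "Bray T" would 
otherwise also match given-name (space) family-name.

```python
import re

WORD = r"[^\s,]+"      # one name token: no whitespace, no comma
INITIAL = r"[^\W\d_]"  # a single letter

# Order matters: the more specific initial forms are tried first.
PATTERNS = [
    ("family, initial",
     re.compile(rf"(?P<family>{WORD})\s*,\s*(?P<given>{INITIAL})")),
    ("family, given",
     re.compile(rf"(?P<family>{WORD})\s*,\s*(?P<given>{WORD})")),
    ("family initial",
     re.compile(rf"(?P<family>{WORD})\s+(?P<given>{INITIAL})\.?")),
    ("given family",
     re.compile(rf"(?P<given>{WORD})\s+(?P<family>{WORD})")),
]


def parse_fn(fn):
    """Return the matched format and name parts, or None if no format fits."""
    text = fn.strip()
    for label, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return {
                "format": label,
                "given": match.group("given"),
                "family": match.group("family"),
            }
    return None


for name in ["Tim Bray", "Bray, Tim", "Bray, T", "Bray T.", "Tim Berners Lee"]:
    print(name, "->", parse_fn(name))
```

A name with more than two parts, such as the last one in the loop, matches 
none of the formats and comes back as None, which is the "considered to be 
invalid and isn't used" case mentioned earlier.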

How much granularity does BibTeX allow when storing the formatted 
names of editors?


-- 
Paul Wilkins

