[uf-discuss] hCite elevator pitch and my bibliography generator

Thu Mar 22 13:08:54 PST 2007

(Sorry about my frustrated tone. I always get frustrated when I try  
to extract implementation directions from the wiki and fail. This  
isn't the first time. And I can read specs in general.)

On Mar 10, 2007, at 23:10, Paul Wilkins wrote:

> Henri Sivonen wrote:
>
>> I needed a .bib-based bibliography generator for XHTML, so I
>> wrote one with help from a friend who had developed a .bib
>> parser. The output of my generator can be seen at
>> http://hsivonen.iki.fi/thesis/html5-conformance-checker.xhtml#references
>>
>> I've wrapped the values of .bib fields in elements whose class
>> name is the .bib field name. I did it just in case. I don't have
>> any consumer use case for those class names. It was just
>> super-easy to generate them.
>>
>> My use case (publishing an academic paper with a bibliography) is
>> not mentioned as a use case at
>> http://microformats.org/wiki/citation-brainstorming . More to the
>> point, the wiki has no consumer use case for my publication use case.
>>
>> Does this mean that hCite is not for me at all?
>
> Not at all. You are using the BibTeX format, which is covered in  
> the citation formats http://microformats.org/wiki/citation-formats

Sure, but considering that I share my .bib, should I expect people to  
want to scrape my (X)HTML-formatted bibliography?

>> If hCite is for me, what's the elevator pitch convincing me to
>> put more effort into my generator? What benefits should I expect
>> if I do? Is hCite mature enough to be implemented yet?
>
> The citation microformat is a work in progress at this stage, so  
> it's not mature enough for programs to extract information from it,

I guess this means that I shouldn't try to support hCite on the
generator side in my thesis, considering that the document should go
final in the first week of April.

Would it be of any use to anyone if I wrapped the name of each
author/editor in a <span class='fn'> if I otherwise leave my markup
the way it is now?
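
For illustration, a minimal Python sketch of that kind of wrapping: each
.bib field value goes into an element whose class name is the field name,
and each author/editor name is additionally wrapped in a span with class
"fn", inside a <dt>/<dd> pair. The function name render_entry, the sample
record, and the exact output layout are assumptions made for this sketch;
they do not reproduce the generator discussed in this message.

```python
from html import escape


def render_entry(key, fields):
    """Render one bibliography entry as a <dt>/<dd> pair (hypothetical layout)."""
    parts = []
    for name, value in fields.items():
        if name in ("author", "editor"):
            # BibTeX separates multiple names with the word "and".
            people = [p.strip() for p in value.split(" and ")]
            # The formatted name is emitted as-is; no given/family split is attempted.
            inner = ", ".join(
                '<span class="fn">%s</span>' % escape(p) for p in people
            )
        else:
            inner = escape(value)
        parts.append('<span class="%s">%s</span>' % (escape(name), inner))
    return '<dt id="%s">[%s]</dt>\n<dd>%s</dd>' % (
        escape(key), escape(key), " ".join(parts)
    )


# Made-up sample record for illustration only, not an entry from dippa.bib.
sample = {
    "author": "Gavin Thomas Nicol and Henrik Frystyk Nielsen",
    "title": "Example & Title",
}
print(render_entry("example", sample))
```

The presentation stays a definition list; only class attributes are added
around text that was going to be emitted anyway.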

> The benefits are that people visiting your content with next  
> generation tools will be able to easily extract and use the  
> information in more interesting and useful ways.

So basically, my effort would not be about catering to specific  
realistic foreseeable use cases. Instead, it would be about putting  
data out there in case someone figures out a use case later on.

> Tantek has a recent presentation about the big picture of  
> microformats at http://tantek.com/presentations/2007/02/microformats/

I think I know the base theory. I am interested in practical use  
cases and implementability in this particular case.

>> Moreover, is it even possible to generate hCite from my source
>> data (http://hsivonen.iki.fi/thesis/dippa.bib) without
>> sacrificing the presentation that I want and without potentially
>> generating bogus markup for personal names?
>
> One of the big ideas behind the use of microformats is that it will  
> allow you to mark up the content on your page without modifying the  
> presentation of it.

Somehow, I was under the impression that hCite required bibliography  
items as <li>s instead of <dt>/<dd> pairs (which is what I use and  
what W3C and WHATWG specs use).

>> For example, my source data does not explicitly encode the given
>> name, the family name and other stuff that isn't quite either.
>> As far as I can tell, it is impossible to tell heuristically that
>> the middle token in these two names is semantically different:
>> Gavin Thomas Nicol
>> Henrik Frystyk Nielsen
>
> Those issues haven't yet been covered for the citation  
> microformat.

What I'm trying to say is that I think hCite should allow names to be
marked up as formatted names, tossing the deformatting problem to the
consumer. After all, one of the most popular bibliography data
formats, BibTeX, stores formatted names.

> It may be possible for a generator to parse through them and  
> extract the appropriate information though.
> For example, honorific-prefix and honorific-suffix are a rather  
> short list. Then after those, the given name, family name and  
> additional name could be extracted in that particular order.

Using heuristics in the generator to make explicit metadata  
statements is generally a bad idea. If the result is wrong, it still  
pretends to be authoritative. If heuristics are involved, the input  
to the heuristic should be sent and consumers should be able to  
compete on how good their heuristics are.
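
To make the problem concrete, here is a minimal Python sketch of the
positional heuristic suggested above, written as something a consumer
might run on a formatted name. The function name naive_split is
hypothetical, and the keys are borrowed from the hCard name properties
purely for illustration; honorific handling is left out.

```python
def naive_split(formatted_name):
    """Positional guess: first token given, last token family, rest additional."""
    tokens = formatted_name.split()
    return {
        "given-name": tokens[0],
        "additional-name": " ".join(tokens[1:-1]),
        "family-name": tokens[-1],
    }


# The two names from the example earlier in this message.
for name in ("Gavin Thomas Nicol", "Henrik Frystyk Nielsen"):
    print(name, "->", naive_split(name))
```

The heuristic assigns the same role to the middle token in both names,
so it cannot capture the semantic difference between them. If the
generator bakes that guess into the markup, the guess looks
authoritative; if the generator publishes only the formatted name, each
consumer can apply, and improve, its own guess.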

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/