[uf-discuss] Citation: experiences parsing DC metadata

Michael McCracken michael.mccracken at gmail.com
Tue Aug 29 14:49:39 PDT 2006


Hi, from recent list discussion I discovered that archives using
EPrints.org had Dublin Core metadata in meta tags on individual item
pages. I posted those examples to the wiki, and then implemented
support for importing that metadata into BibDesk. See this post for
details and a movie, once it uploads:
http://michael-mccracken.net/wp/?p=63

I'm pretty happy with the feature. But I still want a microformat for
inline citations.

I'm going to share a couple of quick impressions from getting
something working with Dublin Core terms. What I did was translate the
available terms into BibTeX, which in this case is what is necessary
to make it useful for someone who wants to cite the item in a new
paper.
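
As a rough illustration of the first step (this is not BibDesk's actual
code), here is a minimal Python sketch that collects Dublin Core values
from a page's meta tags. It assumes the tags are named with a "DC."
prefix, such as "DC.title"; the class and function names are
hypothetical.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        # DC term (lowercased, "DC." prefix removed) -> list of values,
        # since terms such as creator may repeat.
        self.terms = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip()
        content = attr.get("content")
        # Assumption: DC terms are marked with a "DC." name prefix.
        if name.lower().startswith("dc.") and content is not None:
            term = name[3:].lower()
            self.terms.setdefault(term, []).append(content.strip())


def extract_dc_terms(html_text):
    """Return a dict mapping each DC term found to its list of values."""
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.terms
```

Keeping a list per term preserves repeated values, so several creators
on one item survive the import.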

1. DC terms for creator and contributor don't map clearly to author
and editor, at least not according to the DC standard. Maybe there's
some convention about how it's usually done, but I couldn't find it.
IMO, 'author' and 'editor' make more sense, and that's probably
because that's how everything else I'm familiar with does it (except
for the super-thorough MARC relator scheme [marcrel], which I don't
think is author-friendly).

2. The DC type term, even though it was a little vague, was very
useful for my purposes - any serious citation database is going to
want to distinguish types, and it makes it a lot easier to import in a
usable manner. If it doesn't have a type, I default to the BibTeX
"misc" type, and that's an acceptable fallback. I think a good
microformat should allow specifying the type of an entry, but should
not require it. There are situations where the type won't be
available, and since there is probably a reasonable generic fallback
for any consumer to use, some data is better than getting none because
the format requires a type.

If this sounds right, it means we need to discuss what a reasonable
choice for allowable citation types is.
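
To make the two points above concrete, here is a hedged Python sketch
of the translation with an optional type. The type table is purely
illustrative (its keys are not taken from any particular archive), and
mapping creator to author and contributor to editor is an assumed
convention, not something the DC standard specifies.

```python
# Hypothetical table: the type strings a site emits vary, so these keys
# are illustrative only.
TYPE_TO_BIBTEX = {
    "article": "article",
    "book": "book",
    "book section": "incollection",
    "conference paper": "inproceedings",
    "thesis": "phdthesis",
}


def bibtex_type(dc_type):
    """Return a BibTeX entry type, falling back to 'misc'."""
    if not dc_type:
        return "misc"  # no type given: still usable
    return TYPE_TO_BIBTEX.get(dc_type.strip().lower(), "misc")


def to_bibtex(cite_key, terms):
    """Render DC terms (term -> list of values) as a BibTeX entry."""
    entry_type = bibtex_type((terms.get("type") or [None])[0])
    fields = {}
    # Assumed convention: creator -> author, contributor -> editor.
    if terms.get("creator"):
        fields["author"] = " and ".join(terms["creator"])
    if terms.get("contributor"):
        fields["editor"] = " and ".join(terms["contributor"])
    if terms.get("title"):
        fields["title"] = terms["title"][0]
    if terms.get("date"):
        # Assumes the date value starts with a four-digit year.
        fields["year"] = terms["date"][0][:4]
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@{entry_type}{{{cite_key},\n{body}\n}}"
```

An item with no type term, or with a type the table doesn't know, still
comes through as a "misc" entry instead of being dropped.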

The BibTeX format allows any type, and leaves it to the consumer (a
style file) to figure out what to do with types it doesn't understand
(commonly just ignored).

It looks like Dublin Core also allows arbitrary values for the type
attribute but recommends you choose from the DC type vocabulary
[dctv], which has very little overlap with BibTeX's types - the DCTV
covers things BibTeX doesn't, like Event, Dataset, and MovingImage,
while most of the standard BibTeX types would be classified simply as
"Text". The EPrints sites I looked at went their own way, and I just
tried to cover everything they let you search for.
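
For a quick look at how far apart the two vocabularies are, this small
Python sketch compares the DCMI Type Vocabulary terms with the standard
BibTeX entry types by name only. It says nothing about conceptual
overlap, and the function name is hypothetical.

```python
# Terms of the DCMI Type Vocabulary [dctv].
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Entry types defined by the standard BibTeX styles.
BIBTEX_TYPES = {
    "article", "book", "booklet", "conference", "inbook",
    "incollection", "inproceedings", "manual", "mastersthesis",
    "misc", "phdthesis", "proceedings", "techreport", "unpublished",
}


def shared_type_names():
    """Return type names present in both vocabularies, ignoring case."""
    return {t.lower() for t in DCMI_TYPES} & BIBTEX_TYPES
```

Compared this way, no name appears in both sets, so any importer has to
carry its own mapping table between them.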

What does everyone think about types?

Thanks,
-mike

[dctv]:http://dublincore.org/documents/dcmi-type-vocabulary/
[marcrel]:http://www.loc.gov/marc/sourcecode/relator/relatorlist.html

-- 
Michael McCracken
UCSD CSE PhD Candidate
research: http://www.cse.ucsd.edu/~mmccrack/
misc: http://michael-mccracken.net/wp/

