[uf-discuss] [citation] url field

Michael McCracken michael.mccracken at gmail.com
Thu Dec 7 15:05:01 PST 2006


This seems to have been buried - so again, to anyone interested in hCite:

I want to define a new field "URL" to denote an http URL that points
to the location of a copy of the cited work.

URIs that encode an identifier of the work can be combined with this
field, but do not need to be.

I understand that the name "URL" may overlap a bit with URI, and
something like "downloadlink", etc. might be more direct, but I argue
that "URL" is the better choice because it is the most common name
already in use in our examples from the web.
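
To make the proposal concrete, here is a rough sketch in Python of how a
consumer might pull the proposed fields out of a citation. Everything in
it is an assumption for illustration: hCite is still a draft, so the
"hcite" root class, the sample markup, the example.org addresses, and the
names CitationLinkParser and extract_links are all made up. Only the
"url" class (and the optional "uid" class) come from the proposal.

```python
from html.parser import HTMLParser


class CitationLinkParser(HTMLParser):
    """Collect href values from <a> elements according to their class tokens."""

    def __init__(self):
        super().__init__()
        self.urls = []  # class="url": location of a copy of the cited work
        self.uids = []  # class="uid": link that also identifies the work

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # The class attribute is a space-separated token list, so
        # class="url uid" puts the same link in both result lists.
        classes = (attr.get("class") or "").split()
        if "url" in classes:
            self.urls.append(href)
        if "uid" in classes:
            self.uids.append(href)


# Hypothetical markup; the draft format may end up looking different.
SAMPLE = """
<div class="hcite">
  <span class="title">An Example Paper</span>
  <a class="url" href="http://example.org/papers/example.pdf">PDF</a>
  <a class="url uid" href="http://example.org/id/12345">permanent link</a>
</div>
"""


def extract_links(markup):
    """Return (urls, uids) found in the given HTML fragment."""
    parser = CitationLinkParser()
    parser.feed(markup)
    parser.close()
    return parser.urls, parser.uids


urls, uids = extract_links(SAMPLE)
```

Note that the sketch never looks inside the href values. It only reports
which links the author labeled as "url" or "uid", which keeps the URLs
opaque to the consumer.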

Can we discuss this revised version of the proposal (or just vote on it)?

Thanks,
-mike

On 12/4/06, Michael McCracken <michael.mccracken at gmail.com> wrote:
> On 12/2/06, Mike Schinkel <mikeschinkel at gmail.com> wrote:
> > A couple points on this subject. I have recently been doing a *lot* of
> > research in the area of URLs/URIs and having discussions with numerous
> > people on REST-discuss and www-TAG lists so I feel I'm pretty well-versed on
> > this subject now.
> >
> > Although it is possible to infer an ISBN or maybe even a DOI from a URL, it
> > is considered "Bad Practice" unless the "URI Authority" (i.e. owner of the
> > website) specifically documented the structure of the URL and gave a
> > reasonably trustworthy guarantee that it will not change.
> >
> > References:
> >
> > 1.) "Architecture of the World Wide Web, Volume One" section 2.5 on "URI
> > Opacity" [1]:
> >
> >         Good practice: URI opacity
> >         Agents making use of URIs SHOULD NOT attempt to infer properties of
> >         the referenced resource.
> >
> > 2.) "The use of Metadata in URIs" section 2.1 on "Reliability of URI
> > metadata" [2]
> >
> >         Constraint: Web software MUST NOT depend on the correctness of
> >         metadata inferred from a URI, except when the encoding of such
> >         metadata is documented by applicable standards and specifications.
> >
> > 3.) "The use of Metadata in URIs", conclusions [3]
> >
> >         The principal conclusions of this finding are:
> >
> >         * Assignment authorities may publish specifications detailing the
> >         structure and semantics of the URIs they assign. Other users of
> >         those URIs may use such specifications to infer information about
> >         resources identified by URI assigned by that authority.
> >
> >         * People and software using URIs assigned outside of their own
> >         authority should make as few inferences as possible about a
> >         resource based on its URI. The more dependencies a piece of
> >         software has on particular constraints and inferences, the more
> >         fragile it becomes to change and the lower its generic utility.
> >
> > In the case of Jon Udell's LibraryLookup, which has been referenced as an
> > example:
> >
> >         Data point: ISBNs are already being reliably extracted from URLs:
> >
> > http://weblog.infoworld.com/udell/stories/2002/12/11/librarylookup.html
> >
> > Jon's work has been derided by purists as doing something it shouldn't, i.e.
> > "peeking" into URLs when they should remain opaque. Personally, I don't see
> > what Jon did as such a bad thing. Jon's script interfaces with a human only,
> > and if Amazon ever changes their URLs his script just won't work and the
> > user will figure that out. In the meantime, by breaking the rules he's
> > offering pretty useful functionality that he couldn't get otherwise. And
> > even if Amazon does change their URLs and his script breaks, which is highly
> > unlikely given their affiliate program, Jon can just update his script and
> > then anyone who has a broken script can search for Jon's new version (unless
> > Amazon eliminates the ISBN from the URL, which I would highly doubt).
> >
> > However, advocating the use of non-document metadata in a URL for a
> > Microformat citation is a completely different matter. Jon Udell is one
> > author using it for one app (LibraryLookup) whose users can later get
> > updates if required. Advocating it for a Microformat, where authors will
> > mark up untold HTML content, much of which will never get updated for
> > future revisions, requires a very high bar for immutability. IOW, we should
> > ensure that we have a *guarantee* that the format of the URL will *never*
> > change, or we shouldn't use it. Yes, we *could* still parse the old format,
> > but we'd have to keep adding parsers, some of which might eventually fail
> > because of ambiguity.
> >
> > At the moment, the only immutable reference for an ISBN is a URN from RFC
> > 3187[4]. For example:
> >
> >         URN:ISBN:0-395-36341-1
> >
> > This doesn't dereference in a browser (in IE7, for example), but one day
> > it might. But we can be sure it is definitely immutable.
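
Because the URN form has a documented structure, reading the ISBN out of
it does not run into the opacity concerns quoted above. A minimal sketch
in Python follows, assuming 10-digit ISBNs only and using the standard
ISBN-10 check digit (weights 10 down to 1, sum divisible by 11). The
function names isbn_from_urn and is_valid_isbn10 are made up for this
example and do not come from RFC 3187.

```python
import re

# "urn:isbn:" prefix matched case-insensitively, followed by digits,
# hyphens, and possibly an X check character.
URN_ISBN = re.compile(r"^urn:isbn:([0-9Xx-]+)$", re.IGNORECASE)


def is_valid_isbn10(digits):
    """Check the ISBN-10 check digit for a 10-character string."""
    if len(digits) != 10:
        return False
    total = 0
    for position, ch in enumerate(digits):
        if ch == "X":
            # X stands for 10 and is only allowed as the final character.
            if position != 9:
                return False
            value = 10
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn_from_urn(urn):
    """Return the bare ISBN-10 from a URN:ISBN string, or None if invalid."""
    match = URN_ISBN.match(urn.strip())
    if match is None:
        return None
    digits = match.group(1).replace("-", "").upper()
    return digits if is_valid_isbn10(digits) else None


isbn = isbn_from_urn("URN:ISBN:0-395-36341-1")
```

Anything that is not in the URN:ISBN form, including an http URL that
happens to contain an ISBN, is rejected rather than guessed at.
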
> >
> > As for resolving DOIs, they are new to me and I've not done enough research
> > to determine if there is an immutable resolvable source for DOIs.  This
> > article[5] and these websites ([6] & [7]) might be helpful there.
> >
> > As an aside, please don't take this as me being unsupportive. On the
> > contrary, I am a strong advocate of getting website owners to put metadata
> > in their URLs and to document that metadata. However, until we have solid
> > sources of URLs with documented metadata, we should probably all play
> > smartly by the rules as specified by the W3C, at least IMO.
> >
> > -Mike Schinkel
> > http://www.mikeschinkel.com/blogs/
> > http://www.welldesignedurls.org/
> >
> > [1] http://www.w3.org/TR/webarch/#uri-opacity
> > [2] http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061107.html
> > [3] http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061107.html#N1023D
> > [4] http://www.ietf.org/rfc/rfc3187.txt
> > [5] http://www.dlib.org/dlib/june98/06powell.html
> > [6] http://www.handle.net/
> > [7] http://www.doi.org/
> >
>
> Mike, thanks for all the detail. I definitely learned some things.
>
> In the context of my original proposal to add a "URL" field to the
> microformat, I now feel like I need to separate that proposal from one
> of the statements I made in it:
>
> "I also suggest that in the case of identifiers like a DOI or ISBN
> which can be represented as a parameter in a link to doi.org or some
> other resolver, that the format encourage using a URL field for those
> identifiers and not include separate fields for each such identifier.
> In other words, I think that class="url uid"  is sufficient to encode
> DOI/ISBN/etc., and we shouldn't add a separate DOI class, a separate
> ISBN class, and so on."
>
> To be clear - I still think that *if* it is possible to mark up a DOI
> or ISBN as a link without obscuring the DOI, then that's a positive
> thing. It sounds like it's just more complicated than I thought to do
> that. So maybe the format doesn't need to mention those in connection
> with the URL field.
>
> I do think that a URL field (class="url") should be included, to
> represent a link to a copy of the cited work, and if we want to mark
> up one or more identifiers, we can use a separate class (I suggest
> "uid") to do so. If we're lucky and there's a good way to merge them,
> then use class="url uid".
>
> I'd like to get feedback on whether or not the list likes the idea of
> a URL field as outlined above - separate from the issue of URNs and
> metadata recovery.
>
> The use case I'm focused on is here:
> http://microformats.org/wiki/citation-brainstorming#Acquiring_reference_information_from_the_web
>
>
> Thanks,
> -mike
>
> --
> Michael McCracken
> UCSD CSE PhD Candidate
> research: http://www.cse.ucsd.edu/~mmccrack/
> misc: http://michael-mccracken.net/wp/
>


-- 
Michael McCracken
UCSD CSE PhD Candidate
research: http://www.cse.ucsd.edu/~mmccrack/
misc: http://michael-mccracken.net/wp/

