[uf-discuss] [citation] url field
michael.mccracken at gmail.com
Mon Dec 4 13:48:04 PST 2006
On 12/2/06, Mike Schinkel <mikeschinkel at gmail.com> wrote:
> A couple points on this subject. I have recently been doing a *lot* of
> research in the area of URLs/URIs and having discussions with numerous
> people on REST-discuss and www-TAG lists so I feel I'm pretty well-versed on
> this subject now.
> Although it is possible to infer an ISBN or maybe even a DOI from a URL, it
> is considered "Bad Practice" unless the "URI Authority" (i.e. the owner of
> the website) has specifically documented the structure of the URL and given
> a reasonably trustworthy guarantee that it will not change.
> 1.) "Architecture of the World Wide Web, Volume One" section 2.5 on "URI
> Opacity":
> Good practice: URI opacity
> Agents making use of URIs SHOULD NOT attempt to infer properties of
> the referenced resource.
> 2.) "The use of Metadata in URIs" section 2.1 on "Reliability of URI
> metadata":
> Constraint: Web software MUST NOT depend on the correctness of metadata
> inferred from a URI, except when the encoding of such metadata is
> documented by applicable standards and specifications.
> 3.) "The use of Metadata in URIs" section 2.1 on "Reliability of URI
> metadata":
> The principal conclusions of this finding are:
> * Assignment authorities may publish specifications detailing the
> structure and semantics of the URIs they assign. Other users of those
> URIs may use such specifications to infer information about resources
> identified by URI assigned by that authority.
> * People and software using URIs assigned outside of their own
> authority should make as few inferences as possible about a resource
> based on its URI. The more dependencies a piece of software has on
> particular constraints and inferences, the more fragile it becomes to
> change and the lower its generic utility.
> In the case of Jon Udell's LibraryLookup, which has been referenced as an
> Data point: ISBNs are already being reliably extracted from URLs:
> Jon's work has been derided by purists as doing something it shouldn't, i.e.
> "peeking" into URLs when they should remain opaque. Personally, I don't see
> what Jon did as such a bad thing. Jon's script interfaces with a human only,
> and if Amazon ever changes their URLs his script just won't work and the
> user will figure that out. In the meantime, by breaking the rules he's
> offering pretty useful functionality that he couldn't get otherwise. And
> even if Amazon does change their URLs and his script breaks, which is highly
> unlikely given their affiliate program, Jon can just update his script and
> then anyone who has a broken script can search for Jon's new version (unless
> Amazon eliminates the ISBN from the URL, which I highly doubt).
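To make the "peeking" concrete, here is a rough Python sketch of LibraryLookup-style extraction. It is not Jon Udell's code; the `/dp/` and `/ASIN/` path pattern, the sample URLs, and the function name `guess_isbn_from_url` are all assumptions for illustration, precisely because no bookseller has documented such a pattern.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes a 10-character ISBN directly follows "/dp/"
# or "/ASIN/" in the path. No bookseller has guaranteed this layout.
_ASSUMED_ISBN_IN_PATH = re.compile(r"/(?:dp|ASIN)/([0-9]{9}[0-9X])(?:[/?]|$)")


def guess_isbn_from_url(url: str) -> Optional[str]:
    """Heuristically pull an ISBN-10 out of a bookseller URL.

    Returns None when the assumed pattern does not match, which is exactly
    what would happen, silently, if the site changed its URL layout.
    """
    match = _ASSUMED_ISBN_IN_PATH.search(url)
    return match.group(1) if match else None


print(guess_isbn_from_url("http://www.amazon.com/exec/obidos/ASIN/0395363411"))  # 0395363411
print(guess_isbn_from_url("http://www.example.com/books?id=0395363411"))  # None
```

The second call shows the failure mode: the heuristic raises no error, it simply stops finding anything.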
> However, advocating the use of non-documented metadata in a URL for a
> Microformat citation is a completely different matter. Rather than one
> author (Jon Udell) using it for one app (LibraryLookup) whose users can
> later get updates if required, we would be advocating it for a Microformat
> with which authors will mark up untold amounts of HTML content, much of
> which will never be updated for future revisions. That requires a very high
> bar for immutability. IOW, we should ensure that we have a *guarantee* that
> the format of the URL will *never* change, or we shouldn't use it. Yes, we
> *could* still parse the old format, but we'd have to continue adding
> parsers, some of which might eventually fail for
> At the moment, the only immutable reference for an ISBN is a URN from RFC
> 3187. For example:
> This doesn't dereference in a browser (in IE7, for example), but one day
> it might. But we can be sure it is definitely immutable.
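A minimal Python sketch of producing such a URN, assuming an ISBN-10 input. The helper names are hypothetical, stripping hyphens is a simplification, and the check-digit test is the standard ISBN-10 mod-11 rule rather than anything RFC 3187 itself adds.

```python
import re


def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 with the standard mod-11 check-digit rule."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if not re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        return False
    # Weights run 10, 9, ..., 1; a final "X" stands for the value 10.
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(digits)
    )
    return total % 11 == 0


def isbn_to_urn(isbn: str) -> str:
    """Build a urn:isbn: URN (the RFC 3187 namespace) from an ISBN-10."""
    if not isbn10_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    # Simplification: hyphens and spaces are stripped before building the URN.
    return "urn:isbn:" + isbn.replace("-", "").replace(" ", "").upper()


print(isbn_to_urn("0-395-36341-1"))  # urn:isbn:0395363411
```

Unlike the URL heuristic, nothing here depends on any website's path layout; the only structure relied on is the published ISBN and URN syntax.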
> As for resolving DOIs, they are new to me and I've not done enough research
> to determine if there is an immutable resolvable source for DOIs. This
> article (http://www.dlib.org/dlib/june98/06powell.html) and these websites
> (http://www.handle.net/ & http://www.doi.org/) might be helpful there.
> As an aside, please don't take this as me being unsupportive. On the
> contrary, I am a strong advocate of getting website owners to put metadata
> in their URLs and to document that metadata. However, until we have solid
> sources of URLs with documented metadata, we should probably all play
> smart and follow the rules as specified by the W3C, at least IMO.
> -Mike Schinkel
>  http://www.w3.org/TR/webarch/#uri-opacity
>  http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061107.html
>  http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061107.html#N1023D
>  http://www.ietf.org/rfc/rfc3187.txt
>  http://www.dlib.org/dlib/june98/06powell.html
>  http://www.handle.net/
>  http://www.doi.org/
Mike, thanks for all the detail. I definitely learned some things.
In the context of my original proposal to add a "URL" field to the
microformat, I now feel like I need to separate that proposal from one
of the statements I made in it:
"I also suggest that in the case of identifiers like a DOI or ISBN
which can be represented as a parameter in a link to doi.org or some
other resolver, that the format encourage using a URL field for those
identifiers and not include separate fields for each such identifier.
In other words, I think that class="url uid" is sufficient to encode
DOI/ISBN/etc., and we shouldn't add a separate DOI class, a separate
ISBN class, and so on."
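The "identifier as a parameter in a link to a resolver" idea can be sketched in Python. This assumes the doi.org proxy accepts links of the form `https://doi.org/<DOI>` (in 2006 the usual form was `http://dx.doi.org/<DOI>`); the function names and the sample DOI are illustrative. Recovering the DOI from such a link leans on the resolver's own published URL syntax, which is the kind of documented structure the TAG finding allows inferences from.

```python
from typing import Optional
from urllib.parse import quote, unquote, urlsplit

# Assumption: the resolver accepts links of the form https://doi.org/<DOI>.
RESOLVER_HOST = "doi.org"


def doi_to_link(doi: str) -> str:
    """Turn a bare DOI into a resolver link, percent-encoding unsafe chars."""
    return f"https://{RESOLVER_HOST}/" + quote(doi, safe="/")


def doi_from_link(url: str) -> Optional[str]:
    """Recover the DOI from a resolver link, or None for any other host."""
    parts = urlsplit(url)
    if parts.hostname not in (RESOLVER_HOST, "dx." + RESOLVER_HOST):
        return None
    doi = unquote(parts.path.lstrip("/"))
    # Every DOI starts with the "10." directory indicator.
    return doi if doi.startswith("10.") else None


print(doi_to_link("10.1000/182"))  # https://doi.org/10.1000/182
print(doi_from_link("http://dx.doi.org/10.1000/182"))  # 10.1000/182
```

Percent-encoding matters because some DOIs contain characters such as `#` that would otherwise be read as part of the URL syntax rather than the identifier.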
To be clear - I still think that *if* it is possible to mark up a DOI
or ISBN as a link without obscuring the DOI, then that's a positive
thing. It sounds like it's just more complicated than I thought to do
that. So maybe the format doesn't need to mention those in connection
with the URL field.
I do think that a URL field (class="url") should be included, to
represent a link to a copy of the cited work, and if we want to mark
up one or more identifiers, we can use a separate class (I suggest
"uid") to do so. If we're lucky and there's a good way to merge them,
then use class="url uid".
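To make the proposed markup concrete, here is a minimal Python sketch that reads `url` and `uid` values from anchors with the standard library's `html.parser`. The sample HTML and the class name `CitationLinkParser` are hypothetical, and treating `class="url uid"` as contributing the same href to both fields is an assumption drawn from this proposal, not a settled parsing rule. Only `<a>` elements are handled; a `uid` on plain text would need its text content, which this sketch ignores.

```python
from html.parser import HTMLParser
from typing import List


class CitationLinkParser(HTMLParser):
    """Collect href values from <a> elements carrying url/uid class names.

    Assumption: an element with class="url uid" contributes its href to both
    fields, as proposed in this thread; this is not a settled parsing rule.
    """

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []
        self.uids: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # Only anchors are handled; a uid in plain text is ignored.
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        classes = (attr_map.get("class") or "").split()
        if "url" in classes:
            self.urls.append(href)
        if "uid" in classes:
            self.uids.append(href)


sample = (
    '<a class="url" href="http://example.org/paper.pdf">PDF copy</a> '
    '<a class="url uid" href="https://doi.org/10.1000/182">DOI link</a>'
)
parser = CitationLinkParser()
parser.feed(sample)
print(parser.urls)  # both hrefs
print(parser.uids)  # only the DOI link
```

Keeping `url` and `uid` as separate class names means a plain link to a copy of the work never gets mistaken for an identifier, while the merged form still works when one link serves both roles.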
I'd like to get feedback on whether or not the list likes the idea of
a URL field as outlined above - separate from the issue of URNs and
The use case I'm focused on is here:
UCSD CSE PhD Candidate