[microformats-discuss] (url) canonicalization

Tantek Çelik tantek at cs.stanford.edu
Mon Sep 26 20:01:43 PDT 2005


On 9/26/05 1:47 PM, "Ryan King" <ryan at technorati.com> wrote:

> On Sep 26, 2005, at 1:28 PM, Dr. Ernie Prabhakar wrote:
>> On Sep 26, 2005, at 1:19 PM, David Janes -- BlogMatrix wrote:
>> 
>>> This is pretty tied up in the issue of creating a weblog
>>> microformat. In particular, creating (if necessary and if
>>> possible) an equivalence between RSS/Atom/Feed entries and entries
>>> within the HTML weblog, which of course can be summaries/teasers,
>>> complete, on the main page, or in multiple different archives, and
>>> so forth.
>> 
>> Actually, I'm thinking this is a very common design pattern
>> (nanoformat ?).
> 
> micro isn't small enough? :D

Heh, exactly.

And design patterns aren't necessarily formats.  We have a date-time design
pattern for this express reason.


>>   I can think of a whole bunch of cases where I'd like to be able
>> to specify a list of items, one of which is preferred:
>> 
>> * feeds
>> * thumbnails
>> * email addresses
>> * multi-res media downloads
>> * translations
>> * formats (HTML vs. PDF)

Note that preferred != canonical.

A canonical URL has exactly one purpose: out of a set of URLs that all
return the same resource, it identifies the definitive one, as in Ryan's
LiveJournal example.

Canonical is not a way of expressing which out of a set of *different*
resources is preferred. That's a different semantic.
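
To make the distinction concrete, here is a minimal Python sketch. The
URLs and the names CANONICAL_OF, ALTERNATES, canonicalize and preferred
are all made up for illustration; none of this is a proposed format.

```python
# Hypothetical data: several URLs that all return the SAME resource.
# Each one maps to the single URL declared definitive (canonical).
CANONICAL_OF = {
    "http://example.com/journal/42": "http://example.com/journal/42",
    "http://example.com/journal/42.html": "http://example.com/journal/42",
    "http://www.example.com/users/someone/42": "http://example.com/journal/42",
}

# Hypothetical data: DIFFERENT resources, listed in order of preference.
# This is a "preferred" relationship, not a canonical one.
ALTERNATES = [
    "http://example.com/report.html",
    "http://example.com/report.pdf",
]


def canonicalize(url):
    """Return the definitive URL for `url`, or `url` itself if none is known."""
    return CANONICAL_OF.get(url, url)


def preferred(alternates):
    """Return the first (most preferred) entry from a list of different resources."""
    return alternates[0]
```

Canonicalizing never changes which resource you get, only which URL names
it; picking a preferred alternate does change the resource.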


> I think we're going to have to deal with this in our media-metadata
> work.

Possibly.  I'm not sure it passes the 80/20 rule, and thus I would postpone
it from v1.


> However, what I'm thinking of is somewhat different. I'm just
> thinking there needs to be a simple way for one to say "that page is
> like this page, and, in fact, it's the main one."
> 
> so something like this in the head:
> 
> <link rel="alternate bookmark" href="..." />
> 
> should be an expression of those semantics.
> 
> It almost sounds to me like we need to take the "bookmark" usage,
> which is AFAIK not standardized anywhere...... (Ryan goes to actually
> do the research)......
> 
> I take that back: http://www.w3.org/TR/REC-html40/types.html#type-links
> 
> I really didn't realize that rel="bookmark" was in html. Here's the
> definition:
> 
> Bookmark
>    Refers to a bookmark. A bookmark is a link to a key entry point
> within an extended document. The title attribute may be used, for
> example, to label the bookmark. Note that several bookmarks may be
> defined in each document.
> 
> This doesn't seem like what we're looking for here.

That's right.  "bookmark" does not mean canonical.

But note that "bookmark" does have effectively the same semantics as
"permalink", thus it can be used to represent that semantic.

 http://tantek.com/log/2002/11.html#L20021128t1352
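
As a rough sketch of how a consumer could pick up permalinks marked this
way, the following Python collects the href of every <a> or <link> whose
rel includes "bookmark". The class and function names are hypothetical,
and it assumes rel is treated as a space-separated, case-insensitive list
of link types, as HTML 4 describes.

```python
from html.parser import HTMLParser


class BookmarkLinks(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes 'bookmark'."""

    def __init__(self):
        super().__init__()
        self.permalinks = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel holds space-separated link types; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "bookmark" in rel_tokens and attr.get("href"):
            self.permalinks.append(attr["href"])


def find_permalinks(html):
    """Return the permalink hrefs found in `html`, in document order."""
    parser = BookmarkLinks()
    parser.feed(html)
    parser.close()
    return parser.permalinks
```

Because rel is split into tokens, a value such as "alternate bookmark"
still counts as a bookmark link.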


>> I think the most common HTML implementation of this is an ordered
>> list, where the first choice is preferred, i.e.:
>> 
>> <ol class='pickone'>
>> <li>I am the greatest</li> <!-- Default, aka Canonical -->
>> <li>We're #2</li>
>> <li>Who cares?</li>
>> </ol>

Really? I can't say I've seen this, but perhaps I'm just not looking in the
right places.

Ernie, could you provide one or more URLs where you've seen this pattern?

It certainly would be a reasonable use of XOXO.


> I think this is a different use case than what I'm talking about.
> (which still needs to be addressed, but probably separately).
> 
>> Of course, that's only useful if there are additional attributes that
>> allow for meaningful 'picks', e.g., size, language, etc.
>> 
>> Is there in fact a general way to handle this?  Are these *always*
>> links, so we can use 'rel'?

In the case of canonicalizing URLs, yes, since they are URLs, we can always
use rel="canonical".
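
For illustration only, here is a sketch of how a consumer might resolve
such a link. Note that rel="canonical" is a value being proposed in this
thread, not one defined by HTML 4. The names below are hypothetical, the
"first matching link wins" rule is an assumption, and <base href> is
ignored for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLink(HTMLParser):
    """Remember the href of the first <a>/<link> whose rel includes 'canonical'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Assumption: the first matching link wins; later ones are ignored.
        if self.href is not None or tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.href = attr["href"]


def canonical_url(page_url, html):
    """Return the URL the page declares as canonical, or `page_url` if none."""
    parser = CanonicalLink()
    parser.feed(html)
    parser.close()
    if parser.href is None:
        return page_url
    # Resolve a relative href against the URL the page was fetched from.
    return urljoin(page_url, parser.href)
```

A page with no such link is simply treated as its own canonical URL.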

Thanks,

Tantek


