[uf-discuss] Canonical hCards (was: Search on CSS element)

Wed Jan 24 03:50:44 PST 2007

On 1/24/07, Brian Suda <brian.suda at gmail.com> wrote:
> On 1/24/07, David Janes <davidjanes at blogmatrix.com> wrote:
> > Do you (Tantek + all) agree with the following "architecture", or at
> > least think it's worth pursuing further:
> >
> > (a) hCards without additional markup; "url" is used to lookup a URL
> > (b) at the URL we can either find:
> > (b.i) the authoritative hCard; OR
> > (b.ii) a pointer to authoritative URL with the authoritative hCard
> > (c) it's easy to find the authoritative hCard on the authoritative URL
> >
> > I'm sure we have the technology to do (b.ii), I just don't know if
> > anyone has done it. Anyone?
>
> In a similar vein, an hCard spider could find hCards in a page with a
> URL. They could then follow that URL to the person's page. Then
> inspect for hCards. If none are found, it could simply follow all
> rel-me links. Since rel-me is published by the author of the page,
> [it is a safe assumption?] that the subsequent requested pages are
> also controlled by the author. Then hCards could be looked for on those
> pages as well. The problem arises when multiple hCards are
> encountered on a page - which is the authoritative hCard? This issue is
> not a problem with the spider, but with the mechanism to say "THIS
> hCard is the one you want" (you suggested an anchor link #vcard), but
> using some heuristics, it might be possible to match the URL of the
> ORIGINAL hCard that started this spidering, and any hCards found in
> the rel-me crawl. If the URLs match, then you could (with some degree
> of certainty) collapse the values into a more robust hCard.

Note that the '#vcard' is from Ryan's website, which I was using as my
working example. I love the rel-me bit, but I'm a little less happy
with the "crawl many pages and see if we find something" approach (if I
understand you correctly), and I wonder if it can be improved.

Let me just write out the problem again, based on a real-world example:

(a) Start Source Page (e.g. http://microformats.org/)

<address class="author vcard">
 <a class="url fn" href="http://theryanking.com">Ryan</a>
</address>

(b) URL Page (http://theryanking.com):

... something happens ...

Note that Ryan already has a pointer on this page to his contact page:

<a href="http://theryanking.com/blog/contact/" title="contact">contact</a>

(c) Authoritative URL Page (http://theryanking.com/blog/contact/#vcard):

<div class="vcard" id="vcard">
  ... authoritative hCard ...
</div>

So the issue is with the "something happens" bit. Here are a few suggestions:

(I) Brian's solution (I think)

- look for "rel-me"
- check each page, matching on FN and/or URL

So we would change Ryan's page:
<a href="http://theryanking.com/blog/contact/" title="contact"
rel="me">contact</a>

The issue here is that FN on the source page may be a shorthand
("Ryan") while on the canonical page it may be a fuller form of the
name, so matching on FN alone can fail.
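
As a rough sketch of option (I), assuming the rel-me target pages have
already been fetched, the matching step might look like the Python
below. The names HCardScanner, same_url and find_matching_cards are
made up for illustration, only fn and url are read from each hCard, and
this is nowhere near a full hCard parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Elements that never get a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCardScanner(HTMLParser):
    """Collects rel="me" links and a simplified (fn, url) view of each hCard."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.me_links = []        # absolute URLs of rel="me" links
        self.cards = []           # one dict per top-level class="vcard" element
        self._open = []           # stack of currently open tags
        self._card_depth = None   # stack depth of the open vcard, if any
        self._fn_depth = None     # stack depth of the open fn element, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        rels = (attrs.get("rel") or "").split()
        if tag in ("a", "link") and href and "me" in rels:
            self.me_links.append(urljoin(self.base_url, href))
        if tag in VOID_TAGS:
            return
        self._open.append(tag)
        depth = len(self._open)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes and self._card_depth is None:
            self._card_depth = depth
            self.cards.append({"fn": "", "url": None})
        if self._card_depth is not None:
            card = self.cards[-1]
            if "url" in classes and href and card["url"] is None:
                card["url"] = urljoin(self.base_url, href)
            if "fn" in classes and self._fn_depth is None:
                self._fn_depth = depth

    def handle_data(self, data):
        if self._fn_depth is not None:
            self.cards[-1]["fn"] += data

    def handle_endtag(self, tag):
        if tag not in self._open:
            return  # stray closing tag: ignore it
        while self._open:
            depth = len(self._open)
            closed = self._open.pop()
            if self._fn_depth == depth:
                self._fn_depth = None
            if self._card_depth == depth:
                self._card_depth = None
            if closed == tag:
                break


def same_url(a, b):
    """Loose comparison: ignores scheme, host case, trailing slash, fragment."""
    pa, pb = urlsplit(a), urlsplit(b)
    return ((pa.netloc.lower(), pa.path.rstrip("/"), pa.query)
            == (pb.netloc.lower(), pb.path.rstrip("/"), pb.query))


def find_matching_cards(source_card, pages):
    """pages maps URL -> HTML text of the already-fetched rel-me targets."""
    matches = []
    for page_url, html in pages.items():
        scanner = HCardScanner(page_url)
        scanner.feed(html)
        for card in scanner.cards:
            url_ok = bool(card["url"] and source_card["url"]
                          and same_url(card["url"], source_card["url"]))
            fn = card["fn"].strip().lower()
            fn_ok = bool(fn) and fn == source_card["fn"].strip().lower()
            if url_ok or fn_ok:
                matches.append((page_url, card))
    return matches
```

Given the source hCard from (a) and a contact page whose hCard carries
a longer FN (say a hypothetical "Ryan King") but a url of
http://theryanking.com/, the URL comparison still matches even though
the FN comparison does not.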

(II) Explicitly mark the authoritative link with a unique class

<a href="http://theryanking.com/blog/contact/" rel="me
[hcard-authorative]">contact</a>

[hcard-authorative] is a placeholder, and obviously requires inventing
something new.
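
On the consuming side, option (II) would reduce to a simple filter. A
minimal Python sketch, assuming the marker ends up as an extra rel
value next to "me" as in the snippet above; MarkedLinkFinder is a
made-up name and the token is only the placeholder, not a real rel
value:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholder token copied from the example markup; purely hypothetical.
AUTHORITATIVE_REL = "[hcard-authorative]"


class MarkedLinkFinder(HTMLParser):
    """Collects links whose rel contains both "me" and the marker token."""

    def __init__(self, base_url, token=AUTHORITATIVE_REL):
        super().__init__()
        self.base_url = base_url
        self.token = token
        self.targets = []  # absolute URLs of explicitly marked links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel is a whitespace-separated list, so a wrapped line is harmless.
        rels = (attrs.get("rel") or "").split()
        if tag in ("a", "link") and href and "me" in rels and self.token in rels:
            self.targets.append(urljoin(self.base_url, href))
```

The spider would then fetch only the URLs in targets instead of every
rel-me link.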

(III) Modified Brian solution: require explicit ID for hCard

<a href="http://theryanking.com/blog/contact/#vcard" title="contact"
rel="me">contact</a>

That is, the spider will only attempt to look at rel-me URIs with a
fragment. The benefit is _explicitness_ and less open-ended work for
the spider. Requiring a fragment may introduce brittleness?
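
A minimal Python sketch of option (III), assuming the fragment names
the id of the element that carries class="vcard", as in (c). The
function names fragment_targets and has_hcard_at are invented for
illustration, and fetching the target page is left out:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _RelMeHrefs(HTMLParser):
    """Collects the raw href of every rel="me" link."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if tag in ("a", "link") and attrs.get("href") and "me" in rels:
            self.hrefs.append(attrs["href"])


class _IdIsHCard(HTMLParser):
    """Checks whether the element with a given id is itself an hCard root."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if attrs.get("id") == self.wanted_id and "vcard" in classes:
            self.found = True


def fragment_targets(base_url, html):
    """Return (page_url, fragment) for rel-me links that carry a fragment."""
    parser = _RelMeHrefs()
    parser.feed(html)
    targets = []
    for href in parser.hrefs:
        page_url, fragment = urldefrag(urljoin(base_url, href))
        if fragment:  # rel-me links without a fragment are skipped
            targets.append((page_url, fragment))
    return targets


def has_hcard_at(html, fragment):
    """True if the fetched page has class="vcard" on the element with that id."""
    checker = _IdIsHCard(fragment)
    checker.feed(html)
    return checker.found
```

The brittleness shows up in has_hcard_at: if the id is renamed or moved
off the hCard root, the lookup fails even though the hCard is still on
the page.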

Regards, etc...

-- 
David Janes
Founder, BlogMatrix
http://www.blogmatrix.com
http://blogmatrix.blogmatrix.com