Identity consolidation is the ability for a user to indicate that one or more identities, profiles, or URLs across different sites all represent the same user. Also known as: profile aggregation, profile equivalency.
user centric design
Identity consolidation must be done in a user centric way. That means:
- Only explicit, user opted-in identity consolidation
- No "surprise" or automagic identity consolidation. Users get upset when identities/profiles they thought were different (say, because they were on different sites) are unexpectedly auto-collapsed/consolidated. Users do not expect sites to share information behind the user's back (like their email address).
- Thus, do not publish users' email addresses (and perhaps even disallow search by email address).
- Don't even publish hashes of email addresses, as they can still be used by 3rd parties to perform unexpected identity consolidation.
- This problem of unexpected identity consolidation was first raised by Adaptive Path employees at events/2007-08-28-social-network-portability-today, and further documented/explained in Tantek's presentations Fundamentos Web: Social Network Portability.
Most profile systems (social network or otherwise) have a place for the user to indicate another URL for themselves, for their home page, their blog etc. Clearly this is an opt-in mechanism, as the user has to explicitly tell site A that site B is another facet of themselves. This interaction/interface passes the user centric design criteria above.
From a format perspective, all that is necessary is for the sites publishing such links to "Other Profiles" to add the XFN rel="me" value to the respective hyperlinks in their HTML.
Q: Do the rel="me" links need to be in both directions to verify the link? It seems like they do, since otherwise someone could find anything that links to them and "claim" it just by linking back to it with rel="me".
A: Yes, in general rel="me" links need to be in both directions for exactly that reason.
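As a minimal sketch of that check, assume the rel="me" links on each page have already been extracted into a mapping from page URL to link destinations. The `me_links` dictionary, the `is_mutual` helper, and the example.com / example.org URLs are hypothetical and only illustrate why a one-way claim is rejected:

```python
def is_mutual(me_links, a, b):
    """Return True only if a links to b AND b links back to a with rel="me"."""
    return b in me_links.get(a, set()) and a in me_links.get(b, set())

# Hypothetical data: one page claims someone else's profile,
# but that profile never links back to it.
me_links = {
    "http://claimer.example.org/": {"http://profile.example.com/"},
    "http://profile.example.com/": {"http://blog.example.com/"},
    "http://blog.example.com/": {"http://profile.example.com/"},
}

# The profile and blog link to each other, so that pair is accepted.
accepted = is_mutual(me_links, "http://profile.example.com/", "http://blog.example.com/")
# The claimer links to the profile without a link back, so it is rejected.
rejected = is_mutual(me_links, "http://claimer.example.org/", "http://profile.example.com/")
```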
Q: But some sites that let you list your homepage on the profile don't use rel="me", so do we have to just get them all to use it before bi-directional claims will work right?
A: Not necessarily. Of course we prefer the Advocacy path to get them to implement rel="me", but for old sites, as documented on http://gmpg.org/xfn/and/ we can check that specific fields on the profile page are filled in accordingly, with site-specific heuristics.
Q: So either they need to use rel="me" or we can scrape known sites and trust the link anyway?
A: rel="me" is the standard that scales (so new players "just work") and for "old players" we write a white-list with compat rules to make it work.
Q: So i guess the idea is i can't insert a rel="me" link into any user-generated content, like comments on someone else's blog?
Q: What about Yelp which uses rel="nofollow" to your home page link?
A: Well the way around it is to *only* look for that specific "home page" link, not any link on the page, that's the key for old players, with the assumption being that *only* the user/owner of that profile could change that URL. In addition Yelp is actually violating the rel="nofollow" spec because that's not a third party link, that's a first party link, by the owner of that user profile, and therefore it MUST NOT have rel="nofollow" on it. This bug should be reported to them.
Q: Is two-way links plus transitive closure sufficient? Because many sites may only link to your homepage which then links out to many other sites and you'd like to be able to "reel those in"?
A: 2-way links plus transitive closure is a good start. But there are common cases where you'll have 3-step triangle circuits you need to detect. For instance, say my Plaxo profile page is joseph.myplaxo.com and I want to add my twitter page twitter.com/jsmarr. My twitter page only links to my home page josephsmarr.com, but that page links back to twitter and also to my plaxo profile. So you can prove I'm authoritative for twitter.com/jsmarr even though it doesn't have a two-way link with joseph.myplaxo.com. There may be even more complex cases, but I think the 3-way is common because many sites only let you have one URL link, which will usually be to your home page, so unless you start by telling a site your home page, you will have to crawl from a "spoke" into the "hub" and then back out again. So in general, you'll need to keep all the rel="me" links on all the pages you crawl, then assemble the graph, then detect all the circuits, and then all the nodes in circuits that are connected to the root page you're starting with are verified. I think. :)
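One way to read "nodes in circuits connected to the root" is: every page that lies on a rel="me" circuit passing through the root, that is, every page the root can reach and that can also reach the root. The sketch below implements that reading only. The function names are hypothetical, the edge list is assumed to have been collected by a crawler already, and the `example.org/unclaimed` page is invented to show a one-way link being excluded:

```python
from collections import defaultdict

def reachable(graph, start):
    """Return every node reachable from start by following edges in graph."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def verified_from_root(edges, root):
    """Return the pages that share a rel="me" circuit with root.

    edges is an iterable of (source, destination) rel="me" links.
    A page is on a circuit with root exactly when root can reach it
    and it can reach root, so the two reachability sets are intersected.
    """
    forward = defaultdict(set)
    backward = defaultdict(set)
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
    return reachable(forward, root) & reachable(backward, root)

# The triangle from the answer above, plus one hypothetical one-way link.
edges = [
    ("joseph.myplaxo.com", "twitter.com/jsmarr"),
    ("twitter.com/jsmarr", "josephsmarr.com"),
    ("josephsmarr.com", "twitter.com/jsmarr"),
    ("josephsmarr.com", "joseph.myplaxo.com"),
    ("josephsmarr.com", "example.org/unclaimed"),  # never links back
]
verified = verified_from_root(edges, "joseph.myplaxo.com")
```

Here `verified` contains the Plaxo, Twitter, and home pages, while the page that never links back is left out.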
Q: How should I crawl rel="me" links then?
A: Do each 2-way one at a time. e.g. go to a rel="me" destination, look for rel="me" link back to the same page in that source, and *then* enqueue all the remaining rel="me" links for crawling. Enqueue rel="me" relations as you crawl, e.g. you crawl a, you don't enqueue just the destinations of links b and c, but rather, enqueue the relations a-me->b, a-me->c. And then you crawl the destinations in the queue, and for confirmed rel="me" 2ways, just move those to another list, e.g. when you see b-me->a you just remove a-me->b from the queue and put a<->b into the "me" file, and when you see b-me->d you just add it to the queue. Repeat until you have crawled all the destinations in the queue and you're done.
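The queue-of-relations procedure can be sketched roughly as follows. This is an illustration under assumptions, not a reference crawler: `fetch_me_links` is a hypothetical caller-supplied function that returns the rel="me" destinations found on a page, URLs are compared as plain strings, and onward links are followed only from pages whose two-way link has been confirmed:

```python
from collections import deque

def crawl_me(root, fetch_me_links):
    """Return the set of confirmed two-way rel="me" pairs starting at root."""
    cache = {}

    def links_of(url):
        # Fetch each page at most once.
        if url not in cache:
            cache[url] = set(fetch_me_links(url))
        return cache[url]

    confirmed = set()   # the "me" file: frozenset({a, b}) pairs
    expanded = {root}   # pages whose onward relations are already enqueued
    queue = deque((root, dst) for dst in links_of(root) if dst != root)
    while queue:
        src, dst = queue.popleft()
        links = links_of(dst)
        if src not in links:
            continue    # one-way claim: drop it and do not follow dst
        confirmed.add(frozenset((src, dst)))
        if dst in expanded:
            continue
        expanded.add(dst)
        for nxt in links:
            if nxt not in (src, dst):
                queue.append((dst, nxt))  # enqueue the relation dst-me->nxt
    return confirmed

# Hypothetical pages: a<->b and b<->d are mutual, a->c is one-way.
pages = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["b"]}
pairs = crawl_me("a", lambda url: pages.get(url, []))
```

Because this version insists on a direct link back, it does not by itself confirm the three-step circuits described earlier; those need the full graph to be kept and searched for circuits.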
Q: There are often multiple equivalent pages, like http://flickr.com/photos/jsmarr , http://www.flickr.com/photos/jsmarr , http://flickr.com/people/jsmarr , http://flickr.com/people/jsmarr/profile . Do we need to write equivalence rules or just make people use the same form?
A: Such pages should declare the equivalence themselves: either rel="me" link to both the "www." and non-"www." versions via the links already on the page (as they often already have), or add equivalent <link rel="me" href="..." /> tags to the <head> of the document.
Q: So when crawling a page for rel="me" links, should I look for BOTH <a rel="me" links in the body AND <link rel="me" links in the head?
A: Yes, they're equivalent, so look for both.
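A minimal sketch of extracting both kinds of link, using Python's standard `html.parser` module. The `RelMeParser` class and `extract_me_links` helper are hypothetical names, the sample markup is invented for illustration, and relative URLs are not resolved:

```python
from html.parser import HTMLParser

class RelMeParser(HTMLParser):
    """Collect href values from <a> and <link> elements whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.me_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated list, e.g. rel="me nofollow".
        rel_values = (attr.get("rel") or "").lower().split()
        if "me" in rel_values and attr.get("href"):
            self.me_links.append(attr["href"])

def extract_me_links(html):
    """Return rel="me" hrefs in document order, from both head and body."""
    parser = RelMeParser()
    parser.feed(html)
    parser.close()
    return parser.me_links

sample = """
<html><head>
<link rel="me" href="http://www.flickr.com/photos/jsmarr" />
</head><body>
<a rel="me" href="http://josephsmarr.com/">home page</a>
<a rel="nofollow" href="http://example.com/">not a claim</a>
</body></html>
"""
links = extract_me_links(sample)
```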
Strangely, the new Google Share site supports hCard but in a frame. In order to parse this page, crawlers need some hook to identify the source for the hCard. It is reasonable to consider using <frame src="http://example.com/framesrc.html" rel="me" /> (as well as the same for <object>) in order to accommodate this unconventional source of profile data. Likewise, linking back to the page containing the frame using rel="me" would be necessary to produce a valid claim.
Unfortunately, neither <frame> nor <object> has a rel attribute in HTML4, and thus we must make use of an alternate solution.
HTML4 does have a <noframes> element, which web authors often use to hyperlink to the frame contents so they are accessible in browsers that don't support frames (like most of those on mobile phones, for example).
That hyperlink inside the <noframes> element should have rel="me" on it in order to consolidate the framed portion of the profile with the profile page itself. Similarly, on the framed portion, there should be a <noframes> element that links back to the parent frame/document with rel="me" in order to establish the bidirectional identity consolidation relationship.
For <object> embeds, one can simply put an inline hyperlink inside the <object>...</object> contents as fallback content for browsers that don't render embedded objects, thus providing both a visible method to browse to the embed, and a hyperlink on which to place the