[uf-discuss] Enumerating Microformats on a Page
haacked at gmail.com
Fri Mar 24 12:47:59 PST 2006
I totally agree. AFAIK Microformats are in part about building upon
emergent behavior and looking at precedent, so it would have to be optional.
The precedent here, as you stated, is Google Sitemaps. I also see RSS
auto-discovery as a precedent. With most implementations of RSS
auto-discovery, you look for the <link ...> tag first. If it's not there,
you start crawling.
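
To make the "look for the tag first, then crawl" order concrete, here is a minimal Python sketch using only the standard library. The names LinkFinder and discover are made up for illustration, and real aggregators will differ in the details (for instance, many also check the type attribute).

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collects hrefs from <link> tags whose rel contains a given token,
    and separately collects hrefs from ordinary <a> tags."""

    def __init__(self, rel_token):
        super().__init__()
        self.rel_token = rel_token
        self.declared = []  # hrefs announced via a matching <link> tag
        self.anchors = []   # hrefs from <a> tags, used only as a crawl fallback

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and self.rel_token in rels:
            self.declared.append(href)
        elif tag == "a":
            self.anchors.append(href)


def discover(html, rel_token="alternate"):
    """Return ("declared", hrefs) when a matching <link> exists,
    otherwise ("crawl", hrefs of the <a> links to follow)."""
    finder = LinkFinder(rel_token)
    finder.feed(html)
    if finder.declared:
        return "declared", finder.declared
    return "crawl", finder.anchors
```

A page with a matching <link> tag yields the declared URLs directly. A page without one yields the list of <a> links to start crawling from.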
The key here is that unlike Google Sitemaps, this sitemap would be human
viewable. Yes, it could get out of sync, but since it's for the benefit of
users as well as machines, there's incentive to keep it in sync.
Also, consider that most blogging engines (and CMSs) have an easy
means to generate a sitemap. I know the .NET platform has a SiteMap control
that could probably be transformed.
From: microformats-discuss-bounces at microformats.org
[mailto:microformats-discuss-bounces at microformats.org] On Behalf Of Breton
Sent: Friday, March 24, 2006 12:41 PM
To: Microformats Discuss
Subject: Re: [uf-discuss] Enumerating Microformats on a Page
I find it to be an interesting idea, though I strongly suggest that
such a sitemap should be optional, and user agents should crawl the
entire site when no such sitemap exists.
Historically, sitemaps serve several very specific purposes:
  * Provide links to orphan pages
  * Exclude sites which the author does not want indexed (as in robots.txt)
  * Provide an index page for users
It would seem to me that a sitemap would not allow significantly more
rapid discovery of microformats than simply crawling the site
normally, and looking for supported root classnames. You face the
problem of a sitemap becoming out of synch with your content, and
thus missing out on newer or forgotten content due to an out of date
TOC. To solve that problem you end up maintaining two versions of the
data, and you've eliminated one of the key benefits of microformats,
namely only having to maintain one source of data.
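
For reference, "looking for supported root classnames" on a single page can be sketched in a few lines of Python. The ROOT_CLASSES table below is only an illustrative subset (vcard, vevent, hreview) and the function name is made up. A real UA would carry its own list of supported formats.

```python
from html.parser import HTMLParser

# Illustrative subset of root class names and the formats they signal.
ROOT_CLASSES = {
    "vcard": "hCard",
    "vevent": "hCalendar",
    "hreview": "hReview",
}


class RootClassScanner(HTMLParser):
    """Records which known root class names appear on any element."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # The class attribute is a space-separated list of tokens.
                for token in value.split():
                    fmt = ROOT_CLASSES.get(token)
                    if fmt:
                        self.found.add(fmt)


def microformats_in(html):
    """Return a sorted list of the formats whose root class appears in html."""
    scanner = RootClassScanner()
    scanner.feed(html)
    return sorted(scanner.found)
```

Because this reads the markup itself, there is no second copy of the data to fall out of date. The cost is that every page has to be fetched and parsed.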
On the other hand, looking at it from a user-centered and search
engine point of view, having a sitemap is good practice anyway, and
if you're going to maintain one for the benefit of a search engine,
why not have a standardized "best practice" for marking one up? Such
an index could not only contain the links to all the pages on your
site, but also rel="nofollow" links for sites that you don't want
indexed, links to all the feeds on your site, and some kind of meta
data format which perhaps indicates whether a link contains
microformats. I suggest such data should not be relied upon, but
should instead inform a weighting mechanism such that pages indicated
as containing microformats are crawled first in the queue, allowing a
more responsive experience in any UA which implements this. Another
possible choice is to use such links to present a menu of options to
the user, to allow more discriminating selection of microformat content.
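
A rough Python sketch of the weighting idea: pages that the sitemap flags as containing microformats go to the front of the crawl queue, but unflagged pages are still crawled, so the hint is never relied upon. The (url, hinted) pair format and the name crawl_order are assumptions made for this example, not part of any proposal.

```python
import heapq
from itertools import count


def crawl_order(entries):
    """Yield URLs from (url, hinted) pairs, hinted pages first.

    Pages with the same priority keep their original sitemap order.
    """
    tie_breaker = count()  # preserves insertion order within a priority
    heap = []
    for url, hinted in entries:
        priority = 0 if hinted else 1  # lower number is crawled sooner
        heapq.heappush(heap, (priority, next(tie_breaker), url))
    while heap:
        _, _, url = heapq.heappop(heap)
        yield url
```

Given the entries /about/ (not flagged), /contact/ (flagged), /blog/ (not flagged) and /events/ (flagged), the crawl visits /contact/ and /events/ first, then /about/ and /blog/.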
To this end, a good place to start would be to look at existing
sitemaps, including Google's Sitemaps XML markup and the markup
used by the various websites across the net that already publish sitemaps.
On Mar 24, 2006, at 1:16 PM, Phil Haack wrote:
> People do read Microformat content directly which I understand. It
> fits with the "Human First" principle.
> But references to the xmdp profiles are in the <head> element which is NOT
> human readable. So there is precedent for non-human readable
> discoverability mechanism within Microformats.
> At Mix06, Tantek pointed out that listing all the xmdp profiles that a site
> used on a homepage could get unwieldy.
> I suppose if I wanted to help both people and an aggregator find
> Microformats of interest, there could be a microformat for a site
> index. My
> homepage could include it or simply link to it using some other
> Thus for the human, there would be a simple link to follow <a
> href="/siteindex/" rel="siteindex">Site Map</a>. Likewise, my aggregator
> would look for this if it didn't find the xmdp profile for a sitemap on the
> current page.
> I think this might be useful so aggregators (and users) don't have to crawl
> an entire site.
> Has there been any work done in this area? Is it a bad idea?
> -----Original Message-----
> From: microformats-discuss-bounces at microformats.org
> [mailto:microformats-discuss-bounces at microformats.org] On Behalf Of
> Sent: Friday, March 24, 2006 11:50 AM
> To: Microformats Discuss
> Subject: Re: [uf-discuss] Enumerating Microformats on a Page
> Because feed auto-discovery links are in the content, not the headers
> of HTTP responses, aggregators have to download the entire page, and
> most aggregators search first for <link type="alternate" ...> tags,
> and second for something like <a href="something.rss">RSS</a>. The
> link tag makes more sense here because people don't read feeds
> directly, so it doesn't make a lot of sense to provide human-readable
> <a> links to feeds. But people *do* read microformat content
> directly, so if it's related to the current page, it should be linked
> from the current page, and any human or machine looking site-wide for
> microformat content (or anything else) should follow links throughout
> the site.
> microformats-discuss mailing list
> microformats-discuss at microformats.org