[uf-discuss] Enumerating Microformats on a Page
scott at randomchaos.com
Fri Mar 24 19:07:01 PST 2006
On Mar 24, 2006, at 7:25 PM, Phil Haack wrote:
> But many sites do present a sitemap already for humans first. I
> think it’s quite helpful when a site does have one. Not everyone
> will generate them, true, but a sitemap can also represent a
> logical structure that isn’t necessarily reflective of a filesystem
I don't expect they'll do much good, but I don't see how they could
hurt anything, so if you think it will help, I'd say go ahead and
work on it.
> The sitemap itself can be content for the end users. If one
> existed, wouldn’t we want to take advantage of it?
I'd want to take advantage of it to decide where to start, but not
where to end. A search engine should seek to maximize the search
area to improve results. I want to look at everything on your site,
unless instructed otherwise.
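
To illustrate "where to start, but not where to end", here is a minimal
sketch of a breadth-first crawl seeded from a sitemap. It is not the spider
discussed in this thread. The names crawl, LINKS and SITEMAP, and the paths
in them, are made up for the example, and the link graph is an in-memory
dict standing in for real HTTP fetches.

```python
from collections import deque


def crawl(seeds, links, allowed=lambda url: True):
    """Breadth-first crawl over a link graph.

    seeds   -- URLs to start from (for example, the entries of a sitemap)
    links   -- dict mapping a URL to the URLs it links to (stand-in for fetching)
    allowed -- predicate that can veto a URL (for example, a robots.txt check)
    """
    queue = deque(seeds)
    seen = set()
    order = []
    while queue:
        url = queue.popleft()
        if url in seen or not allowed(url):
            continue
        seen.add(url)
        order.append(url)
        # Follow every outgoing link, whether or not the sitemap lists it.
        queue.extend(links.get(url, []))
    return order


# Hypothetical site: the sitemap lists only two pages, but links reach more.
LINKS = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/contact"],
}
SITEMAP = ["/", "/blog"]

visited = crawl(SITEMAP, LINKS)
```

The sitemap only decides which pages are visited first; pages such as
/contact are still reached by following links from pages the crawl has
already seen.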
> If you are looking for Microformats on my site and pointed an
> aggregator at my home page, I’d rather you use my sitemap than
> crawl my entire site.
That's what robots.txt is for. My own spider doesn't currently
respect robots.txt, but it probably should, because that's the
industry-standard way to tell a spider you don't want something crawled.
Site maps are more for telling a spider what you *do* want crawled.
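
For reference, a spider can honor robots.txt with Python's standard
urllib.robotparser module. The following is only a sketch: the robots.txt
content, the user-agent string "uf-spider" and the example.com URLs are
assumptions for illustration, and the rules are parsed from a string rather
than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real spider would fetch this from
# the site's /robots.txt before crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robots_filter(robots_txt, user_agent):
    """Return a predicate that says whether user_agent may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# "uf-spider" is a made-up user-agent name.
allowed = build_robots_filter(ROBOTS_TXT, "uf-spider")
blocked_ok = allowed("http://example.com/private/notes.html")
public_ok = allowed("http://example.com/index.html")
```

Here robots.txt acts purely as an exclusion list: anything it does not
disallow remains open to the crawl.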
> I understand the DRY principle as well, but in this case, the
> sitemap is a unique piece of content that isn’t repeated anywhere
> else. If you think about it, even having XMDPs in the head
> section is a form of repetition. If I remove a microformat or add
> one to a page, I should remember to update the XMDPs in the head
I don't think profiles are really repetition. Profiles answer the
question "what does 'X' mean?" rather than "is there any 'X' data
here?" Granted, we don't really need to know what 'X' means unless
there is 'X' data here, but the difference between useless
information and false information is significant. If a profile
doesn't reflect the content, you have useless information about what
something means. If a site map doesn't reflect the content, you have
false information about what something contains.