Patrick Tufts patrick at metaweb.com
Thu Feb 9 12:56:35 PST 2006

Angus McIntyre wrote:
> Obviously, a robot would be free to scan any page to see if it contained
> content in a format it's interested in, but a 'hint' of this kind might
> allow the robot to prioritise processing of pages that the author claims
> contain information in a specific format.

I can't speak about crawlers and parsers in general, but I designed and
ran crawlers for the Internet Archive and Alexa Internet.

My experience is that crawlers parse all HTML anyway, so recognizing an
HTML-embedded tag that says "hey, this is a microformat page" probably
won't make life much easier, as by that point the page is
getting parsed anyway (in other words, there's nothing further to

A more useful hint would be some way for a site to present a list of
its URLs that contain microformats data, similar to how robots.txt
works. It's easier to prioritize a list of URLs and feed them to the
crawler and parser than it is to crawl, parse, and then prioritize.
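
To make that concrete, here is a rough sketch in Python of how a crawler
might use such a list. Everything in it is an assumption for the sake of
illustration: no such hint file format exists, so the one-URL-per-line
layout with "#" comments, and the names parse_hint_file and
build_frontier, are all hypothetical. It only shows the idea of ordering
the crawl before any page has been fetched or parsed.

```python
import heapq
from urllib.parse import urljoin


def parse_hint_file(text, base_url):
    """Return absolute URLs from a hypothetical one-URL-per-line hint file.

    Assumed format (not a real standard): one URL or path per line,
    with '#' starting a comment, loosely modeled on robots.txt.
    """
    urls = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            urls.append(urljoin(base_url, line))
    return urls


def build_frontier(discovered, hinted):
    """Order URLs so hinted ones come first; ties keep their input order."""
    hinted_set = set(hinted)
    seen = set()
    heap = []
    for order, url in enumerate(list(hinted) + list(discovered)):
        if url in seen:
            continue  # skip duplicates across the two lists
        seen.add(url)
        priority = 0 if url in hinted_set else 1
        heapq.heappush(heap, (priority, order, url))
    # Pop everything to get the final crawl order.
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    hint_text = "# pages with microformats data\n/people/\n/events/2006.html\n"
    hinted = parse_hint_file(hint_text, "http://example.org/")
    discovered = [
        "http://example.org/",
        "http://example.org/people/",
        "http://example.org/about.html",
    ]
    for url in build_frontier(discovered, hinted):
        print(url)
```

The point of the sketch is that the prioritizing step needs only the
list, not the page contents, which is what makes a site-level list
cheaper to act on than a tag embedded in each page.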


