[uf-discuss] Signalling use of microformats?

Tantek Çelik tantek at cs.stanford.edu
Thu Feb 9 17:30:35 PST 2006


Welcome Patrick!

On 2/9/06 12:56 PM, "Patrick Tufts" <patrick at metaweb.com> wrote:

> Angus McIntyre wrote:
>> Obviously, a robot would be free to scan any page to see if it contained
>> content in a format it's interested in, but a 'hint' of this kind might
>> allow the robot to prioritise processing of pages that the author claims
>> contain information in a specific format.
> 
> I can't speak about crawlers and parsers in general, but I designed and
> ran crawlers for the Internet Archive and Alexa Internet.

Great to have you here.  Looking forward to hearing more about your
experiences.


> My experience is that crawlers will parse all HTML, and that the effort
> to recognize an HTML-embedded tag that says "hey, this is a microformat
> page" probably won't make life much easier, as by that point the page is
> getting parsed anyway (in other words, there's nothing further to
> prioritize).

Interesting.


> A useful hint would be some way of presenting a list of URLs on a site
> that contain microformats data, like how robots.txt works, because it's
> easier to prioritize a list of URLs and feed them to the crawler and
> parser than it is to crawl and parse and then prioritize.

Perhaps.  But then consider: would it ever have made sense to have a
photos.txt that listed all the URLs with photos or images?

As web page design and authoring continue to evolve to use more and more
semantic XHTML and microformats, it's likely that nearly all pages will have
some sort of microformatted content on them, at which point a list of such
URLs would simply be equivalent to a site map.
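
For concreteness, here is a rough sketch of the kind of URL-list hint
Patrick describes, and of how a crawler might use it to reorder its queue.
Everything in it is an assumption for illustration: no such file format
exists, and the one-URL-per-line layout and the function names
parse_hint_file and prioritize are hypothetical.

```python
from collections import deque
from urllib.parse import urljoin


def parse_hint_file(text, base_url):
    """Parse a hypothetical hint file: one URL or path per line.

    Blank lines are skipped and '#' starts a comment (so URL fragments
    are dropped too, which crawlers normally ignore anyway).
    Relative paths are resolved against base_url.
    """
    urls = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            urls.append(urljoin(base_url, line))
    return urls


def prioritize(frontier, hinted):
    """Return a queue with hinted URLs first, keeping relative order."""
    hinted_set = set(hinted)
    first = [u for u in frontier if u in hinted_set]
    rest = [u for u in frontier if u not in hinted_set]
    return deque(first + rest)


if __name__ == "__main__":
    hint_text = "/contact.html\n# pages with hCalendar\n/events/\n"
    hints = parse_hint_file(hint_text, "http://example.org/")
    frontier = [
        "http://example.org/",
        "http://example.org/events/",
        "http://example.org/about",
        "http://example.org/contact.html",
    ]
    for url in prioritize(frontier, hints):
        print(url)
```

Note that if every page on a site ends up in the hint list, prioritize
returns the frontier unchanged, which is the site-map case above.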

Thanks,

Tantek


