[uf-new] Microformats support for pagination
André Luís
andr3.pt at gmail.com
Wed Jan 14 15:07:30 PST 2009
On Wed, Jan 14, 2009 at 9:49 PM, Brian Suda <brian.suda at gmail.com> wrote:
> On 1/14/09, André Luís <andreluis.pt at gmail.com> wrote:
> > I coded a script that looks at a given page and grabs the rel-tags in
> > that page. It then counts the occurrences and orders them in
> > descending order.
> >
> > the script is at http://workshop.andr3.net/tageater/
> >
> > this was meant to infer the user's attention profile from the rel-tags...
> >
> > the problem starts if I follow the rel-* links. For example the
> > website macacos.com marks-up the tagcloud with rel-tags on every page,
>
>
>
> > So, how to detect repetition in these cases?
>
>
> --- wouldn't you just keep a list of the pages you have already
> crawled? So if you find a tagcloud on page /item1.html and it links to
> /tags/tag1 then on page item2.htm you re-find the tag cloud which
> links to /tags/tag1 you don't follow it again?
>
Like Toby said in a later reply (which I'll answer after this one, to
avoid confusion), I don't follow the tags, but I would follow the
rel-[next|prev|archives|...] links. So the same set of tags keeps
popping up, even though the URL changes. (And yes, you should keep a
bucket of crawled links to avoid infinite loops.) If you keep getting
the same set of tags, each tag's number of occurrences just keeps
growing, and the weight loses its meaning.
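To make the problem concrete, here is a minimal sketch in Python (not the actual tageater code). It assumes the site has already been fetched into a hypothetical `pages` dict mapping each URL to its rel-tags and its rel-next/prev/archives links. The `visited` set is the bucket of crawled links, and `seen_tag_sets` is one possible way to count a repeated tag cloud only once; both names are made up for illustration.

```python
from collections import Counter, deque


def count_tags(pages, start):
    """Count rel-tags while following rel-next/prev style links.

    `pages` is a hypothetical, already-fetched site:
    url -> {"tags": [...], "links": [...]}
    """
    visited = set()        # bucket of crawled links (avoids infinite loops)
    seen_tag_sets = set()  # tag sets already counted once
    counts = Counter()
    queue = deque([start])

    while queue:
        url = queue.popleft()
        if url in visited or url not in pages:
            continue
        visited.add(url)

        page = pages[url]
        fingerprint = frozenset(page["tags"])
        # An identical tag set on another URL is treated as the same
        # tag cloud and is not counted again.
        if fingerprint not in seen_tag_sets:
            seen_tag_sets.add(fingerprint)
            counts.update(fingerprint)

        queue.extend(page["links"])

    return counts


# Made-up example site: the same tag cloud appears on two pages.
example_pages = {
    "/": {"tags": ["css", "js"], "links": ["/page2"]},
    "/page2": {"tags": ["css", "js"], "links": ["/", "/page3"]},
    "/page3": {"tags": ["css", "js", "php"], "links": ["/page2"]},
}
example_counts = count_tags(example_pages, "/")
```

This only catches tag sets that are exactly identical. A tag cloud mixed with page-specific tags would need something smarter, so take it as a starting point.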
However, from my little testing and a later interview with the sites'
owners, I think the weight of each tag is relative, since pretty
much all of the tags are meaningful to the owner of the website. You
just can't say that X > Y, but you can say that the owner of that
site is at least interested in both X and Y. Unless you see some holes
in my logic. ;)
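A rough sketch of that idea in Python: instead of weights, keep only the set of tags seen anywhere on the site. The function name `interest_profile` and the lowercase/strip normalisation are assumptions made for the example, not something the rel-tag spec requires.

```python
def interest_profile(tag_lists):
    """Merge the rel-tags of several pages into an unweighted set.

    `tag_lists` is an iterable of tag lists, one per crawled page.
    Lowercasing and stripping are assumptions made for this sketch.
    """
    profile = set()
    for tags in tag_lists:
        profile.update(tag.strip().lower() for tag in tags)
    return profile
```

With this you can only ask whether X is in the profile, never whether X outweighs Y, which is about all the data seems to support.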
>
> > So what you're saying is that this falls out of the spec's scope,
> > right? It should be the parsers adapting their behaviour depending on
> > their goal?
>
>
> --- probably outside of the spec, but certainly best practices
> should cover these sorts of issues.
>
Agreed.
>
> > You're right. Do you have a link where I can read more about that
> > discussion? Thanks.
>
>
> There was discussion about canonical hCards 2 years ago
> http://microformats.org/discuss/mail/microformats-discuss/2007-January/008265.html
>
> I am not sure how helpful any of that discussion was/is to this problem.
>
Alright, I'll have a look, and on the wiki as well. I think tags are a
whole different matter though, because they're based on a single
element (just like XFN and other rel-based microformats).
--
André Luís