[uf-new] Microformats support for pagination

André Luís andreluis.pt at gmail.com
Wed Jan 14 07:53:57 PST 2009


Thanks for the insight, Brian.

Comments inline.

On Wed, Jan 14, 2009 at 2:10 PM, Brian Suda <brian.suda at gmail.com> wrote:
> On 1/14/09, André Luís <andr3.pt at gmail.com> wrote:
>> Do you have any suggestions on how to deal with repetitions? I've
>>  tried parsing several pages of several websites and some of them used
>>  rel-tags on tagclouds... these would be present on every page (sidebar
>>  of blog) thus rendering the data kinda useless.
>
> --- do you have a real world example of where this would be a problem?
> The old technorati kitchen crawled the web and allowed you to search
> it. Having repetitions actually allowed for a nice merging of the
> data.

Right, in certain contexts it makes sense to merge data and end up
with a more meaningful set of instances (of events, vcards, etc), but
in others, not quite. I'll give an example.


I coded a script that looks at a given page and grabs the rel-tags in
that page. It then counts the occurrences and orders them in
descending order.
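
For illustration, here is a minimal Python sketch of that counting
step. It is not the actual tageater code: the names RelTagParser and
count_rel_tags are made up for this example, and lowercasing the tag
names is my own assumption. The tag is taken from the last path
segment of the href, which is how rel-tag defines it.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collects the tag names of every <a rel="tag" href="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" in rel_values and href:
            # rel-tag: the tag is the last segment of the URL path.
            path = urlparse(href).path.rstrip("/")
            name = unquote(path.rsplit("/", 1)[-1]).replace("+", " ")
            if name:
                # Assumption: fold case so "CSS" and "css" count together.
                self.tags.append(name.lower())


def count_rel_tags(html):
    """Return (tag, occurrences) pairs, most frequent first."""
    parser = RelTagParser()
    parser.feed(html)
    return Counter(parser.tags).most_common()
```

Fetching the page is left out on purpose; count_rel_tags only needs
the HTML as a string.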

The script is at http://workshop.andr3.net/tageater/

It was meant to infer the user's attention profile from the rel-tags.

The problem starts if I follow the rel-* links. For example, the
website macacos.com marks up its tagcloud with rel-tags on every page,
so if I follow the rel-archives links I'll end up getting the same
tagcloud on every one of them...

Have a look at http://workshop.andr3.net/tageater/?url=http%3A%2F%2Fmacacos.com

I'm not following the links here because I got stuck on this
question, so I just print a link to each of them.

Using rel-tags in tagclouds might be discouraged, but the fact is that
it happens quite a bit in the wild. I saved a static HTML page of the
scraping I did back then of all the barcamp attendees' webpages. You
can have a look here:
http://workshop.andr3.net/tageater/examples/barcamp.html , but for
instance these are a few that use rel-tags on tagclouds:
- http://macacos.com/
- http://www.devile.net/
- http://blog.pfragoso.org/
- http://www.brunoamaral.com/
- ...

So, how to detect repetition in these cases?
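
One heuristic a parser could try, sketched here in Python under my own
assumptions (nothing the rel-tag spec prescribes): treat any tag that
shows up on every crawled page of a site as site-wide, e.g. coming
from a sidebar tagcloud, and leave it out of the per-page counts. The
function name split_sitewide_tags, the threshold parameter and the
input shape are hypothetical.

```python
from collections import Counter


def split_sitewide_tags(pages, threshold=1.0):
    """Separate tags repeated across pages from page-specific ones.

    pages: dict mapping a page URL to the list of rel-tag names on it.
    threshold: fraction of pages a tag must appear on to be treated
               as site-wide (1.0 means every page).
    Returns (sitewide_tags, counts), where counts is a list of
    (tag, occurrences) pairs for the remaining tags, most frequent first.
    """
    total_pages = len(pages)

    # Count on how many distinct pages each tag appears.
    page_frequency = Counter()
    for tags in pages.values():
        page_frequency.update(set(tags))

    # With a single page there is nothing to compare against.
    sitewide = {
        tag
        for tag, seen_on in page_frequency.items()
        if total_pages > 1 and seen_on / total_pages >= threshold
    }

    # Count only the tags that were not flagged as site-wide.
    counts = Counter()
    for tags in pages.values():
        counts.update(tag for tag in tags if tag not in sitewide)

    return sitewide, counts.most_common()
```

The obvious weakness is that a tag the author really does use in every
post gets thrown out along with the tagcloud, so this only makes sense
when enough pages of the same site have been crawled.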

>
>>  Should/can we create guidelines for producers AND parsers alike on how
>>  to deal with this? Like adding site-wide unique id's to the root
>>  elements? Or is this out of the scope of microformats altogether?
>
> --- again, this would depend on the format in question. The existence
> of multiple events with the same timestamp and name could be used to
> merge data, UIDs and URLs could be as well, but everything could be
> gamed.

So what you're saying is that this falls outside the spec's scope,
right? It should be up to the parsers to adapt their behaviour
depending on their goal?

>
> But this isn't unique to microformats, other semantic technologies
> would have this issue as well. There was talk of a rel-canonical
> a while ago, but it wasn't big enough a problem to pursue.

You're right. Do you have a link where I can read more about that
discussion? Thanks.

>
> If you have an example we could work through it.
>
> -brian
>


cheers,
--
André Luís



More information about the microformats-new mailing list