[uf-new] Microformats support for pagination
André Luís
andreluis.pt at gmail.com
Wed Jan 14 07:53:57 PST 2009
Thanks for the insight, Brian.
Comments inline.
On Wed, Jan 14, 2009 at 2:10 PM, Brian Suda <brian.suda at gmail.com> wrote:
> On 1/14/09, André Luís <andr3.pt at gmail.com> wrote:
>> Do you have any suggestions on how to deal with repetitions? I've
>> tried parsing several pages of several websites and some of them used
>> rel-tags on tagclouds... these would be present on every page (sidebar
>> of blog) thus rendering the data kinda useless.
>
> --- do you have a real world example of where this would be a problem?
> The old technorati kitchen crawled the web and allowed you to search
> it. Having repetitions actually allowed for a nice merging of the
> data.
Right, in certain contexts it makes sense to merge data and end up
with a more meaningful set of instances (of events, vcards, etc), but
in others, not quite. I'll give an example.
I coded a script that looks at a given page and grabs the rel-tags in
that page. It then counts the occurrences and orders them in
descending order.
The script is at http://workshop.andr3.net/tageater/
It was meant to infer the user's attention profile from the rel-tags...
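
As a rough illustration of the kind of script described above (this is not the actual tageater code), the sketch below extracts rel-tags from one page's HTML, counts them, and sorts them in descending order. The names `RelTagParser` and `count_rel_tags` are hypothetical. Taking the tag name from the last path segment of the href follows the rel-tag convention; lowercasing names and treating "+" as a space are assumptions made here for simplicity.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collects the tag name from every <a rel="tag" href="..."> element."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="nofollow tag"
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" in rel_values and href:
            # the tag name is the last path segment of the href
            path = urlparse(href).path.rstrip("/")
            name = unquote(path.rsplit("/", 1)[-1]).replace("+", " ")
            if name:
                # assumption: tags are compared case-insensitively
                self.tags.append(name.lower())


def count_rel_tags(html):
    """Return (tag, count) pairs, most frequent first."""
    parser = RelTagParser()
    parser.feed(html)
    return Counter(parser.tags).most_common()
```

Fetching the page and following rel-* links are left out on purpose; the function only works on HTML that has already been downloaded.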
The problem starts if I follow the rel-* links. For example, the
website macacos.com marks up the tagcloud with rel-tags on every page,
so if I follow the rel-archives links I'll end up getting the tagcloud
on every one of them...
Have a look at http://workshop.andr3.net/tageater/?url=http%3A%2F%2Fmacacos.com
I'm not following the links here because I was stuck on this question,
so I just print a link to each of them.
Using rel-tags in tagclouds might be discouraged, but the fact is that
it happens quite a bit in the wild. I saved a static HTML page of the
scraping I did back then of all the barcamp attendees' webpages. You
can have a look here:
http://workshop.andr3.net/tageater/examples/barcamp.html
For instance, these are a few that use rel-tags on tagclouds:
- http://macacos.com/
- http://www.devile.net/
- http://blog.pfragoso.org/
- http://www.brunoamaral.com/
- ...
So, how to detect repetition in these cases?
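
One possible heuristic for this question, offered only as a sketch and not something proposed in this thread: when several pages of the same site have been parsed, a tag that shows up on every page is likely to come from a site-wide tagcloud rather than from the page's own content. The function name `drop_sitewide_tags` and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def drop_sitewide_tags(pages, threshold=1.0):
    """Count tags across pages, ignoring tags present on (almost) every page.

    pages maps a page URL to the list of rel-tag names found on that page.
    threshold is the fraction of pages a tag must appear on to be treated
    as site-wide (1.0 means "on every page").
    """
    if len(pages) < 2:
        # with a single page there is nothing to compare against
        return Counter(t for tags in pages.values() for t in tags)

    # number of pages each tag appears on (not the number of occurrences)
    page_freq = Counter()
    for tags in pages.values():
        page_freq.update(set(tags))

    cutoff = threshold * len(pages)
    sitewide = {t for t, n in page_freq.items() if n >= cutoff}

    totals = Counter()
    for tags in pages.values():
        totals.update(t for t in tags if t not in sitewide)
    return totals
```

The obvious weakness is that a tag the author really does use on every page is discarded along with the tagcloud, so this trades some recall for less noise.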
>
>> Should/can we create guidelines for producers AND parsers alike on how
>> to deal with this? Like adding site-wide unique id's to the root
>> elements? Or is this out of the scope of microformats altogether?
>
> --- again, this would depend on the format in question. The existence
> of multiple events with the same timestamp and name could be used to
> merge data, UIDs and URLs could be as well, but everything could be
> gamed.
So what you're saying is that this falls out of the spec's scope,
right? It should be the parsers adapting their behaviour depending on
their goal?
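
To make the merge keys Brian mentions concrete, here is a minimal parser-side sketch, assuming events have already been parsed into dictionaries keyed by hCalendar property names (`uid`, `url`, `dtstart`, `summary`). The functions `merge_key` and `merge_events`, the order of preference between keys, and the "first non-empty value wins" rule are all assumptions for illustration. As noted above, any of these keys can be gamed.

```python
def merge_key(event):
    """Pick the strongest identity available for an event."""
    if event.get("uid"):
        return ("uid", event["uid"])
    if event.get("url"):
        return ("url", event["url"])
    # fall back to same timestamp and same (normalized) name
    summary = (event.get("summary") or "").strip().lower()
    return ("dtstart+summary", event.get("dtstart"), summary)


def merge_events(events):
    """Merge events that share a key; the first non-empty value wins."""
    merged = {}
    for event in events:
        target = merged.setdefault(merge_key(event), {})
        for field, value in event.items():
            if value and not target.get(field):
                target[field] = value
    # dicts keep insertion order, so output follows first appearance
    return list(merged.values())
```

A parser with a different goal (counting occurrences rather than merging them, for example) would skip this step entirely, which is the point of leaving the choice to the consumer.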
>
> But this isn't unique to microformats, other semantic technologies
> would have this issue as well. There was talk of a rel-canonical
> a while ago, but it wasn't big enough a problem to pursue.
You're right. Do you have a link where I can read more about that
discussion? Thanks.
>
> If you have an example we could work through it.
>
> -brian
>
cheers,
--
André Luís