[microformats-discuss] Comments on robot exclusion

Robert Bachmann rbach at rbach.priv.at
Fri Jul 22 09:44:06 PDT 2005


Peter Janes wrote:
> I think that might be overkill, considering the example page at
> http://peterjanes.ca/2005/robots/example does something very similar
> already using only CSS styles and a little bit of JavaScript to toggle
> them.

I have tried displaying "noarchive" with a grey background color.
Displaying it with font-style: italic isn't very usable, because
<em> and <i> are also rendered in italics.
See <http://rbach.priv.at/Misc/2005/REP/GreyForNoArchive>.

I have some comments on the issues of REP.

| Should earlier values take precedence or later? Does
| class="robots-nofollow robots-follow" means the same as
| class="robots-nofollow" or class="robots-follow"?

I think that the later value should have precedence, because it seems
more user-friendly to me.
This is because of the way I think about HTML. Although it isn't
a programming language, I read, write and understand it as a sort of
programming language.

If I wanted to translate >>class="robots-nofollow"<< to C++ I would write:
robot_follow = 0;

If I wanted to translate >>class="robots-follow"<< to C++ I would write:
robot_follow = 1;

If I wanted to translate >>class="robots-nofollow robots-follow"<<
in C++ I would write:
robot_follow = 0;
robot_follow = 1;

These two lines of C++ code would result in robot_follow having
the value 1.
So if C++ does it that way, REP should do it in the same way ;-) or
should forbid contradictory values.
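
A minimal sketch of the "later value wins" reading, written as a small
C++ function. The name robot_follow_from_class and the default_follow
parameter are my own assumptions for illustration; REP does not define
them.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of REP: returns the follow decision for a
// class attribute, letting the last robots-(no)follow token win.
// `default_follow` is what applies when neither token is present (assumed).
bool robot_follow_from_class(const std::string& class_attr,
                             bool default_follow = true) {
    bool robot_follow = default_follow;
    std::istringstream tokens(class_attr);
    std::string token;
    while (tokens >> token) {  // class values are whitespace-separated
        if (token == "robots-nofollow") {
            robot_follow = false;  // robot_follow = 0;
        } else if (token == "robots-follow") {
            robot_follow = true;   // robot_follow = 1;
        }
    }
    return robot_follow;
}
```

With this reading the order of the tokens matters: swapping the two
class names flips the result. Forbidding contradictory values instead
would mean rejecting any attribute that contains both tokens.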

| Interaction with relnofollow: what does class="robots-follow"
| rel="nofollow" mean? Currently relnofollow has no profile URI defined,
| so the Robot Exclusion Profile takes precedence.

I don't think this is an issue.

Consider the following examples:

<a class="robots-nofollow" href="http://example.com/">foo</a>
Obviously means: "Don't follow this link."

<a class="robots-nofollow" rel="nofollow"
   href="http://example.com/">foo</a>
Obviously means the same: "Don't follow this link."

<a class="robots-follow" rel="nofollow"
   href="http://example.com/">foo</a>
I think this means:
- You may follow this link [robots-follow]
but
- if you follow it, don't add any 'weight' to it [nofollow]
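
The three examples can be sketched in C++ as a function that maps the
two attributes to a pair of decisions. LinkPolicy, has_token and
link_policy are hypothetical names. Two rules are my assumptions and not
stated by REP or RelNoFollow: a link that is not followed passes no
weight, and rel="nofollow" on its own only affects weight.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: whether a bot may fetch the link target, and
// whether the link should add any 'weight' to the target.
struct LinkPolicy {
    bool may_follow;
    bool add_weight;
};

// True if `token` occurs in the whitespace-separated list `attr`.
bool has_token(const std::string& attr, const std::string& token) {
    std::istringstream in(attr);
    std::string t;
    while (in >> t) {
        if (t == token) return true;
    }
    return false;
}

// Assumed reading: the class decides following, rel decides weight.
LinkPolicy link_policy(const std::string& class_attr,
                       const std::string& rel_attr) {
    LinkPolicy p{true, true};
    if (has_token(class_attr, "robots-nofollow")) p.may_follow = false;
    if (has_token(rel_attr, "nofollow")) p.add_weight = false;
    // Assumption: a link that is never followed cannot pass weight either.
    if (!p.may_follow) p.add_weight = false;
    return p;
}
```

Under these assumptions the first two examples give the same result
(don't follow, no weight), and the third gives "may follow, no weight".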

| Does not allow control of specific UAs à la A Standard for Robot
| Exclusion

I presented a possibility for bot-specific rules [1], but I
think it would be a heavy burden for both authors and bot implementors,
because it makes things quite complicated.
IMO having "robots.txt" for "crude" bot exclusion
(e.g. banning archive.org's bot or Google's image search) is enough.
Once bots enter a page there shouldn't be any urgent need for
distinguishing them.

| The "efficacy" and "collateral damage" issues from rel="nofollow" also
| apply.

| Collateral Damage. If tools automatically add nofollow [or
| robots-nofollow, etc] to all 3rd party links, then many legitimate
| non-spam links will be ignored or given reduced weight, and thus the
| destination of such links will be unfortunate casualties.

This is out of scope for REP and RelNoFollow.
But I suggest that we provide some tips for content generator implementers.

I'll present my way to reduce the "collateral" damage.
If there is interest in it I could set up a wiki page for it
and improve it.
I'll talk about forums, but please note that (at least for this example)
a blog and a guest book can be interpreted as a special kind of forum.
Curly braces around a value mean that this value is only provided as an
example and must be adjustable by the administrator.

I separate forum users into four groups:

- untrusted:
  Unregistered users and every registered user who
  isn't a member of the other groups.

- half-trusted:
  When a registered untrusted user has more than {20} posts
  the moderator group is notified and may add him to the
  half-trusted group.

- full-trusted:
  When a half-trusted user has more than {200} posts
  the moderator group is notified and may add him to the
  full-trusted group.

- moderators (Every moderator and administrator)
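
The notification rule for these groups could be sketched in C++ as
follows. TrustGroup, TrustConfig and should_notify_moderators are
hypothetical names. The thresholds default to the {20} and {200}
example values and are meant to be changed by the administrator.

```cpp
enum class TrustGroup { Untrusted, HalfTrusted, FullTrusted, Moderator };

// Administrator-adjustable thresholds; the defaults mirror the {20} and
// {200} example values.
struct TrustConfig {
    int half_trusted_after_posts = 20;
    int full_trusted_after_posts = 200;
};

// Returns true when the moderator group should be notified that a user is
// a candidate for the next group. Promotion itself stays a manual decision.
bool should_notify_moderators(TrustGroup group, bool registered,
                              int post_count,
                              const TrustConfig& cfg = TrustConfig{}) {
    switch (group) {
        case TrustGroup::Untrusted:
            // Unregistered users can never become half-trusted.
            return registered && post_count > cfg.half_trusted_after_posts;
        case TrustGroup::HalfTrusted:
            return post_count > cfg.full_trusted_after_posts;
        default:
            return false;  // full-trusted users and moderators
    }
}
```

Note that "more than {20} posts" is read strictly here, so the
notification fires at the 21st post.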


When an untrusted or half-trusted user makes a post into a forum this
post is stored in a database with the CUC flag (contains untrusted
content) set.

When a post with the CUC flag is rendered as
HTML there will be "robots-nofollow", "nofollow", "noindex", etc. added.

When a moderator or full-trusted user makes a post the CUC flag
won't be set.

When a moderator reads a post with the CUC flag set he will be presented
with a link (e.g. "Mark as OK").
If a moderator marks the post as OK the CUC flag will be cleared.
The next time the post is rendered it will be rendered
without "robots-nofollow", "nofollow", "noindex", ...
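
A rough C++ sketch of the life cycle of the CUC flag. Post, make_post,
mark_as_ok and render_link are hypothetical names, and the sketch only
covers links ("robots-nofollow" and "nofollow"). Where "noindex" would
be placed is left open, and escaping of href and text is omitted.

```cpp
#include <string>

enum class TrustGroup { Untrusted, HalfTrusted, FullTrusted, Moderator };

// Hypothetical post record; `cuc` is the "contains untrusted content" flag.
struct Post {
    std::string body_html;
    bool cuc;
};

// The flag is set only for authors in the untrusted or half-trusted group.
Post make_post(TrustGroup author, const std::string& body_html) {
    bool untrusted = author == TrustGroup::Untrusted ||
                     author == TrustGroup::HalfTrusted;
    return Post{body_html, untrusted};
}

// "Mark as OK": a moderator clears the flag.
void mark_as_ok(Post& post) { post.cuc = false; }

// Renders one link of a post; flagged posts get the restrictive markers.
// Simplification: href and text are assumed to be escaped already.
std::string render_link(const Post& post, const std::string& href,
                        const std::string& text) {
    std::string attrs =
        post.cuc ? " class=\"robots-nofollow\" rel=\"nofollow\"" : "";
    return "<a" + attrs + " href=\"" + href + "\">" + text + "</a>";
}
```

Because the markers are added at render time and not stored in the post
body, clearing the flag is enough to change the output on the next
rendering.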

Optional: (may not be a good idea anyway)

- If more than {8} (different) half-trusted users reply to a
post which has the CUC flag set, the flag may be
automatically removed.

- If more than {2} (different) full-trusted users reply to a
post which has the CUC flag set, the flag may
be automatically removed.

Rationale: Some forums may be huge, and not every post is read by a
moderator, but if trusted users reply to a post it suggests that the
post contains content which isn't spam.
Of course a "Report as spam" link should be added to every post, so that
spam can be reported to the moderators.
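
The optional automatic removal could be sketched in C++ like this.
AutoClearConfig, ReplyStats and may_auto_clear are hypothetical names,
users are identified by an assumed string id, and the thresholds default
to the {8} and {2} example values.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Administrator-adjustable thresholds; defaults mirror the {8} and {2}
// example values.
struct AutoClearConfig {
    std::size_t half_trusted_replies = 8;
    std::size_t full_trusted_replies = 2;
};

// Repliers to one flagged post. Using sets keyed by a hypothetical user id
// means that repeated replies by the same user count only once.
struct ReplyStats {
    std::set<std::string> half_trusted_users;
    std::set<std::string> full_trusted_users;
};

// True if the CUC flag may be removed automatically.
bool may_auto_clear(const ReplyStats& stats,
                    const AutoClearConfig& cfg = AutoClearConfig{}) {
    return stats.half_trusted_users.size() > cfg.half_trusted_replies ||
           stats.full_trusted_users.size() > cfg.full_trusted_replies;
}
```

A "Report as spam" action would presumably have to be able to set the
flag again, otherwise an automatic removal could not be undone.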


Robert

[1] http://microformats.org/wiki/robots-exclusion#Specificity
-- 
Robert Bachmann <rbach at rbach.priv.at> (OpenPGP KeyID: 0x4A5CCF10)

