[microformats-discuss] Comments on robot exclusion
Robert Bachmann
rbach at rbach.priv.at
Fri Jul 22 09:44:06 PDT 2005
Peter Janes wrote:
> I think that might be overkill, considering the example page at
> http://peterjanes.ca/2005/robots/example does something very similar
> already using only CSS styles and a little bit of JavaScript to toggle
> them.
I have tried to display "noarchive" with a grey background color.
Displaying it with font-style: italic isn't very usable because
<em> and <i> are also rendered in italics.
See <http://rbach.priv.at/Misc/2005/REP/GreyForNoArchive>.
I have some comments on the issues of REP.
| Should earlier values take precedence or later? Does
| class="robots-nofollow robots-follow" means the same as
| class="robots-nofollow" or class="robots-follow"?
I think that the later value should have precedence, because it seems
more user-friendly to me.
This is because of the way I think about HTML. Although it isn't
a programming language I read, write and understand it as a sort-of
programming language.
If I wanted to translate >>class="robots-nofollow"<< to C++ I would write:
robot_follow = 0;
If I wanted to translate >>class="robots-follow"<< to C++ I would write:
robot_follow = 1;
If I wanted to translate >>class="robots-nofollow robots-follow"<<
in C++ I would write:
robot_follow = 0;
robot_follow = 1;
These two lines of C++ code would result in robot_follow having
the value 1.
So if C++ does it that way, REP should do it in the same way ;-) or
should forbid contradictory values.
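As a minimal sketch of the "later value wins" reading (the enum and
the function name resolve_follow are made up for illustration and are
not part of REP), a parser could walk the class tokens in order and
let each one overwrite the previous result:

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: REP does not define these names.
enum class Follow { Unspecified, No, Yes };

// Walk the whitespace-separated class tokens from left to right.
// A later robots-follow / robots-nofollow token overwrites an
// earlier one, just like the two assignments above.
Follow resolve_follow(const std::string& class_attr) {
    Follow robot_follow = Follow::Unspecified;
    std::istringstream tokens(class_attr);
    std::string token;
    while (tokens >> token) {
        if (token == "robots-nofollow") {
            robot_follow = Follow::No;
        } else if (token == "robots-follow") {
            robot_follow = Follow::Yes;
        }
    }
    return robot_follow;
}
```

With this sketch, resolve_follow("robots-nofollow robots-follow")
yields Follow::Yes. A stricter parser that forbids contradictory
values would instead report an error when it sees both tokens.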
| Interaction with relnofollow: what does class="robots-follow"
| rel="nofollow" mean? Currently relnofollow has no profile URI defined,
| so the Robot Exclusion Profile takes precedence.
I don't think this is an issue.
Consider the following examples:
<a class="robots-nofollow" href="http://example.com/">foo</a>
Obviously means: "Don't follow this link."
<a class="robots-nofollow" rel="nofollow"
href="http://example.com/">foo</a>
Obviously means the same: "Don't follow this link."
<a class="robots-follow" rel="nofollow"
href="http://example.com/">foo</a>
I think this means:
- You may follow this link [robots-follow]
but
- if you follow it, don't add any 'weight' to it [nofollow]
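The three examples could be sketched in C++ roughly as follows. This
is only my reading, not something either spec defines: LinkPolicy,
has_token and interpret_link are hypothetical names, and the default
of "may follow" when no robots-* token is present is an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for illustration only.
struct LinkPolicy {
    bool may_follow;   // may the bot fetch the link target?
    bool pass_weight;  // may the link add 'weight' to the target?
};

// True if the whitespace-separated attribute value contains `wanted`.
static bool has_token(const std::string& attr, const std::string& wanted) {
    std::istringstream tokens(attr);
    std::string token;
    while (tokens >> token) {
        if (token == wanted) return true;
    }
    return false;
}

LinkPolicy interpret_link(const std::string& class_attr,
                          const std::string& rel_attr) {
    // Assumption: a link may be followed unless the class says otherwise.
    bool may_follow = true;
    std::istringstream tokens(class_attr);
    std::string token;
    while (tokens >> token) {
        if (token == "robots-nofollow") may_follow = false;
        else if (token == "robots-follow") may_follow = true;
    }

    LinkPolicy policy;
    policy.may_follow = may_follow;
    // rel="nofollow" only withholds weight; a link that is not
    // followed at all cannot pass weight either.
    policy.pass_weight = may_follow && !has_token(rel_attr, "nofollow");
    return policy;
}
```

Here class="robots-follow" rel="nofollow" gives may_follow == true and
pass_weight == false, while both robots-nofollow examples give
may_follow == false.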
| Does not allow control of specific UAs à la A Standard for Robot
| Exclusion
I presented a possibility for bot-specific rules [1] but I
think it would be a heavy burden for both authors and bot implementors,
because it makes things quite complicated.
IMO having "robots.txt" for "crude" bot exclusion
(e.g. banning archive.org's bot or Google's image search) is enough.
Once bots enter a page there shouldn't be any urgent need for
distinguishing them.
| The "efficacy" and "collateral damage" issues from rel="nofollow" also
| apply.
| Collateral Damage. If tools automatically add nofollow [or
| robots-nofollow, etc] to all 3rd party links, then many legitimate
| non-spam links will be ignored or given reduced weight, and thus the
| destination of such links will be unfortunate casualties.
This is out of scope for REP and RelNoFollow.
But I suggest that we provide some tips for content generator implementers.
I'll present my way to reduce the "collateral" damage.
If there is interest in it I could set up a wiki page for it
and improve it.
I'll talk about forums, but please note that (at least for this example)
a blog and a guest book can be interpreted as a special kind of forum.
Curly braces around a value mean that this value is only provided as an
example and must be adjustable by the administrator.
I separate forum users into four groups:
- untrusted:
Unregistered users and every registered user who
isn't a member of the other groups.
- half-trusted:
When a registered untrusted user has more than {20} posts
the moderator group is notified and may add him to the
half-trusted group.
- full-trusted:
When a half-trusted user has more than {200} posts
the moderator group is notified and may add him to the
full-trusted group.
- moderators (Every moderator and administrator)
When an untrusted or half-trusted user makes a post into a forum this
post is stored in a database with the CUC flag (contains untrusted
content) set.
When a post with the CUC flag is rendered as
HTML there will be "robots-nofollow", "nofollow", "noindex", etc. added.
When a moderator or full-trusted user makes a post the CUC flag
won't be set.
When a moderator reads a post with the CUC flag set he will be presented
with a link (e.g. "Mark as OK").
If a moderator marks the post as OK the CUC flag will be cleared.
The next time the post is rendered it will be rendered
without "robots-nofollow", "nofollow", "noindex", ...
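A rough C++ sketch of this flag handling, to make the flow concrete.
All names here (TrustGroup, Post, make_post, mark_as_ok,
link_attributes) are invented for the example, and a real forum would
keep the flag in its database rather than in a struct:

```cpp
#include <string>

// The four groups described above.
enum class TrustGroup { Untrusted, HalfTrusted, FullTrusted, Moderator };

struct Post {
    std::string body;
    bool cuc;  // "contains untrusted content"
};

// Store a new post. The CUC flag is set only for untrusted and
// half-trusted authors.
Post make_post(TrustGroup author, const std::string& body) {
    bool untrusted = author == TrustGroup::Untrusted ||
                     author == TrustGroup::HalfTrusted;
    return Post{body, untrusted};
}

// "Mark as OK": only a moderator may clear the flag.
void mark_as_ok(Post& post, TrustGroup reviewer) {
    if (reviewer == TrustGroup::Moderator) {
        post.cuc = false;
    }
}

// Extra attributes to emit on each link in the post when rendering.
// Only two of the markers named above are shown here.
std::string link_attributes(const Post& post) {
    if (post.cuc) {
        return " class=\"robots-nofollow\" rel=\"nofollow\"";
    }
    return "";
}
```

A post by a half-trusted user is rendered with the extra attributes
until a moderator calls mark_as_ok on it; after that link_attributes
returns an empty string.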
Optional: (may not be a good idea anyway)
- If more than {8} (different) half-trusted users reply to a
post which has the CUC flag set, the flag may be
automatically removed.
- If more than {2} (different) full-trusted users reply to a
post which has the CUC flag set, the flag may
be automatically removed.
Rationale: Some forums may be huge, and not every post is read by a
moderator, but if trusted users reply to a post it suggests that the
post contains content which isn't spam.
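The optional rule could be sketched like this, again with made-up
names (Reply, Thresholds, may_auto_clear). The defaults 8 and 2 are
the example values in curly braces above and should be configurable.
Replies from moderators are ignored here, because a moderator can
clear the flag directly:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class TrustGroup { Untrusted, HalfTrusted, FullTrusted, Moderator };

struct Reply {
    std::string author;  // user name of the replying user
    TrustGroup group;    // group of that user at reply time
};

// Administrator-adjustable limits; the defaults are example values.
struct Thresholds {
    std::size_t half_trusted = 8;
    std::size_t full_trusted = 2;
};

// True if the CUC flag of the replied-to post may be removed.
// Each user is counted once, no matter how often he replied.
bool may_auto_clear(const std::vector<Reply>& replies,
                    const Thresholds& limits = Thresholds()) {
    std::set<std::string> half;
    std::set<std::string> full;
    for (const Reply& reply : replies) {
        if (reply.group == TrustGroup::HalfTrusted) {
            half.insert(reply.author);
        } else if (reply.group == TrustGroup::FullTrusted) {
            full.insert(reply.author);
        }
    }
    return half.size() > limits.half_trusted ||
           full.size() > limits.full_trusted;
}
```

Note that "more than {2}" means the third distinct full-trusted
replier is the one that allows the flag to be removed.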
Of course a "Report as spam" link should be added to every post, so that
spam can be reported to the moderators.
Robert
[1] http://microformats.org/wiki/robots-exclusion#Specificity
--
Robert Bachmann <rbach at rbach.priv.at> (OpenPGP KeyID: 0x4A5CCF10)