robots-exclusion: Difference between revisions
mNo edit summary |
(Answer "is this a microformat"?) |
Revision as of 04:36, 26 June 2005
Robot Exclusion Profile
Draft Specification 2005-06-18
Authors
Copyright
This specification is © 2004-2005 by the author. However, the author intends to submit this specification to a standards body with a liberal copyright/licensing policy such as the GMPG. See the GMPG Principles for more details. Anyone wishing to contribute to this effort MUST read those principles, especially those regarding copyright and licensing, and agree to them before contributing.
Patents
The author neither holds nor intends to apply for any patents on anything required to implement this specification.
Abstract
The Robot Exclusion Profile is a reworking of the Robots META tag (and less-standard extensions) as a microformat.
Introduction
The Robots META tag provides page-specific direction for web crawlers. Although useful in many cases, its page-level scope means it cannot be used to restrict crawlers from indexing only certain sections of a document. Several attempts have been made to create more granular solutions, but each has perceived shortcomings that limit its use; the Robot Exclusion Profile defines a microformat that can be applied to any element or set of elements in a page.
Like other microformats such as hCalendar, the Robot Exclusion Profile defines a set of class names that may be applied to (X)HTML elements. The class attribute can be applied to almost every (X)HTML element, which means that authors may be as specific or general as they wish in their application. This differs from the similarly-purposed rel="nofollow" attribute, which may only be applied to (and does not refer to the content of) a specific inline link. (Note that this behaviour is entirely encompassed by the use of class="robots-nofollow" on the same element.) Classes are also additive, so multiple values can be specified at once, e.g. class="robots-nofollow robots-noindex". For robot exclusion in particular, this allows authors to specify multiple rules for an element without adding unnecessary extra markup.
Format
Profile URI
http://example.org/xmdp/robots-profile#
(obviously preliminary)
The classes defined by the Robot Exclusion Profile should be considered meaningless when the profile URI is not present in the document <head>'s profile attribute.
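The rule above can be sketched as a check a robot might run before honouring any robots-* class. This is a minimal sketch using Python's standard html.parser module; the names HeadProfileParser and profile_in_effect are hypothetical and not part of the draft, and the URI is the preliminary one quoted above.

```python
from html.parser import HTMLParser

# Preliminary profile URI quoted from this draft; it may change.
PROFILE_URI = "http://example.org/xmdp/robots-profile#"


class HeadProfileParser(HTMLParser):
    """Collects the URIs listed in the first <head> element's profile attribute."""

    def __init__(self):
        super().__init__()
        self.profiles = None  # stays None until a <head> tag has been seen

    def handle_starttag(self, tag, attrs):
        if tag == "head" and self.profiles is None:
            # The profile attribute holds a whitespace-separated list of URIs.
            value = dict(attrs).get("profile") or ""
            self.profiles = value.split()


def profile_in_effect(html, profile_uri=PROFILE_URI):
    """Return True if the document's <head> declares the given profile URI."""
    parser = HeadProfileParser()
    parser.feed(html)
    return profile_uri in (parser.profiles or [])
```

A robot following this sketch would ignore robots-* class names entirely whenever profile_in_effect returns False.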
XMDP Profile
<dl class="profile">
  <dt id="robots-nofollow">robots-nofollow</dt>
  <dd>Informs robots that links contained by the element are not to be followed.</dd>
  <dt id="robots-follow">robots-follow</dt>
  <dd>Informs robots that links contained by the element are to be followed.</dd>
  <dt id="robots-noindex">robots-noindex</dt>
  <dd>Informs robots that the content of the element is not to be included as part of the page.</dd>
  <dt id="robots-index">robots-index</dt>
  <dd>Informs robots that the content of the element is to be included as part of the page.</dd>
  <dt id="robots-noarchive">robots-noarchive</dt>
  <dd>Informs caching robots that the content of the element is not to be included in their cached copy.</dd>
  <dt id="robots-archive">robots-archive</dt>
  <dd>Informs caching robots that the content of the element is to be included in their cached copy.</dd>
</dl>
Examples
Removing page content:
<head profile="http://example.org/xmdp/robots-profile#">
...
<div class="robots-noindex">There once was a man from Nantucket…</div>
<p>This page is not about <span class="robots-noindex">pornography</span>.</p>
Showing nofollow in conjunction with votelinks, and applying it in parallel with relnofollow:
<head profile="http://example.org/xmdp/robots-profile#">
...
<p class="robots-nofollow">This is <a href="http://example.com/bogus">a bogus link</a> and so is <a href="http://example.net/bogus">this</a>.</p>
<p>I don't like <a rel="nofollow" rev="vote-against" class="robots-nofollow" href="http://example.com/disagree">this page</a> but I do like <a rev="vote-for" href="http://example.com/agree">this one</a>.</p>
Preventing images from being stored by search engines, forcing them to be retrieved from the originating website:
<head profile="http://example.org/xmdp/robots-profile#">
...
<p><img src="example.png" class="robots-noarchive" alt="Private image" /></p>
A more complex example is available which also shows how the robots metadata may be visualized.
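To illustrate how a robot might read markup like the examples above, here is a minimal sketch using Python's standard html.parser module. It rests on assumptions the draft does not state: a directive applies to the element and everything it contains, the nearest enclosing directive wins, and when one element carries conflicting classes the later one is used (an arbitrary choice; see the Precedence issue). The names RobotsClassParser and scan are hypothetical, and the profile URI check is omitted for brevity.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they are never pushed on the open-element stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class RobotsClassParser(HTMLParser):
    """Collects followable links and indexable text, honouring robots-* classes."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (tag, follow, index) tuple per open element
        self.links = []  # hrefs the robot may follow
        self.text = []   # text fragments the robot may index

    def _state(self):
        # Inherit from the nearest open element; default is follow and index.
        return self.stack[-1][1:] if self.stack else (True, True)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        follow, index = self._state()
        for name in (attrs.get("class") or "").split():
            if name == "robots-nofollow":
                follow = False
            elif name == "robots-follow":
                follow = True
            elif name == "robots-noindex":
                index = False
            elif name == "robots-index":
                index = True
        if tag == "a" and follow and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, follow, index))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        _, index = self._state()
        if index and data.strip():
            self.text.append(data.strip())


def scan(html):
    """Return (followable links, indexable text fragments) for a document."""
    parser = RobotsClassParser()
    parser.feed(html)
    return parser.links, parser.text
```

Following and indexing are tracked independently here, mirroring the way the Robots META tag treats nofollow and noindex as separate directives.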
References
Normative
Informative
- A Standard for Robot Exclusion
- Googlebot Frequently Asked Questions
- The ROBOTS META Tag
- RelNoFollow Draft Specification
- This page was contributed from the technorati developers' wiki.
Thanks
Issues
These are open issues that have been raised in various forums. The "efficacy" and "collateral damage" issues from rel="nofollow" also apply.
Precedence
- Should earlier values take precedence or later? Does class="robots-nofollow robots-follow" mean the same as class="robots-nofollow" or class="robots-follow"?
- The meta tag suggests not using conflicting or repeating directives and so does not specify precedence.
- Interaction with relnofollow: what does class="robots-follow" rel="nofollow" mean? Currently relnofollow has no profile URI defined, so the Robot Exclusion Profile takes precedence. In the future, per XMDP's Using Multiple Profiles, "the URIs in the 'profile' attribute are to be treated most significant (first) to least significant (last)."
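The first question above can be made concrete with a small sketch. It does not settle the issue; it only shows how the candidate policies would differ for the same class value. The function resolve_follow and the policy names "first", "last" and "restrictive" are hypothetical and do not appear in the draft.

```python
def resolve_follow(class_attr, policy="last"):
    """Resolve follow/nofollow classes on one element under a chosen policy.

    Returns True (follow), False (nofollow) or None (no directive present).
    """
    directives = {"robots-follow": True, "robots-nofollow": False}
    # Keep only the follow-related class names, in document order.
    seen = [directives[name] for name in class_attr.split() if name in directives]
    if not seen:
        return None
    if policy == "first":        # earlier values take precedence
        return seen[0]
    if policy == "last":         # later values take precedence
        return seen[-1]
    if policy == "restrictive":  # any nofollow wins, regardless of order
        return all(seen)
    raise ValueError("unknown policy: " + policy)
```

For class="robots-nofollow robots-follow", the "first" and "restrictive" policies yield nofollow while "last" yields follow, which is exactly the ambiguity the issue describes.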
Specificity
- Does not allow control of specific UAs à la A Standard for Robot Exclusion
Keywords
- The keywords all and none are defined by the Robots META Tag as convenience shortcuts to enable or disable the combination of nofollow and noindex, but predate Google's noarchive and should not be considered to include it. As a result, for purposes of clarity and simplicity (the XMDP Minimalism principle), they are not included in this version of the Robot Exclusion Profile.
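As an illustration of the point above, an author migrating from a Robots META tag could expand all and none into the explicit class names the profile does define. The sketch below assumes the usual META tag meanings (all is index plus follow, none is noindex plus nofollow) and deliberately leaves the archive directives out of both shortcuts; the function meta_to_classes is hypothetical.

```python
def meta_to_classes(content):
    """Translate a Robots META content value into Robot Exclusion Profile classes."""
    # all/none cover only index and follow; they predate noarchive.
    expansions = {"all": ["index", "follow"], "none": ["noindex", "nofollow"]}
    known = {"index", "noindex", "follow", "nofollow", "archive", "noarchive"}
    classes = []
    for token in content.split(","):
        token = token.strip().lower()
        for keyword in expansions.get(token, [token]):
            name = "robots-" + keyword
            # Skip unknown keywords and avoid repeating a class name.
            if keyword in known and name not in classes:
                classes.append(name)
    return " ".join(classes)
```

For example, a META content value of "none" becomes class="robots-noindex robots-nofollow", with no effect on archiving.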
Suitability as a microformat
- Isn't the Robot Exclusion Profile designed for machines first and humans second instead of vice versa? Yes, just as much as relnofollow, the deployed microformat that it's designed to replace.