<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://microformats.org/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Habakuk</id>
	<title>Microformats Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://microformats.org/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Habakuk"/>
	<link rel="alternate" type="text/html" href="http://microformats.org/wiki/Special:Contributions/Habakuk"/>
	<updated>2026-05-05T11:04:57Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.38.4</generator>
	<entry>
		<id>http://microformats.org/wiki/index.php?title=robots-exclusion&amp;diff=17890</id>
		<title>robots-exclusion</title>
		<link rel="alternate" type="text/html" href="http://microformats.org/wiki/index.php?title=robots-exclusion&amp;diff=17890"/>
		<updated>2007-01-14T11:42:46Z</updated>

		<summary type="html">&lt;p&gt;Habakuk: /* Suitability as a microformat */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Robot Exclusion Profile =&lt;br /&gt;
__TOC__&lt;br /&gt;
== Draft Specification 2005-06-18 ==&lt;br /&gt;
&lt;br /&gt;
=== Authors ===&lt;br /&gt;
* [http://peterjanes.ca/ Peter Janes]&lt;br /&gt;
&lt;br /&gt;
=== Copyright ===&lt;br /&gt;
This specification is © 2004-2005 by the author.  However, the author intends to submit this specification to a standards body with a liberal copyright/licensing policy such as the [http://gmpg.org/ GMPG]. See the [http://gmpg.org/principles GMPG Principles] for more details. Anyone wishing to contribute to this effort MUST read those principles, especially those regarding copyright and licensing, and agree to them before contributing.&lt;br /&gt;
&lt;br /&gt;
=== Patents ===&lt;br /&gt;
The author neither holds nor intends to apply for any patents on anything required to implement this specification.&lt;br /&gt;
&lt;br /&gt;
== Abstract ==&lt;br /&gt;
The Robot Exclusion Profile is a reworking of the Robots META tag (and less-standard extensions) as a [[microformat]].&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
The [http://www.robotstxt.org/wc/meta-user.html Robots META tag] is used to provide page-specific direction for web crawlers.  While useful in many cases, its page-wide scope means it cannot be used to restrict crawlers from indexing only certain sections of a document.  Several attempts have been made to create more granular solutions through various methods, but they have perceived shortcomings that limit their use; the Robot Exclusion Profile defines a microformat that can be applied to any element or set of elements in a page.&lt;br /&gt;
&lt;br /&gt;
Like other microformats such as [[hcalendar|hCalendar]], the Robot Exclusion Profile defines a set of class names that may be applied to (X)HTML elements.  &amp;lt;code&amp;gt;class&amp;lt;/code&amp;gt; can be applied to almost every (X)HTML element, which means that authors may be as specific or general as they wish in their application.  This differs from the similarly-purposed &amp;lt;code&amp;gt;rel=&amp;quot;nofollow&amp;quot;&amp;lt;/code&amp;gt; attribute, which may only be applied to (and does not refer to the content of) a specific inline link.  (It is interesting to note that this behaviour is entirely encompassed by the use of &amp;lt;code&amp;gt;class=&amp;quot;robots-nofollow&amp;quot;&amp;lt;/code&amp;gt; on the same element.)  Classes are also additive, so multiple values can be specified at once, e.g. &amp;lt;code&amp;gt;class=&amp;quot;robots-nofollow robots-noindex&amp;quot;&amp;lt;/code&amp;gt;.  For robot exclusion in particular, this allows authors to specify multiple rules for an element without adding unnecessary extra markup.&lt;br /&gt;
&lt;br /&gt;
== Format ==&lt;br /&gt;
=== Profile URI ===&lt;br /&gt;
&amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;http://example.org/xmdp/robots-profile#&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; (obviously preliminary)&lt;br /&gt;
&lt;br /&gt;
The classes defined by the Robot Exclusion Profile should be considered meaningless when the profile URI is not present in the document &amp;lt;code&amp;gt;&amp;amp;lt;head&amp;amp;gt;&amp;lt;/code&amp;gt;'s &amp;lt;code&amp;gt;profile&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
=== XMDP Profile ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;dl class=&amp;quot;profile&amp;quot;&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-nofollow&amp;quot;&amp;gt;robots-nofollow&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs robots that links contained by the element are not to be followed.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-follow&amp;quot;&amp;gt;robots-follow&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs robots that links contained by the element are to be followed.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-noindex&amp;quot;&amp;gt;robots-noindex&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs robots that the content of the element is not to be included as part of the page.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-index&amp;quot;&amp;gt;robots-index&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs robots that the content of the element is to be included as part of the page.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-noanchortext&amp;quot;&amp;gt;robots-noanchortext&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs robots that the link target document is not to be indexed under the anchor text.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-anchortext&amp;quot;&amp;gt;robots-anchortext&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs robots that the link target document is to be indexed under the anchor text.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-noarchive&amp;quot;&amp;gt;robots-noarchive&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs caching robots that the content of the element is not to be included in their cached copy.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
 &amp;lt;dt id=&amp;quot;robots-archive&amp;quot;&amp;gt;robots-archive&amp;lt;/dt&amp;gt;&lt;br /&gt;
 &amp;lt;dd&amp;gt;&lt;br /&gt;
  Informs caching robots that the content of the element is to be included in their cached copy.&lt;br /&gt;
 &amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;/dl&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
Removing page content:&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&lt;br /&gt;
&amp;lt;head profile=&amp;quot;http://example.org/xmdp/robots-profile#&amp;quot;&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;div class=&amp;quot;robots-noindex&amp;quot;&amp;gt;There once was a man from Nantucket…&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This page is not about &amp;lt;span class=&amp;quot;robots-noindex&amp;quot;&amp;gt;pornography&amp;lt;/span&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Showing &amp;lt;code&amp;gt;nofollow&amp;lt;/code&amp;gt; in conjunction with [[votelinks]], and applying it in parallel with [[relnofollow]]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&lt;br /&gt;
&amp;lt;head profile=&amp;quot;http://example.org/xmdp/robots-profile#&amp;quot;&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;p class=&amp;quot;robots-nofollow&amp;quot;&amp;gt;This is &amp;lt;a href=&amp;quot;http://example.com/bogus&amp;quot;&amp;gt;a bogus link&amp;lt;/a&amp;gt;&lt;br /&gt;
and so is &amp;lt;a href=&amp;quot;http://example.net/bogus&amp;quot;&amp;gt;this&amp;lt;/a&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;I don't like &amp;lt;a rel=&amp;quot;nofollow&amp;quot; rev=&amp;quot;vote-against&amp;quot; class=&amp;quot;robots-nofollow&amp;quot;&lt;br /&gt;
                   href=&amp;quot;http://example.com/disagree&amp;quot;&amp;gt;this page&amp;lt;/a&amp;gt;&lt;br /&gt;
but I do like &amp;lt;a rev=&amp;quot;vote-for&amp;quot; href=&amp;quot;http://example.com/agree&amp;quot;&amp;gt;this one&amp;lt;/a&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
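&lt;br /&gt;
Combining several rules on one element, since classes are additive. This is only an illustrative sketch: the comment text and the link URL are assumptions made up for the example, not part of the draft:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&lt;br /&gt;
&amp;lt;head profile=&amp;quot;http://example.org/xmdp/robots-profile#&amp;quot;&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;div class=&amp;quot;robots-noindex robots-nofollow&amp;quot;&amp;gt;&lt;br /&gt;
 &amp;lt;p&amp;gt;A reader comment with &amp;lt;a href=&amp;quot;http://example.com/unvetted&amp;quot;&amp;gt;an unvetted link&amp;lt;/a&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;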
&lt;br /&gt;
Preventing images from being stored by search engines, forcing them to be retrieved from the originating website:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&lt;br /&gt;
&amp;lt;head profile=&amp;quot;http://example.org/xmdp/robots-profile#&amp;quot;&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;example.png&amp;quot; class=&amp;quot;robots-noarchive&amp;quot; alt=&amp;quot;Private image&amp;quot; /&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A consequence of this is that the short summaries that modern search engines display with their result links also exclude &amp;lt;code&amp;gt;robots-noarchive&amp;lt;/code&amp;gt; content.  We suggest replacing small excluded segments with an ellipsis [&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;].  Unarchived segments comparable in size to the segments the search engine normally uses for summaries can simply be omitted.  A display of an entire cached document that has unarchived segments should probably also mark the places where text has been elided, whatever the size of the segment.&lt;br /&gt;
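&lt;br /&gt;
Keeping uninformative link text out of the target document's index entry. This is an illustrative sketch of &amp;lt;code&amp;gt;robots-noanchortext&amp;lt;/code&amp;gt; as defined in the XMDP profile above; the sentence and the link URL are assumptions made up for the example, not part of the draft:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&lt;br /&gt;
&amp;lt;head profile=&amp;quot;http://example.org/xmdp/robots-profile#&amp;quot;&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;p&amp;gt;The full report is available; &amp;lt;a class=&amp;quot;robots-noanchortext&amp;quot;&lt;br /&gt;
      href=&amp;quot;http://example.com/report&amp;quot;&amp;gt;click here&amp;lt;/a&amp;gt; to read it.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;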
&lt;br /&gt;
A [http://peterjanes.ca/2005/robots/example more complex example] is available which also shows how the robots metadata may be [http://tantek.com/log/2005/06.html#d03t2359 visualized].&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
=== Normative ===&lt;br /&gt;
* [http://gmpg.org/xmdp/ XMDP]&lt;br /&gt;
* [http://www.robotstxt.org/wc/meta-user.html The Robots META Tag]&lt;br /&gt;
&lt;br /&gt;
=== Informative ===&lt;br /&gt;
* [http://www.robotstxt.org/wc/norobots.html A Standard for Robot Exclusion]&lt;br /&gt;
* [http://www.google.com/bot.html#noindextags Googlebot Frequently Asked Questions]&lt;br /&gt;
* [http://www.bauser.com/websnob/meta/robots.html The ROBOTS META Tag]&lt;br /&gt;
* [[relnofollow|RelNoFollow Draft Specification]]&lt;br /&gt;
* This page was contributed from the [http://developers.technorati.com/wiki/RobotsExclusion technorati developers' wiki].&lt;br /&gt;
&lt;br /&gt;
=== Thanks ===&lt;br /&gt;
* [http://tantek.com/log/ Tantek Çelik]&lt;br /&gt;
* [http://www.lachy.id.au/ Lachlan Hunt]&lt;br /&gt;
* [http://www.joesapt.net/ Joe D'Andrea]&lt;br /&gt;
&lt;br /&gt;
== Issues ==&lt;br /&gt;
These are open issues that have been raised in various forums.  The &amp;quot;efficacy&amp;quot; and &amp;quot;collateral damage&amp;quot; issues from [[relnofollow#open_issues|rel=&amp;quot;nofollow&amp;quot;]] also apply.&lt;br /&gt;
&lt;br /&gt;
=== Precedence ===&lt;br /&gt;
* Should earlier values take precedence or later?  Does &amp;lt;code&amp;gt;class=&amp;quot;robots-nofollow robots-follow&amp;quot;&amp;lt;/code&amp;gt; mean the same as &amp;lt;code&amp;gt;class=&amp;quot;robots-nofollow&amp;quot;&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;class=&amp;quot;robots-follow&amp;quot;&amp;lt;/code&amp;gt;?&lt;br /&gt;
* The Robots &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt; tag documentation advises against conflicting or repeated directives and so does not specify precedence.  &amp;lt;code&amp;gt;&amp;amp;lt;p class=&amp;quot;robots-noindex robot1-index&amp;quot;&amp;amp;gt;&amp;lt;/code&amp;gt; is an apparent conflict, but in this case the more specific directive should override the general one at its point of applicability, regardless of the order in which the directives appear.&lt;br /&gt;
* Interaction with [[relnofollow]]: what does &amp;lt;code&amp;gt;class=&amp;quot;robots-follow&amp;quot; rel=&amp;quot;nofollow&amp;quot;&amp;lt;/code&amp;gt; mean?  Currently [[relnofollow]] has no profile URI defined, so the Robot Exclusion Profile takes precedence.  In the future, per XMDP's [http://gmpg.org/xmdp/description#multiple Using Multiple Profiles], &amp;lt;q&amp;gt;the URIs in the 'profile' attribute are to be treated most significant (first) to least significant (last).&amp;lt;/q&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Phrases ===&lt;br /&gt;
&lt;br /&gt;
Modern search engines normally support &amp;lt;i&amp;gt;phrase&amp;lt;/i&amp;gt; queries.  A phrase query only matches documents that contain the words of the query consecutively and in the same order.  This raises the question of whether a matched phrase should be allowed to straddle a &amp;lt;code&amp;gt;class=&amp;quot;robots-noindex&amp;quot;&amp;lt;/code&amp;gt; region.&lt;br /&gt;
&lt;br /&gt;
Intuitively this should not be allowed.  The phrase query &amp;lt;code&amp;gt;&amp;quot;word1 word2&amp;quot;&amp;lt;/code&amp;gt; should not match a document that contains &amp;lt;code&amp;gt;word1 &amp;amp;lt;b class=&amp;quot;robots-noindex&amp;quot;&amp;amp;gt;ignore&amp;amp;lt;/b&amp;amp;gt; word2&amp;lt;/code&amp;gt;.  This also gives webmasters an interesting tool for specifying that juxtaposed words not be treated as a phrase: insert an empty unindexed region, as in &amp;lt;code&amp;gt;word1 &amp;amp;lt;b class=&amp;quot;robots-noindex&amp;quot;&amp;amp;gt;&amp;amp;lt;/b&amp;amp;gt; word2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Specificity ===&lt;br /&gt;
* Does not allow control of specific UAs à la [http://www.robotstxt.org/wc/norobots.html A Standard for Robot Exclusion]&lt;br /&gt;
&lt;br /&gt;
If it is actually necessary to control specific UAs, here is a possible solution.&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&lt;br /&gt;
&amp;lt;!DOCTYPE html PUBLIC &amp;quot;-//W3C//DTD XHTML 1.0 Strict//EN&amp;quot; &amp;quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;html&amp;gt;&lt;br /&gt;
&amp;lt;head&amp;gt;&lt;br /&gt;
&amp;lt;link rel=&amp;quot;schema.RobotExclusion&amp;quot; href=&amp;quot;http://example.org/.../&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;meta name=&amp;quot;RobotExclusion.RobotName1&amp;quot; content=&amp;quot;Foo Bot&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;meta name=&amp;quot;RobotExclusion.RobotName2&amp;quot; content=&amp;quot;Bar Bot&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;meta name=&amp;quot;RobotExclusion.RobotName3&amp;quot; content=&amp;quot;Evil Bot&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/head&amp;gt;&lt;br /&gt;
&amp;lt;body&amp;gt;&lt;br /&gt;
&amp;lt;h1&amp;gt;Page&amp;lt;/h1&amp;gt;&lt;br /&gt;
&amp;lt;p class=&amp;quot;robots-noindex&amp;quot;&amp;gt;This paragraph shouldn't be indexed by any bot.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p class=&amp;quot;robot3-noindex&amp;quot;&amp;gt;This paragraph should be indexed by every bot except &amp;quot;Evil Bot&amp;quot;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p class=&amp;quot;robots-noindex robot1-index&amp;quot;&amp;gt;This paragraph should only be indexed by &amp;quot;Foo Bot&amp;quot;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/body&amp;gt;&lt;br /&gt;
&amp;lt;/html&amp;gt;&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
Of course it is a waste of bandwidth to repeat the &amp;quot;RobotExclusion.RobotName&amp;quot; meta tags&lt;br /&gt;
on every page of a website. These meta tags should therefore be stored on one page - perhaps the&lt;br /&gt;
main page - so they can be maintained easily, and referenced from every other page:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&amp;lt;nowiki&amp;gt;&lt;br /&gt;
&amp;lt;!DOCTYPE html PUBLIC &amp;quot;-//W3C//DTD XHTML 1.0 Strict//EN&amp;quot; &amp;quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;html&amp;gt;&lt;br /&gt;
&amp;lt;head&amp;gt;&lt;br /&gt;
&amp;lt;link rel=&amp;quot;schema.RobotExclusion&amp;quot; href=&amp;quot;http://example.org/.../&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;link rel=&amp;quot;RobotExclusion.Names&amp;quot; href=&amp;quot;http://mypage.com/&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/head&amp;gt;&lt;br /&gt;
&amp;lt;body&amp;gt;&lt;br /&gt;
&amp;lt;h1&amp;gt;Page&amp;lt;/h1&amp;gt;&lt;br /&gt;
&amp;lt;p class=&amp;quot;robots-noindex&amp;quot;&amp;gt;This paragraph shouldn't be indexed by any bot.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p class=&amp;quot;robot3-noindex&amp;quot;&amp;gt;This paragraph should be indexed by every bot except &amp;quot;Evil Bot&amp;quot;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p class=&amp;quot;robots-noindex robot1-index&amp;quot;&amp;gt;This paragraph should only be indexed by &amp;quot;Foo Bot&amp;quot;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/body&amp;gt;&lt;br /&gt;
&amp;lt;/html&amp;gt;&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Keywords ===&lt;br /&gt;
* The keywords &amp;lt;code&amp;gt;all&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;none&amp;lt;/code&amp;gt; are defined by the Robots META Tag as convenience shortcuts to enable or disable the combination of &amp;lt;code&amp;gt;nofollow&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;noindex&amp;lt;/code&amp;gt;, but predate Google's &amp;lt;code&amp;gt;noarchive&amp;lt;/code&amp;gt; and should not be considered to include it.  As a result, for purposes of clarity and simplicity (the [http://gmpg.org/xmdp/description#principles XMDP Minimalism principle]), they are not included in this version of the Robot Exclusion Profile.&lt;br /&gt;
&lt;br /&gt;
=== Suitability as a microformat ===&lt;br /&gt;
* Isn't the Robot Exclusion Profile designed for machines first and humans second instead of vice versa?  Yes, just as much as [[relnofollow]], the deployed microformat that it's designed to replace.&lt;br /&gt;
* I'd like to echo this concern. We need to discuss whether or not this is a suitable microformat. --[[User:RyanKing|RyanKing]] 13:34, 17 Jan 2006 (PST)&lt;br /&gt;
&lt;br /&gt;
=== Extension ===&lt;br /&gt;
* As I read this, I had the idea to use this microformat to differentiate the real content of a webpage from the rest (navigation, header, footer, ...) - you could do this by marking the &amp;quot;real content&amp;quot; with the tag &amp;quot;index&amp;quot;, but that's not really clear. Maybe you could create a new tag to distinguish the really important things on the page (the &amp;quot;real content&amp;quot;) from the rest. --[[User:Habakuk|Habakuk]] 03:42, 14 Jan 2007 (PST)&lt;br /&gt;
* Another idea is to mark an area of a page as independent from the rest (e.g. for listings of software tools - if I search for software that can do ''a'' and ''b'', I don't want a result that offers me one program that can do ''a'' and another that can do ''b''). --[[User:Habakuk|Habakuk]] 03:42, 14 Jan 2007 (PST)&lt;/div&gt;</summary>
		<author><name>Habakuk</name></author>
	</entry>
</feed>