robots-exclusion-brainstorming
Revision as of 19:33, 13 November 2007 by Tantek (talk | contribs) (drafted, moved a proposal from robots-exclusion-issues to here.)
robots exclusion brainstorming
This page contains brainstorming, thoughts, and proposals for extending the robots-exclusion microformat.
specific user agents
robots-exclusion has no way to target specific user agents (UAs), unlike A Standard for Robot Exclusion. This is currently out of scope, since meta robots has no UA-specific control either, but here are some thoughts that have been proposed:
If it is actually necessary to control specific UAs, here is a possible solution. Example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <link rel="schema.RobotExclusion" href="http://example.org/.../" />
    <meta name="RobotExclusion.RobotName1" content="Foo Bot" />
    <meta name="RobotExclusion.RobotName2" content="Bar Bot" />
    <meta name="RobotExclusion.RobotName3" content="Evil Bot" />
  </head>
  <body>
    <h1>Page</h1>
    <p class="robots-noindex">This paragraph shouldn't be indexed by any bot.</p>
    <p class="robot3-noindex">This paragraph should be indexed by every bot except "Evil Bot".</p>
    <p class="robots-noindex robot1-index">This paragraph should only be indexed by "Foo Bot".</p>
  </body>
</html>
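The example above implies a precedence rule that the proposal does not spell out. The following Python sketch shows one possible reading, inferred only from the three paragraphs in the example: a robot-specific token (robotN-index or robotN-noindex) overrides the generic robots-noindex token, and an element with no tokens is indexable. The function name may_index and the rule that noindex wins when both robot-specific tokens are present are assumptions made for illustration, not part of any specification.

```python
from typing import Optional


def may_index(class_attr: str, robot_number: Optional[int]) -> bool:
    """Decide whether the robot in slot `robot_number` may index an element.

    `class_attr` is the raw value of the element's class attribute.
    `robot_number` is the N from the RobotExclusion.RobotNameN meta tag that
    matches the crawler, or None if the page does not list the crawler.

    Assumed precedence (inferred from the example, not specified anywhere):
    robot-specific tokens override the generic token; default is to index.
    """
    tokens = set(class_attr.split())
    if robot_number is not None:
        # Assumption: if both tokens appear, the restrictive one wins.
        if f"robot{robot_number}-noindex" in tokens:
            return False
        if f"robot{robot_number}-index" in tokens:
            return True
    return "robots-noindex" not in tokens
```

With the mapping from the example (1 = "Foo Bot", 3 = "Evil Bot"), may_index("robots-noindex robot1-index", 1) allows "Foo Bot" to index the third paragraph, while the same call with any other slot number refuses.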
Of course, it is a waste of bandwidth to repeat the "RobotExclusion.RobotName" meta tags on every page of a website. These meta tags should therefore be stored on a single page, perhaps the main page, so that they can be maintained easily.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <link rel="schema.RobotExclusion" href="http://example.org/.../" />
    <link rel="RobotExclusion.Names" href="http://mypage.com/" />
  </head>
  <body>
    <h1>Page</h1>
    <p class="robots-noindex">This paragraph shouldn't be indexed by any bot.</p>
    <p class="robot3-noindex">This paragraph should be indexed by every bot except "Evil Bot".</p>
    <p class="robots-noindex robot1-index">This paragraph should only be indexed by "Foo Bot".</p>
  </body>
</html>
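With the names moved to one page, a crawler would have to read the slot-to-name mapping from the page that the RobotExclusion.Names link points to. A minimal Python sketch of that reading step follows, using the standard library's html.parser. The class RobotNameParser and the function parse_robot_names are hypothetical names introduced here; fetching and caching the linked page are deliberately left out.

```python
import re
from html.parser import HTMLParser


class RobotNameParser(HTMLParser):
    """Collect RobotExclusion.RobotNameN meta tags into {N: robot name}."""

    # Assumption: slots are named exactly as in the proposal's example.
    PATTERN = re.compile(r"^RobotExclusion\.RobotName(\d+)$")

    def __init__(self):
        super().__init__()
        self.names = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        match = self.PATTERN.match(attributes.get("name") or "")
        if match and attributes.get("content"):
            self.names[int(match.group(1))] = attributes["content"]


def parse_robot_names(html_text):
    """Return the slot-number-to-name mapping found in `html_text`."""
    parser = RobotNameParser()
    parser.feed(html_text)
    parser.close()
    return parser.names
```

A crawler would look up its own name in the returned mapping to find its slot number, then apply the robotN-index and robotN-noindex classes on the other pages of the site.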
problems
- use of the meta element violates the visibility principle of microformats.
- encoding content such as "robot1" or "robot3" into class values violates the principle of not putting content into the class attribute.
poll
- -1 Tantek. Due to the above problems, and the fact that we have yet to see much adoption of robots-exclusion in the first place, I think the proposal is both premature and flawed.
Habakuk extensions
- As I read this, I had the idea of using this microformat to differentiate the real content of a webpage from the rest (navigation, header, footer, ...). You could do this by marking the "real content" with the tag "index", but that's not really clear. Maybe you could create a new tag to distinguish the really important things on the page (the "real content") from the rest. --Habakuk 03:42, 14 Jan 2007 (PST)
- Another idea is to mark an area of a page as independent from the rest (e.g. for listings of software tools: if I search for software that can do a and b, I don't want to get a result that offers me one program that can do a and another that can do b). --Habakuk 03:42, 14 Jan 2007 (PST)