[uf-discuss] multi-word tags

Andy Mabbett andy at pigsonthewing.org.uk
Tue Sep 25 06:12:01 PDT 2007


The rel-tag spec says:

        Spaces can be encoded either as + or %20.

(I think that "can" ought to be an RFC2119 "SHOULD", BTW; other specs
may similarly need to have RFC2119 applied more rigorously).

However, I see sites, in the wild, which encode spaces as dashes
(hyphens), and others which encode them as underscores (e.g. del.icio.us;
Wikipedia and other MediaWiki sites). See previously-compiled evidence, at:

        <http://microformats.org/wiki/rel-tag-spaces>

Indeed, pages on the microformats wiki use hyphens in their URLs, which
would seem suitable for use as spaces in tags:

        <a
        rel="tag"
        href="http://microformats.org/wiki/existing-rel-values"
        >
        existing rel values
        </a>

Operator, for example, regards:

        West+Midland+Bird+Club

        West-Midland-Bird-Club

and:

        West_Midland_Bird_Club

as three distinct tags, and does not discard them as duplicates (see
test page at <http://www.westmidlandbirdclub.com/tag-test.htm> ).


What do other parsers and implementations do? Should the spec be
altered, so that the above, and:

        West%20Midland%20Bird%20Club

are all deemed equal?

Likewise, for that matter:

        West+Midland-Bird_Club


This might be achieved by saying that spaces in tags SHOULD be encoded
as either + or %20, but that parsers MUST treat dashes and underscores
as spaces.

Or we could simply say that spaces in tags SHOULD be encoded as +, %20,
- or _.
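
Either wording implies roughly the same work on the parser side. As a
rough sketch only (Python; the function names tag_from_href and
normalise_tag are made up for illustration, and the rule that -, _, +
and %20 all mean "space" is the proposal above, not anything the spec
currently says), a parser might compare tags like this:

```python
from urllib.parse import unquote, urlsplit


def tag_from_href(href):
    """Return the raw tag from a rel-tag URL: its last path segment."""
    path = urlsplit(href).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def normalise_tag(raw):
    """Hypothetical rule: treat +, %20, - and _ all as a single space."""
    # Treat '+' as a space before percent-decoding, so an encoded %2B
    # stays a literal plus sign.
    decoded = unquote(raw.replace("+", " "))
    for separator in "-_":
        decoded = decoded.replace(separator, " ")
    # Collapse any run of whitespace into one space.
    return " ".join(decoded.split())


variants = [
    "West+Midland+Bird+Club",
    "West-Midland-Bird-Club",
    "West_Midland_Bird_Club",
    "West%20Midland%20Bird%20Club",
    "West+Midland-Bird_Club",
]

# A set keeps one entry per distinct normalised tag.
distinct = {normalise_tag(v) for v in variants}
```

Under that assumed rule, all five spellings above collapse to a single
tag. The obvious cost is that a tag which genuinely contains a hyphen
or an underscore can no longer be told apart from its spaced form.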

-- 
Andy Mabbett

