tagcloud-brainstorming

This is for brainstorming ideas around tagcloud formats.

First pass analysis

From a visual and logical perspective, tagclouds have a reasonably small number of common components, and nearly all focus on the same problem. They typically have these features:

an alphabetically ordered list of links to a tag space; occasionally the order is by popularity
links that are usually single words

While it is possible to imagine a tag cloud representing some other property of tags, such as how recently they were used, all of the examples considered show popularity, albeit over different time scales. Typically the time scales are:

most commonly, all-time popularity
less frequently, popularity over the last week
popularity over the last 24 hours

On the ground, things become more complicated.

the root elements

Typically, but not always, there is a root element with a class or id value. The root element is one of the following element types:

p
div
ul
td

These root elements are given the following class and/or id values:

class="heatmap" id="smallheatmap"
class="heatmap" id="bigheatmap"
id="TagCloud"
id=recently
class="frontpageheatmap"
class="alphacloud"
class="freqcloud"
class="zoomclouds"
id="tagcloud24Display" class="tagcloud"
id="tagcloudWeekDisplay" class="tagcloud"
id="tagcloudDisplay" class="tagcloud"
class="hTagcloud"

Clearly there is some consensus. “Cloud” is the most common part of the identifying values, with “tagcloud”, in whole or in part, reasonably common. But do heatmaps constitute a specific subset of tagclouds? Do we really have two kinds of seemingly similar entities, tagclouds and heatmaps? Or are all tagclouds heatmaps?

A class or id name for the root element of a tagcloud is required. We are proposing hTagcloud, as the general term for these entities is “tagcloud”, and by analogy with hCard etc. Class rather than id would seem to make the most sense: an id value must be unique within a document, and often more than one tagcloud appears on a page.
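
A minimal sketch of why class fits better than id here, assuming a page that shows two clouds side by side. The headings, tag names and href values are made up for illustration; only the hTagcloud class name comes from the proposal.

```html
<!-- Two tagclouds on one page can share a class value,
     whereas an id value may be used only once per document. -->
<h2>All time</h2>
<div class="hTagcloud">
	<a href="/tags/css">css</a>
	<a href="/tags/usability">usability</a>
</div>

<h2>Last 24 hours</h2>
<div class="hTagcloud">
	<a href="/tags/accessibility">accessibility</a>
	<a href="/tags/css">css</a>
</div>
```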

The tags themselves

Some of the clouds are marked up as lists:

technorati
web connections

Some of them are marked up as links without any other intervening markup:

flickr’s tagcloud (but not its heat maps!)
BBPress
del.icio.us
squidoo

Zoomclouds wrap the links in a span with a class value.
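
The three patterns can be sketched roughly as follows. This is illustrative markup only, not copied from any of the sites named; the href values and the "tag" class name are hypothetical.

```html
<!-- 1. Tags as list items -->
<ul>
	<li><a href="/tags/css">css</a></li>
	<li><a href="/tags/usability">usability</a></li>
</ul>

<!-- 2. Bare links with no intervening markup -->
<div>
	<a href="/tags/css">css</a>
	<a href="/tags/usability">usability</a>
</div>

<!-- 3. Each link wrapped in a span with a class value -->
<div>
	<span class="tag"><a href="/tags/css">css</a></span>
	<span class="tag"><a href="/tags/usability">usability</a></span>
</div>
```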

Marking up “weight”

Probably the trickiest issue is how popularity “weight” is marked up. Several sites use inline CSS with font-size values, including:

flickr
BBPress
squidoo

This can hardly be considered semantic markup. Other sites use class values:

del.icio.us
zoomclouds
web connections

Class values are probably the most immediately obvious mechanism for marking up tag weights, and class is used in similar ways all the time. But we should perhaps be less hasty than this. What exactly is class for? Our old friend the HTML 4.01 spec says of class:

The class attribute has several roles in HTML: … For general purpose processing by user agents.

Does giving a tag a class value to represent its popularity constitute using class for “general purpose processing”? The definition is sufficiently vague as to seemingly preclude nothing that would loosely be associated with data processing. But it might be suggested that class is for element identification (the class attribute definition is actually found in the specification subsection titled “Element identifiers”), not for containing actual data, which relative popularity arguably is.

Perhaps one way of addressing this is by semantically naming the class values in a relative way, for example “popular”, “v-popular”, “vv-popular”. In this context, tags would “belong to a class” based on popularity. But it’s not the tag which belongs to a class; rather, it is the element which has the tag as its value that is assigned a class in this way. You might be able to see why I suggested that the obvious use of class might be a bit hasty.

Two unique examples stand out.

In addition to using inline CSS, BBPress also adds a title value relative to weight. The more popular a tag, the higher its title value. While this may seem perverse, it’s at least arguably correct. The HTML 4.01 spec says of title:

This attribute offers advisory information about the element for which it is set.

Whether the popularity of a tag constitutes “advisory information” is a matter for discussion.

Technorati uses nested em elements to indicate weight. This is very clever, IMO, but I suspect, and a quick straw poll with some pretty savvy developers suggests, that it is perhaps not overly human-friendly, at least from a publisher’s point of view.

Precisely how best to mark up the weight of a tag would seem to me to be the outstanding issue to resolve in developing a tagcloud microformat.
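
To make the comparison concrete, here is a rough sketch of the four approaches applied to a single tag. None of this is copied from the sites named: the href, the font size, the title value and the nesting depth are all hypothetical, chosen only to show the shape of each technique.

```html
<!-- Inline CSS: the weight exists only as presentation -->
<a href="/tags/css" style="font-size: 18px">css</a>

<!-- Class value: the weight is named here and styled elsewhere -->
<a href="/tags/css" class="vv-popular">css</a>

<!-- Title value alongside inline CSS: the weight is also carried as advisory text -->
<a href="/tags/css" title="42" style="font-size: 18px">css</a>

<!-- Nested em elements: deeper nesting means more weight -->
<em><em><em><a href="/tags/css">css</a></em></em></em>
```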

Next steps

OK, looking over our microformats process checklist: we’ve seen there is a problem to solve, and we’ve done some research into the current ways in which the problem is being solved. What’s next? Microformats.org has this to say about creating a new microformat:

DON’T!!! There are other things to try before developing a microformat. First, ask yourself these questions:

Is there a standard element in XHTML that would work?
Is there a compound of XHTML elements that would work?
Ok, if the answer to the above two is ‘no,’ we can talk about a microformat.

So

1. Is there a standard XHTML element which would work?
I really don’t think so.

2. Is there a compound of XHTML elements that would work?
I think that we can get a fair way toward solving this problem by using a number of standard HTML components. In essence, a tagcloud is just a list of links. It’s usually ordered alphabetically but, in the case of del.icio.us, by frequency as well. Often, too, the list is labeled with a heading and some kind of explanation.
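
As a sketch of how far standard elements alone might get us, the compound described above could look something like this. The heading text, explanation and tags are invented for illustration. Note that nothing here conveys weight, which is the part plain elements do not obviously cover.

```html
<!-- A heading and a short explanation label the list;
     the list itself is nothing more than links. -->
<h2>Popular tags</h2>
<p>Tags used most often on this site, in alphabetical order.</p>
<ul>
	<li><a href="/tags/accessibility">accessibility</a></li>
	<li><a href="/tags/css">css</a></li>
	<li><a href="/tags/usability">usability</a></li>
</ul>
```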

Issues to resolve

A number of issues emerge even before we get to the thornier ones outlined above.

Should a tagcloud microformat mandate the use of lists, or of any particular type of element? Typically, microformats focus on the use of class values and other attribute values. However, some focus on element types too, for example XOXO.

Should the different types of tagcloud found in these real world examples (all-time popularity, time-scoped popularity) be accommodated in an hTagcloud? The web connections tagcloud markup deals with this issue like this:

<div class="hTagcloud">
	<ul class="popularity">
		<li class="weight1"><a href="/tags/Access+Testing">Access Testing</a></li>
		<li class="weight1"><a href="/tags/McFarlane+Prize">McFarlane Prize</a></li>
	</ul>
</div>

This would conform with microformats patterns, where the root element has a single identifying class or id value. Squidoo, on the other hand, marks this up by using both an id and a class value on its “root” tagcloud element:

<div id="tagcloud24Display" class="tagcloud">

So, again, we can find in these real world examples a need to differentiate between kinds of tagcloud. The web connections approach uses the semantic appropriateness of the list, coupled with a containing root div element, to provide a mechanism for doing this that conforms with current microformats patterns. The really significant issue, as outlined above, is just how to correctly use the mechanisms of HTML to mark up the weight of tags.

Toward a proposal for hTagcloud

OK, I’ve followed the process for considering whether a tagcloud microformat makes sense. I think, given the widespread use of this pattern at some significant sites and in some significant applications, that at least the proposal of this microformat makes sense. Where do we go now? Are there any well established, interoperably implemented standards we can look at which address this problem? To the best of my knowledge, no. Next, the proposal procedure asks us to ensure that it is designed for humans first and machines second. Let’s keep that in mind with the following .01 draft proposal. In conjunction with this, the process asks:

If I looked at this microformat in a browser that didn’t support CSS or had CSS turned off, would it still be human-readable?
Are this format’s elements stylable with CSS?

We’ll address these in a moment as well.

.01 hTagcloud proposal

Based on the above discussion, here is a very first stab at an hTagcloud microformat proposal. Some of it is contingent on the resolution of the issues outlined above and summarized below.

hTagclouds have a root element with a class value of hTagcloud
this root element contains a list element, with an optional class value which identifies the nature of the tagcloud: is it all-time popularity, popularity within the last 24 hours, or popularity within the last 7 days? These are the three common kinds of popularity in the real world examples shown. Should other reasonably common kinds of cloud be found in the wild, these can be added to the list
this list may also optionally have a class value to indicate the order, alphabetical or by frequency (is this really required?)
tags are link elements, with an href value of the tagspace which the tagcloud represents
popularity or “weight” is conveyed with class values. There are 5 class values, ranging from “popular” to “vvvv-popular”. Some tag clouds have many more levels than this, but around 5 is a common number of weights. Many more than 5 becomes difficult to convey meaningfully via style. “Popular” is the “lowest” value, because in any non-trivial tagging system, the number of tags in the system vastly exceeds the number displayed in the tagcloud. All tags in a tagcloud are at least popular. That’s why the values don’t run from vv-unpopular to vv-popular, as an analogy with the CSS named font sizes might suggest. As per the discussion on class values for marking up tag weight above, “popularity” has been chosen over terms like “weight” so that the tags belong to a class based on popularity, rather than using class to carry more data about the content of the element. The use of class in this way is familiar to a significant number of developers, and makes for easily stylable tagclouds.
The use of the title attribute has been excluded for the following reasons:
1. it’s an atypical use of title
2. elements marked up with title could only be styled with CSS attribute selectors, which are little used by developers and not supported by the browsers that the majority of web users have. In effect these elements aren’t stylable with CSS at present, so developers would be unlikely to adopt this practice
The use of nested ems has not been adopted for this draft because of its novelty, and because the fussiness of coding it could quite possibly preclude the adoption of hTagcloud. This is based on a straw poll of some very proficient developers. While it is a very clever use of HTML, it might also be argued that the existence of the em element for “Indicat[ing] emphasis” and, additionally, the strong element for “Indicat[ing] stronger emphasis” suggests that multiply nested em elements do not indicate greater emphasis than a single unnested em element. Otherwise strong would be redundant, as it would be equivalent to <em><em>.
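
The styling argument against title can be sketched as follows. The title values "1" and "5" and the font sizes are hypothetical, since this page does not record the actual values BBPress emits; the point is only the difference in the kind of selector each approach needs.

```html
<style type="text/css">
	/* Class values: one simple class selector per weight */
	.hTagcloud .popular      { font-size: 100%; }
	.hTagcloud .vvvv-popular { font-size: 200%; }

	/* Title values: an attribute selector is needed for
	   every distinct value that might appear */
	.hTagcloud a[title="1"] { font-size: 100%; }
	.hTagcloud a[title="5"] { font-size: 200%; }
</style>
```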

Of course, this is a .01 specification, and has been explicitly drafted to put the issues on the table for discussion at this early stage, while also moving the proposal forward by making concrete suggestions.

an example hTagcloud

<div class="hTagcloud">
	<ul class="popularity">
		<li class="vvvv-popular"><a href="/tags/Web+Standards+Group">Web Standards Group</a></li>
		<li class="vvv-popular"><a href="/tags/accessibility">accessibility</a></li>
		<li class="popular"><a href="/tags/beta+tester">beta tester</a></li>
		<li class="vvv-popular"><a href="/tags/css">css</a></li>
		<li class="v-popular"><a href="/tags/ex-coder">ex-coder</a></li>
		<li class="vv-popular"><a href="/tags/usability">usability</a></li>
		<li class="vvvv-popular"><a href="/tags/wsg">wsg</a></li>
	</ul>
</div>
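
One way the example might be styled, as a sketch only: the class names are those of the draft, while the layout rules and font sizes are arbitrary choices made for illustration. With CSS turned off, the same markup falls back to a plain bulleted list of links, which stays readable, although the weights are then no longer visible.

```html
<style type="text/css">
	/* Lay the list out as a flowing cloud rather than a bulleted column */
	.hTagcloud ul { list-style: none; margin: 0; padding: 0; }
	.hTagcloud li { display: inline; margin-right: 0.5em; }

	/* One rule per weight; the sizes are arbitrary */
	.hTagcloud .popular      { font-size: 100%; }
	.hTagcloud .v-popular    { font-size: 125%; }
	.hTagcloud .vv-popular   { font-size: 150%; }
	.hTagcloud .vvv-popular  { font-size: 175%; }
	.hTagcloud .vvvv-popular { font-size: 200%; }
</style>
```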