Month: January 2006

Google releases Web Authoring Statistics

Google has recently released a report on Web Authoring Statistics.

The report, which used over 1 billion documents as its input, analyzes the relative frequency of various HTML elements and attributes. The authors also mention another initiative that is analyzing markup trends on the web.
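To give a feel for what "relative frequency of elements and attributes" means in practice, here is a minimal sketch in Python. It is not how Google's report was produced; the class name `TagCounter`, the function `count_markup`, and the sample document are all made up for illustration, and it only uses the standard library's `html.parser`.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each element and attribute name appears (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.attributes = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        self.elements[tag] += 1
        for name, _value in attrs:
            self.attributes[name] += 1


def count_markup(html):
    """Return (element counts, attribute counts) for one document."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.elements, counter.attributes


# Hypothetical sample page; a real survey would loop over many documents.
SAMPLE = (
    '<html><body><p class="intro">Hi</p>'
    '<p>Bye <a href="/x" class="ext">x</a></p></body></html>'
)

elements, attributes = count_markup(SAMPLE)
print(elements.most_common(3))
print(attributes.most_common(3))
```

Summing these counters over a large crawl and dividing by the number of documents would give per-element frequencies of the kind the report charts.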

The study is worth a read for anyone interested in semantic markup and especially microformats. Beware, however, that in order to see the graphs, you’ll need a browser that can properly render SVG content (Firefox 1.5 seems to work pretty well here).


Microformats are semantic markup

Microformats, while a relatively young movement, are an outgrowth of a movement that’s been going on for quite a bit longer. For years now, web developers and designers have been abandoning purely presentational markup in favor of structural markup. As you move away from presentational markup and towards cleaner, more meaningful markup, you open up the possibility of having not only human semantics but also machine semantics, which, though they are two different things, can often be very similar (see Wikipedia for more on this difference).

Anyway, microformats make the most sense in this context, where we assume that web developers are somewhat concerned with semantic markup and have already gone through the steps of making their markup meaningful.

I’m thinking about this because of a discussion Chris and I had yesterday in the microformats IRC channel, which Chris has already blogged about.

In essence, Chris was asking for a microformat for a use case that doesn’t quite exist yet (at least for a majority of Web users). Of course, Chris is just gonna go off and invent that use case, which is great, but just not a case for a new microformat. Part of the idea behind microformats is to standardize and codify emergent, popular behavior on the Web. If some usage of the Web is too nascent to have converged, we can’t easily codify it, so we choose to pass on the problem.

However, just because Chris’s use case wasn’t appropriate for developing a new microformat doesn’t mean he can’t use preexisting microformats and his own idiomatic semantic markup. In fact, he really should, because if his project catches on, we may want to create a new microformat in the future, at which time his work will be great prior art.

Of course, the appropriate cliché here would be “paving the cowpaths.”


Tim Bray on creating XML Dialects

Tim Bray has a thorough essay on the pros and cons (mostly cons) of inventing new XML dialects.

Tim starts by saying…

Designing XML Languages is hard. It’s boring, political, time-consuming, unglamorous, irritating work. It always takes longer than you think it will, and when you’re finished, there’s always this feeling that you could have done more or should have done less or got some detail essentially wrong.

…which pretty well sums up the challenges with creating new document formats for the Web. Of course, we try to eliminate some of these drawbacks when doing microformats, mostly by focusing on existing behaviors on the web and aiming for the 80% use case (rather than trying to satisfy every edge case), or in Tim’s words, “do[ing] less.”

As Tim went on to describe the challenges and pitfalls of creating arbitrary XML dialects, I was already preparing a “Just use microformats!” response in my head. But, alas, Tim beat me to the punch.

Along with DocBook, ODF, UBL and Atom, he recommends “XHTML+Microformats” as a way to reuse an existing XML dialect, and thereby bypass some of the birth pains of creating a new format. Tim says:

If you’re delivering information to humans over the Web, even if you don’t think of it as “Web Pages”, it’s almost certainly insane not to use XHTML. Yes, XHTML is semantically weak and doesn’t really grok hierarchy and has a bunch of other problems.

Thanks, Tim, for the endorsement of Microformats here.

Of course, the fact that the language is semantically weak doesn’t seem like that big a deal to me, since we can build on top of the semantics it does have (instead of reinventing things like lists, links and paragraphs). And for hierarchies of things, you can always use .

Creating new XML languages is a hard task and not likely to be rewarding. We don’t need more arbitrary formats, each with their own namespace and slightly different semantics.

2005, Year in Review

Just when you thought you wouldn’t have to read another “year in review” blog post…

2005 was an incredible year for the growth of microformats, in terms of specification, implementation, and overall awareness. The community has produced some incredible results in just over six months of existence.

Simple microformats have found their way into several major search engines. Rel-license is indexed by both Yahoo and Google to help find content based on the page’s copyright, giving another orthogonal key to search on. Vote-links showed their importance at the end of 2004 during the elections, and are indexed by Technorati to determine whether links from blogs are endorsements or not. Rel-tag and rel-directory are other simple microformats that have contributed to the building and indexing of folksonomies. XFN is now just over two years old, and 2005 saw the emergence of a second XFN indexer and search engine; XFN has seen a proliferation of uses throughout the web in 2005. Other simple microformats have been proposed, including one to determine when a page was last modified.
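For readers who haven’t seen these simple microformats up close, here is a rough Python sketch of how an indexer might pick them out of a page. It assumes the usual markup: rel="license" and rel="tag" on links (with the tag taken from the last path segment of the href) and rev="vote-for", rev="vote-against" or rev="vote-abstain" for vote-links. The class name `LinkMicroformats` and the sample URLs are invented for this example, and real indexers surely do much more.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

VOTES = {"vote-for", "vote-against", "vote-abstain"}


class LinkMicroformats(HTMLParser):
    """Collects rel-license, rel-tag and vote-links from <a> elements (sketch)."""

    def __init__(self):
        super().__init__()
        self.licenses = []  # license URLs
        self.tags = []      # tag names
        self.votes = []     # (vote, target URL) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel and rev are space-separated lists of values.
        rel = (attributes.get("rel") or "").lower().split()
        rev = (attributes.get("rev") or "").lower().split()
        if "license" in rel:
            self.licenses.append(href)
        if "tag" in rel:
            # For rel-tag, the tag is the last segment of the URL path.
            last_segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
            self.tags.append(unquote(last_segment))
        for value in rev:
            if value in VOTES:
                self.votes.append((value, href))


# Hypothetical page fragment.
PAGE = """
<a rel="license" href="http://creativecommons.org/licenses/by/2.5/">CC BY 2.5</a>
<a rel="tag" href="http://example.org/tags/microformats">microformats</a>
<a rev="vote-for" href="http://example.org/good-idea">a good idea</a>
<a rev="vote-against" href="http://example.org/bad-idea">a bad idea</a>
"""

parser = LinkMicroformats()
parser.feed(PAGE)
parser.close()
print(parser.licenses, parser.tags, parser.votes)
```

The point is how little is needed: one attribute on an ordinary link is enough for a search engine to treat it as a license, a tag, or an endorsement.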

As for compound microformats, there have been three big ones that have been documented and have seen success: hCard, hCalendar, and hReview.

hReview is used to create reviews of movies, books, restaurants and many other things. hReviews are already being indexed and aggregated, and Yahoo UK uses them for its movie reviews.

hCard is based on the vCard spec and has seen explosive growth this past year. Bloggers have used hCard to mark up their contact information, and in the mainstream, universities have marked up their directories with hCards. Avon edited a single template, and contact information for over 40,000 of its representatives is now easily machine readable. Eventful has published over 100,000 venues with hCards and even more events with hCalendar.

hCalendar is a representation of iCalendar and allows events to be easily extracted and imported into most calendaring programs. As bloggers talk about events and encode them in hCalendar, those events can be searched and aggregated across the entire web. Just as you open an RSS reader for news today, you might open an hCalendar reader to gather events. Eventful isn’t the only place to find hCalendar content; Laughing Squid and others all contribute to building a distributed calendar.
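To make the hCalendar-to-iCalendar idea a bit more concrete, here is a small Python sketch. It assumes well-formed XHTML, the common class names `vevent`, `summary`, `dtstart`, `dtend` and `location`, and the pattern of putting the machine-readable date in the `title` of an `abbr`. The event itself, the helper names, and the choice to write the date in iCalendar’s basic format are assumptions for illustration. The output also leaves out fields a strict iCalendar consumer expects (PRODID, UID, DTSTAMP), so treat it as a sketch rather than a converter.

```python
import xml.etree.ElementTree as ET

# Hypothetical event marked up with hCalendar class names.
XHTML = """
<div>
  <div class="vevent">
    <span class="summary">Microformats meetup</span>:
    <abbr class="dtstart" title="20060120T180000">January 20th, 6pm</abbr>
    at <span class="location">Example Cafe</span>
  </div>
</div>
"""


def has_class(element, name):
    """True if the element's class attribute contains the given name."""
    return name in element.get("class", "").split()


def find_property(root, name):
    """Return the first element under root carrying the class name, or None."""
    for element in root.iter():
        if has_class(element, name):
            return element
    return None


def property_value(element):
    # Prefer the machine-readable title on <abbr>, else the visible text.
    if element.tag == "abbr" and element.get("title"):
        return element.get("title")
    return "".join(element.itertext()).strip()


def hcalendar_to_ical(xhtml):
    """Build a bare-bones iCalendar string from hCalendar events."""
    root = ET.fromstring(xhtml)
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0"]
    for event in root.iter():
        if not has_class(event, "vevent"):
            continue
        lines.append("BEGIN:VEVENT")
        for prop in ("summary", "dtstart", "dtend", "location"):
            element = find_property(event, prop)
            if element is not None:
                lines.append(f"{prop.upper()}:{property_value(element)}")
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines)


ical = hcalendar_to_ical(XHTML)
print(ical)
```

Because the class names mirror the iCalendar property names, the mapping is nearly one-to-one, which is what makes extraction this straightforward.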

In 2005, work began on several more compound microformats, including hAtom and xFolk. hAtom allows you to encode a feed into your (X)HTML, so that the page and the feed are one and the same thing. xFolk is an open social bookmarking standard that would make it possible to easily collect social bookmark data and remix it to invent new services. Research is proceeding on a resume format, a citation microformat to describe publications, references, and bibliographies, and a listing microformat to describe items for sale, for rent, or items people would like to buy.

As more and more companies add basic information about their business, search engines will be able to truly search based on more specific criteria such as zip code. Right now, a search for “Pizza 63101” returns every page that contains both terms, wherever they appear. With microformats you could limit the term “63101” to ONLY the postal-code property and “pizza” only to the FN, N, CATEGORY, or ORG properties (that would stop all the buildings on “Pizza Street” from appearing in the results). Next, it would be possible to further restrict the restaurants to only those with associated hReviews of 3.5 stars or higher. Finally, if the site has encoded any information with hCalendar, you could determine their opening/closing hours and any special deals and offers for a given day.
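Here is a rough Python sketch of that property-scoped search. It assumes well-formed XHTML and the hCard class names `vcard`, `fn`, `n`, `org`, `category` and `postal-code`. The two businesses, the function names, and the matching rules (exact match on postal code, case-insensitive substring match on the name and category properties) are invented for the example; a real search engine would be far more involved.

```python
import xml.etree.ElementTree as ET

# hCard properties this sketch cares about.
PROPERTIES = {"fn", "n", "org", "category", "postal-code"}

# Two hypothetical businesses. The second one would match a plain-text
# search for "pizza 63101" even though it is neither.
XHTML = """
<div>
  <div class="vcard">
    <span class="fn org">Luigi's</span>
    <span class="category">pizza</span>
    <span class="adr"><span class="street-address">1 Market St</span>
      <span class="postal-code">63101</span></span>
  </div>
  <div class="vcard">
    <span class="fn org">Acme Hardware</span>
    <span class="category">hardware</span>
    <span class="adr"><span class="street-address">63101 Pizza Street</span>
      <span class="postal-code">90210</span></span>
  </div>
</div>
"""


def classes(element):
    """Return the element's class names as a list."""
    return element.get("class", "").split()


def parse_hcards(xhtml):
    """Return one dict per hCard, mapping property name to a list of values."""
    cards = []
    for node in ET.fromstring(xhtml).iter():
        if "vcard" not in classes(node):
            continue
        card = {}
        for child in node.iter():
            text = "".join(child.itertext()).strip()
            for name in classes(child):
                if name in PROPERTIES:
                    card.setdefault(name, []).append(text)
        cards.append(card)
    return cards


def search(cards, term, postal_code):
    """Match the term only in FN/N/CATEGORY/ORG and the code only in postal-code."""
    term = term.lower()
    hits = []
    for card in cards:
        if postal_code not in card.get("postal-code", []):
            continue
        names = [v for key in ("fn", "n", "category", "org") for v in card.get(key, [])]
        if any(term in value.lower() for value in names):
            hits.append(card.get("fn", ["(unnamed)"])[0])
    return hits


results = search(parse_hcards(XHTML), "pizza", "63101")
print(results)
```

Only the first business comes back: the hardware store on “Pizza Street” contains both words in its text, but not in the properties the query is scoped to.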

2005 has laid the groundwork for all of this to begin. As a community, we should be proud of what we have done and excited about where it is going. As microformats grow, 2006 and beyond look very exciting!