Blog Archive for January, 2008

Building open textual content on HTML

The Web is by far the most successful medium in history for the open publishing and sharing of content. Focusing efforts to promote and enable open content on the Web first and foremost (rather than, say, on proprietary data warehouses and corporate databases) thus has the greatest enabling effect for open content in general.

Textual content on the Web is dominated by HTML (including XHTML of course) due to its broad reach and ease of authorship. The more we are able to use HTML as the common carrier of higher fidelity chunks of information, the more we empower and enrich the publishing and sharing of textual content.

Thus microformats are developed in line with “plain old semantic HTML” practices and principles, that is, as valid semantic extensions to HTML. Semantic HTML by itself enables sharing open content with headings, paragraphs, lists, and so on. Microformats build upon that foundation rather than reinventing it (i.e. they reuse HTML lists, and nested lists for outlines, rather than inventing new tags or vocabulary), and extend it only for commonly published semantics that go beyond HTML.

These extensions can be used to publish documents containing just one type of information for consumption by domain-specific applications (e.g. a contact list for address books, or an event list for calendaring tools). They can also be used to publish many types intermixed and nested, embedded in a larger document, such as a resume, that ties them all together with meaningful context. That meaning would be lost were each type of data isolated, removed from its context, and published in its own special-purpose format silo.
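As a rough illustration of how a domain-specific application might consume one type of information from a larger HTML document, here is a minimal sketch in Python. It assumes the contact is marked up with hCard-style class names (`vcard`, `fn`, `org`); the sample document, the `ContactExtractor` class, and the `extract_contacts` helper are all hypothetical names made up for this example, and this is not a complete microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical compound document: a resume fragment with an embedded contact.
# The class names (vcard, fn, org) follow hCard conventions; the people and
# organizations are invented for illustration.
SAMPLE_HTML = """
<h1>Resume</h1>
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>
</div>
<ul>
  <li>Experience
    <ul><li>Editor</li></ul>
  </li>
</ul>
"""


class ContactExtractor(HTMLParser):
    """Collects fn/org text from each element whose class includes "vcard".

    Simplification: nested vcards and unclosed non-void tags are not handled.
    """

    FIELDS = ("fn", "org")
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.contacts = []  # one dict per vcard found
        self._open = []     # one marker per open tag: "vcard", a field, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if "vcard" in classes:
            marker = "vcard"
            self.contacts.append({})
        elif "vcard" in self._open:
            # Inside a contact: remember which field, if any, this element is.
            marker = next((c for c in classes if c in self.FIELDS), None)
        self._open.append(marker)

    def handle_endtag(self, tag):
        if tag not in self.VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open field element, if there is one.
        for marker in reversed(self._open):
            if marker == "vcard":
                return
            if marker in self.FIELDS:
                contact = self.contacts[-1]
                contact[marker] = contact.get(marker, "") + data
                return


def extract_contacts(html):
    """Return a list of {"fn": ..., "org": ...} dicts found in the HTML."""
    parser = ContactExtractor()
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in c.items()} for c in parser.contacts]


if __name__ == "__main__":
    print(extract_contacts(SAMPLE_HTML))
    # [{'fn': 'Jane Doe', 'org': 'Example Corp'}]
```

The surrounding resume (heading, nested lists) stays ordinary HTML that any browser renders, while an address-book tool can pick out just the contact without the data ever leaving its context.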

Whether in simple collections or in compound documents, all such uses, by building on HTML, work well not only on their own but also embedded and mixed with existing web content, in a way well understood by web authors, browsers, and search engines alike. Finally, it is this broader reach, to existing content, authors, applications, search services, and a variety of devices, that makes textual content built on HTML even more open from a practical perspective.

Open content depends on open standards

Creative Commons (CC) pioneered broad awareness of the need for, and value of, open content publishing and sharing. By providing a set of licenses that let authors clearly choose how and under what conditions to make their content freely available, CC also made such publishing and sharing easier.

How “open” open content truly is depends on the formats used to publish it. Open content published in a proprietary format supported only by a single-vendor proprietary application is only as open as that single vendor chooses to make it. For example, open content authored in an application that is unavailable on the Macintosh, and published in the default format of that application, is not “open” to Macintosh users (even converters have problems). Such open content is essentially held hostage by the sole application (and the sole platform family that it runs on) that supports that format. In addition, if the sole vendor in this case chooses to stop supporting that sole application, then open content in that format becomes dead content. More on that in a future post on “data longevity”.

Content is most easily, reliably, and broadly shared when the formats it uses are as open as possible. Truly open formats encourage the maximum amount of documentation (from syndicated blog posts to professionally published books) and of interoperable implementations (from open source to proprietary for-profit). I encourage everyone developing open standards to make them as open as possible by taking the same steps we have taken with microformats, thus better enabling open content.