kragen-history-of-markup

From Microformats Wiki

This page is still skeletal. Perhaps some of it should move to a "history" or "background" page?

SGML

Starting in 1969 and throughout the 1970s, Charles Goldfarb, Edward Mosher, Raymond Lorie, and others created GML, the Generalized Markup Language, at IBM. It was a meta-language for domain-specific document formats with semantic information. Those formats could be used for flexible stylesheet-based formatting (somewhat like other systems of the same era, such as Don Knuth's TeX and Brian Reid's Scribe) as well as for more database-like applications, such as composing tables of contents or searching a corpus of case law by its citations.

Throughout the 1980s, GML developed into SGML, first under the auspices of ANSI and then of ISO, which published it as ISO 8879 in 1986. SGML was widely used in document-intensive businesses, but its complexity, and the need to write a DTD specific to each application area, inhibited its wider use.

HTML

HTML began in 1992 as one particular SGML DTD, used for marking up hypertext documents with headings, lists, and a very few other things. Over the next several years, wide (if ad-hoc) user-agent support led HTML documents to outnumber other SGML documents by a very wide margin. Unfortunately, since HTML is only one application of SGML, it lacked SGML's generality. It could represent the semantics of a document at the level of tables, lists, emphasized words, and headings, but it lacked elements such as "part number", "statute", "copyright date", "variable name", "FAQ question", and "method name", which are useful for particular kinds of documents. Authors rendered these as more generic elements such as "list item", "paragraph", and "italics".

HTML3 was a 1994 attempt to remedy this situation by adding many more semantic elements to the language, including semantic markup for navigational banners, quotations, persons, variable names, figure captions, mathematical formulae, and admonishments, in addition to most of the parts of HTML we use today. In particular, an attribute called CLASS was added (am I sure this didn't happen earlier?) to most elements in order to specify document-specific semantic classes for the application of stylesheets.

Most web-browser authors rejected the absurd excesses of HTML3.

The stylesheet culture of SGML had not penetrated the HTML culture. Until Microsoft Internet Explorer 3 added partial CSS support in 1996, mainstream web browsers didn't support any stylesheet language at all, and consequently the separation between structure and presentation that pervaded SGML applications was impossible in HTML. Authors had to map the structure of the document onto the impoverished set of elements provided by HTML, and had to specify the presentation of each element explicitly in the document markup, with tags such as FONT, BR, and TABLE.

XML

The W3C established a "W3C SGML Working Group" in 1996 to produce a simplified subset of SGML that would let web-page authors invent their own sets of tags, which web browsers would render according to stylesheets. In 1998, this effort produced XML 1.0. XML was not adopted as a format for web pages, however, because the web browsers of the time supported neither XML nor stylesheet languages well enough.

However, XML proved very popular among users of SGML, and indeed extended SGML-like applications into areas where SGML had previously been rejected as too complex. XML's countless applications today include data structure serialization (in XML-RPC and SOAP), feed syndication (as Atom and RSS), printer description files, MIME-type descriptions, technical documentation, IM buddy lists, web-browser bookmarks files, GUI descriptions (in XUL, Glade, and XAML), metadata for software components (for example, in Chandler), system configuration data (in GConf and Mac OS X property lists), software build instructions (in Ant and CodeWarrior), and spreadsheets (in Gnumeric and OpenOffice).

In 2000, in the wake of XML's failure to replace HTML for web pages, the W3C promulgated a new version of HTML, known as XHTML. XHTML reformulated HTML as an application of XML rather than of SGML.

The battle for the soul of the Web

Throughout most of the early years of the web, a constant battle raged between the advocates of semantic markup and the advocates of presentational markup.

The presentational-markup people did whatever worked to get their documents to look right in their browser of choice, right down to the last pixel. They achieved stunning visual effects, sometimes beautiful, sometimes hideous. Their documents inevitably used browser-specific tags to achieve specific visual effects, looked wrong if the user's browser window was unexpectedly small ("Best viewed at 800x600 or better!"), and often looked wrong in subsequent releases of the same browser.

The semantic-markup folks pointed out that semantic markup didn't require a version for each browser, was usually more readable, shrank page downloads, and supported "graceful degradation" on devices such as Braille terminals, cellphone browsers, and text-to-speech systems. They advocated validating HTML documents against SGML DTDs, using on-line services such as the W3C validator and its predecessor at Webtechs.

The presentational-markup people responded that the web pages maintained by the semantic-markup folks looked the same way in every browser, but that just meant they looked boring in every browser, and that wasn't what the clients of the presentational-markup people were asking for.

The "browser wars" contributed greatly to this problem. Netscape, and later Microsoft, worked as hard as they could to make their browsers better than the alternatives, and this generally took the form of new HTML tags that produced "cool" visual effects, such as BLINK, MARQUEE, LAYER, FONT, and IFRAME. Consequently the presentational-markup people continued to have better-looking web pages, although those pages cost considerably more to maintain.

In 1996, the W3C promulgated a stylesheet language for specifying the visual presentation of HTML documents (and, later, XML documents): CSS, level 1. (Apparently they didn't think existing SGML stylesheet languages such as DSSSL were good enough.) In 1998, they published the first working draft of another stylesheet language, XSL. Also in 1998, Netscape released the source code of its browser as free software, persuaded by Eric Raymond's arguments that certain "open" development processes produced better software. (Raymond, together with Bruce Perens, promptly founded the Open Source Initiative to promote the new term "open source" as a "marketing program" for free software, on the basis that it produced better software rather than that it protected users' freedoms.)

Internet Explorer 3 shipped in 1996 with limited support for the CSS stylesheet language, and Internet Explorer 4 (1997) extended that support. In 2000, Internet Explorer 5 for the Macintosh became the first browser with full support for CSS1. Over the next several years, Netscape's browser was rewritten nearly from scratch, with "standards support" (rather than "competitive advantage") as a major goal, largely because of the open-source community's influence, resulting in Mozilla Seamonkey and then Firefox.

By late 2001, stylesheet support was good enough in both Internet Explorer and Mozilla that web designers could, for the first time, make a semantic-markup site look good. The old disadvantages of presentational markup remained, but its visual advantages gradually melted away, as did its proponents. (The dot-com crash also helped, by driving many of them out of the web-design business entirely and leaving only the diehards.) By 2004, essentially all of the leaders in web design advocated semantic markup and stylesheets rather than grotty presentational markup.

Once standards-supporting browsers were available, groups like the Web Standards Project (WaSP, founded in 1998) urged readers to upgrade their browsers, so that web sites could use semantic markup rather than presentational markup without looking bad.

Microformats

Thus was born the microformat: a way to add semantics to HTML documents, using existing attributes such as class and rel, without breaking their validation.

At ETCon 2004, Tantek Çelik and Kevin Marks gave a talk entitled "real world semantics", which already used the term "microformats" and covered XOXO, XFN, GeoURL, blogchalking, CC rel="license", and VoteLinks. Subsequently they held a BoF session, which apparently had more than the three participants listed. The next week, they presented a lightning-talk version of the same talk at ConCon 2004. In 2005, Tantek gave an SXSW talk entitled "The elements of meaningful XHTML".