New modern microformats2 parsers continue to be developed in various languages, and this past year, four new parsing libraries (in three different languages) were added, almost doubling our previous set of six (in five different languages) that brought our year 11 total to 10 microformats2 parsing libraries available in 8 different programming languages.
microformats2 parsing spec updates
The microformats2 parsing specification has made significant progress in the past year, all of it incremental iteration based on real world publishing and parsing experience, each improvement discussed openly, and tested with real world implementations. The microformats2 parsing spec is the core of what has enabled even simpler publishing and processing of microformats.
The specification has reached a level of stability and interoperability where fewer issues are being filed, and those that are being filed are in general more and more minor, although once in a while we find some more interesting opportunities for improvement.
We reached a milestone two weeks ago of resolving all outstanding microformats2 parsing issues thanks to Will Norris leading the charge with a developer spec hacking session at the recent IndieWeb Summit where he gathered parser implementers and myself (as editor) and walked us through issue by issue discussions and consensus resolutions. Some of those still require minor edits to the specification, which we expect to complete in the next few days.
The number of microformats2 parsers in different languages continues to grow, most of them with deployed live-input textareas so you can try them on the web without touching a line of parsing code or a command line! All of these are open source (repos linked from their sections), unless otherwise noted. These are the new ones:
The Java parsers are a particularly interesting development as one is part of the upgrade to Apache Any23 to support microformats2 (thanks to Lewis John McGibbney). Any23 is a library used for analysis of various web crawl samples to measure representative use of various forms of semantic markup.
The Elixir, Haskell, and Java parsers add to our existing in-development parser libraries in Go and Ruby. The Go parser in particular has recently seen a resurgence in interest and improvement thanks to Will Norris.
These in-development parsers add to existing production parsers, that is, those being used live on websites to parse and consume microformats for various purposes:
As with any open source projects, tests, feedback, and contributions are very much welcome! Try building the production parsers into your projects and sites and see how they work for you.
Still simpler, easier, and smaller after all these years
Usually technologies (especially standards) get increasingly complex and more difficult to use over time. With microformats we have been able to maintain (and in some cases improve) their simplicity and ease of use, and continue to this day to get testimonials saying as much, especially in comparison to other efforts:
This last testimonial really gets at the heart of one of the deliberate improvements we have made to iterating on microformats vocabularies in particular.
We have had an implementation-driven and implementation-tested practice for the microformats2 parsing specification for quite some time.
More and more we are adopting a similar approach to growing and evolving microformats vocabularies like h-entry.
We have learned to start vocabularies as minimal as possible, rather than start with everything you might want to do. That “start with everything you might want” is a common theory-first approach taken by a-priori vocabularies or entire “predefined ontologies” like schema.org’s 150+ objects at launch, very few of which (single digits?) Google or anyone bothers to do anything with, a classic example of premature overdesign, of YAGNI.
With h-entry in particular, we started with an implementation-filtered subset of hAtom, and since then have started documenting new properties through a few deliberate phases (which helps communicate to implementers which are more experimental or more stable):
Proposed Additions – when someone proposes a property, gets some sort of consensus among their community peers, and perhaps gets one more person to implement it in the wild beyond themselves (e.g. as the IndieWebCamp community does), it’s worth capturing it as a proposed property to communicate that this work is happening between multiple people, and that feedback, experimentation, and iteration are desired.
Draft Properties – when implementations begin to consume proposed properties and do something explicit with them, then a positive reinforcement feedback loop has started and it makes sense to indicate that such a phase change has occurred by moving those properties to “draft”. There is growing activity around those properties, and thus this should be considered a last call of sorts for any non-trivial changes, which get harder to make with each new implementation.
Core Properties – these properties have gained so much publishing and consuming support that they are for all intents and purposes stable. Another phase change has occurred: it would be much harder to change them (too many implementations to coordinate) than keep them the same, and thus their stability has been determined by real world market adoption.
The three levels here, proposed, draft, and core, are merely “working” names, that is, if you have a better idea what to call these three phases by all means propose it.
In h-entry in particular, it’s likely that some of the draft properties are now mature (implemented) enough to move them to core, and some of the proposed properties have gained enough support to move to draft. The key to making this happen is finding and citing documentation of such implementation and support. Anyone can speak up in the IRC channel etc. and point out such properties that they think are ready for advancement.
How we improve moving forward
We have made a lot of progress and have much better processes than we did even a year ago. However, I think there’s still room for improvement in how we evolve microformats technical specifications like the microformats2 parsing spec, and in how we create and improve vocabularies.
It’s pretty clear that to enable innovation we need ways of encouraging constructive experimentation, and yet we also need a way of indicating what is stable vs in-progress. For both of those we have found that real world implementations provide both a good focusing mechanism and a good way to test experiments.
In the coming year I expect we will find even better ways to explain these methods, in the hopes that others can use them in their efforts, whether related to microformats or in completely different standards efforts. For now, let’s appreciate the progress we’ve made in the past year from publishing sites, to parsing implementations, from process improvements, to continuously improving living specifications. Here’s to year 12.
Nine years ago we launched microformats.org with a basic premise: that it is possible to express meaning on the web in HTML in a simple way—far simpler than the complex alternatives (XML) being promoted by mature companies and standards organizations alike.
Today microformats.org continues to be a gathering place for those seeking simpler ways to express meaning in web pages, most recently the growing IndieWeb movement.
Looking back nine years ago, none of the other alternatives promoted in the 2000s (even by big companies like Google and Yahoo) survive to this day in any meaningful way:
From this experience, we conclude that what large companies support (or claim to prefer) is often a trailing indicator (at best).
Large companies tend to promote more complex solutions, perhaps because they can afford the staff, time, and other resources to develop and support complex solutions. Such approaches fundamentally lack empathy for independent developers and designers, who don’t have time to keep up with all the complexity.
If there’s one value that’s at the heart of microformats’ focus and continued evolution of simplicity, it is that empathy for independent developers and designers, for small consulting shops, for curious hobbyists who are most enabled and empowered by the simplest possible solutions to problems.
We now know that no amount of large company marketing and evangelism can make up for a focus on ever simpler solutions which take less time to learn, use, and reliably maintain. As long as we focus on that, we will create better solutions.
Speaking of taking less time, we’ve learned some community lessons about that too. Perhaps the most important is that as a community we are far more efficiently productive using just IRC and the wiki, than any amount of use of email. In fact, the microformats drafts that were developed with the most email (e.g. hAudio) turned out to be the hardest to follow and discuss (too many long emails), and sadly ended up lacking the simplicity that real world publishers wanted (e.g. last.fm).
Email tends to bias design and discussions towards those who have more time to read and write long emails, and (apparently) enjoy that for its own sake, rather than those who want to quickly research & brainstorm, and get to actually creating, building, and deploying things with microformats.
Thus we’re making these changes effective today:
IRC for all microformats discussions, whether research, questions, or brainstorming
email only for occasional announcements and to direct people to IRC.
wiki for capturing questions, brainstorming, conclusions, and different points of view
We’re going to update the site to direct all discussion (e.g. links) to the IRC channel accordingly.
Over the past few years microformats2 has proven itself in practice, with numerous sites both publishing and consuming, several open source parsing libraries, and a growing test suite. All the lessons learned from the evolution from original microformats, from RDFa, and from microdata have been incorporated into microformats2 which is now the simplest to both publish and parse.
It’s time to throw the switch and upgrade everything to microformats2. This means three things:
First, we’re starting by upgrading the links on the microformats.org home page to point to the microformats2 drafts, which are ready for use.
We’ll be incrementally upgrading the markup of the microformats.org site itself to use microformats2 markup.
Second, if you publish any kind of semantic information, start upgrading your web pages to microformats2 across the board.
If you’re concerned about what search engines claim to support, there are two approaches to choose from:
Know that search engines are a trailing indicator, and as microformats2 usage grows, they’ll index it as well.
Or: Use one classic microformat (supported by all major search engines) at the top of your page, e.g. on the <body>, in addition to your microformats2 markup throughout your pages. Search engines only really care to summarize the primary topic or purpose of a web page in their “rich snippets” or “cards”, and thus that’s sufficient.
Check out the latest validators which now include some microformats2 support as well!
Third, this is a call to upgrade all microformats supporting tools to microformats2. As nearly all of these are open source, this is an open call for contributions, updates, patches, etc. for:
Classic microformats have been serving the web community’s need to extend HTML’s expressive power since 2004. Through an evolutionary, open, rigorous community process and human-first design principles, structured use of the class and rel attributes have paved the cowpaths of publishing data about people, places, events, reviews, products and more.
Microformats2 is the next big effort by the community to improve how microformats are authored, parsed and defined. Version two has multiple working open source implementations which independents are using in production and is easier to publish and consume than ever.
In this series of guides I’ll show you how to be the next site publishing and consuming microformats.
You can see how a microformats parser sees your markup by pasting any of the code samples below into this php-mf2 sandbox. Go ahead and experiment with adding more properties and see what happens!
In order to demonstrate some of the differences between microformats 2 and Classic Microformats/other competing technologies, I’ll use the process of content-out markup — going from plain text to HTML and finally adding a sprinkling of microformats. Let’s start with my favourite example: mentioning a person.
<span class="vcard"><a class="fn n url" href="http://waterpigs.co.uk">Barnaby Walters</a></span>
That’s 37 extra characters and a whole extra nested element just to say “This link is to a person”, not to mention the strangely named root classname (vcard? I thought this was an hcard?) and multiple cryptic fn n classnames. Competing technologies are typically even longer and messier.
With microformats 2 this all becomes much simpler:
Weighing in at only 15 characters, this is quicker to type, easier on the eyes and easier to remember.
There are two fundamental changes in microformats 2 which make this helium-esque lightness possible: Implied Properties and prefixed classnames.
When you give class=h-card to an element, you’re saying “This element represents a person”. In many cases the element will be simple; just a name, perhaps with a link or photo. Why should you add extra elements and classnames just to tell a dumb computer which bit is the name, URL or photo URL when that information is already expressed by the markup?
Implied properties save you from this tedium. When you specify an element as an h-card without explicitly defining which parts are the name, url or photo url, the parser will figure out what you meant. And it’s not just for h-cards either — thanks to the new generic parsing in microformats 2, this shorthand works for any microformat.
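As a rough illustration of the idea, here is a minimal Python sketch of implied properties using only the standard library. It is not a real microformats2 parser: it assumes the simplest case only (an h-card root whose text is the name and, if the root is a link, whose href is the url), and the class name `ImpliedHCard` and function `implied_properties` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ImpliedHCard(HTMLParser):
    """Simplified sketch: implies name and url for an element classed h-card."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth inside the h-card root (0 = outside)
        self.text = []        # text fragments collected inside the root
        self.properties = {}  # property name -> list of values

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif "h-card" in (attrs.get("class") or "").split() and tag not in VOID_TAGS:
            self.depth = 1
            # Implied url: the root element is itself a link.
            if tag == "a" and attrs.get("href"):
                self.properties["url"] = [attrs["href"]]

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                # Implied name: the text content of the root element.
                self.properties["name"] = ["".join(self.text).strip()]

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def implied_properties(html):
    """Return the implied properties of the first h-card found in html."""
    parser = ImpliedHCard()
    parser.feed(html)
    parser.close()
    return parser.properties


# Assumed markup for this example: a link carrying only the h-card root class.
card = implied_properties(
    '<a class="h-card" href="http://waterpigs.co.uk">Barnaby Walters</a>'
)
```

Real parsers cover more cases than this sketch (for example photos and image alt text), so treat it as a picture of the principle rather than of the parsing rules.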
Classic microformats used plain classnames which looked like any other (e.g. vcard, n or note). There were a few problems with this — classnames would clash, cause false positives or be thrown away by developers who weren’t microformats-aware (“these classnames aren’t doing anything!”).
This also meant parsers were tricky to write, as each one had to maintain a long list of classnames used by each microformat, resulting in many parsers quickly going out of date.
Prefixing classnames solves both of these problems: semantic microformats2 classnames are set apart from styling hooks, and parsers can figure out which classnames to look for, cutting down on maintenance. There are 5 prefixes:
h-* root classnames specify that an element is a microformat, e.g. <span class="h-card">
p-* specifies an element as a plain-text property, e.g. <span class="p-name">My Name</span>
u-* parses an element as a URL, e.g. <a class="u-url" href="/"></a>
dt-* parses an element as a date/time, e.g. <time class="dt-published" datetime="2013-05-02 12:00:00"></time>
e-* parses an element’s whole inner HTML, e.g. <div class="e-content">
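To show why prefixes cut down on parser maintenance, here is a small Python sketch that sorts the classnames in a class attribute by prefix without needing a per-vocabulary list. The `PREFIXES` table and the `classify` function are hypothetical names for this example, and the sketch skips the finer validation a real parser would do.

```python
# The five microformats2 prefixes and what each one means to a parser.
PREFIXES = {
    "h": "root",
    "p": "plain-text property",
    "u": "URL property",
    "dt": "date/time property",
    "e": "embedded HTML property",
}


def classify(class_attr):
    """Return (kind, name) pairs for prefixed classnames, skipping all others."""
    found = []
    for classname in class_attr.split():
        prefix, sep, name = classname.partition("-")
        # Only classnames of the form <known prefix>-<name> are microformats2.
        if sep and name and prefix in PREFIXES:
            found.append((PREFIXES[prefix], name))
    return found


note_classes = classify("e-content p-summary p-name big-text")
classic_classes = classify("vcard fn n url")
```

The classic classnames produce no matches at all, which is the point: styling hooks and microformats2 classnames no longer share a namespace.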
I’ll demonstrate the use of all of these prefixes with some real-world examples.
Firstly, another h-card, more fleshed out than the earlier example. This might be the sort that you put on your homepage:
p-name, u-url and u-photo are fairly standard properties you’ll see over and over again. Another improvement in microformats 2 is increasing consistency between different microformat specifications, which again makes them easier for authors to remember and consumers to understand. A nice side effect is that a single element can be more than one type of microformat at once, for example an h-entry and h-review.
To demonstrate dt-* and e-*, here’s a note (like a tweet or short blog post), marked up using h-entry:
<article class="h-entry">
<div class="e-content p-summary p-name">
<p>Just writing a guide to using <a href="http://microformats.org/wiki/microformats-2">microformats 2</a></p>
</div>
<time class="dt-published" datetime="2013-05-01 12:00:00">20 minutes ago</time>
</article>
Here, I want the HTML inside the content to be passed to the parser, so I mark it up as e-*. Notice I’m also specifying that content as the summary and name — one element can be parsed as multiple properties.
I’m using the time element to mark up the time this note was published. Because I’ve prefixed the classname with dt- and it’s on a time element, parsers know to look in the datetime attribute if it exists.
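The datetime lookup can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the full parsing rules: it only knows about the datetime attribute on a time element and otherwise falls back to the first piece of text inside the element. `DateTimeProperties` and `parse_dt` are hypothetical names.

```python
from html.parser import HTMLParser


class DateTimeProperties(HTMLParser):
    """Simplified sketch: collects values for dt-* properties."""

    def __init__(self):
        super().__init__()
        self.values = {}     # property name -> list of values
        self.pending = None  # names still waiting for a text fallback

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = [c[3:] for c in (attrs.get("class") or "").split()
                 if c.startswith("dt-") and len(c) > 3]
        if not names:
            return
        if tag == "time" and attrs.get("datetime"):
            # Prefer the machine-readable attribute over the visible text.
            for name in names:
                self.values.setdefault(name, []).append(attrs["datetime"])
        else:
            self.pending = names

    def handle_data(self, data):
        if self.pending and data.strip():
            # Fallback: use the first text found inside the element.
            for name in self.pending:
                self.values.setdefault(name, []).append(data.strip())
            self.pending = None


def parse_dt(html):
    parser = DateTimeProperties()
    parser.feed(html)
    parser.close()
    return parser.values


published = parse_dt(
    '<time class="dt-published" datetime="2013-05-01 12:00:00">20 minutes ago</time>'
)
```

Here the human-readable "20 minutes ago" is ignored in favour of the attribute value, which is what lets the visible text stay friendly.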
The third area in which microformats2 improves on the previous version is in combining microformats and making each microformat specification more reusable. For example, both a person and an event might have an address, so it makes sense to reuse the same markup for both.
There are many reasons to combine microformats — say you want to specify the author of a blog post or review. You would do so by making the p-author of the post an h-card:
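As a hedged sketch of what such nesting looks like once parsed, the Python snippet below hand-writes a structure in the general shape of microformats2 parsed JSON (each item has a type list and a properties object whose values are lists) and pulls out the author name. The data values and the `author_names` helper are illustrative assumptions, not the output of any particular parser.

```python
# Hand-written, illustrative data in the shape of microformats2 parsed JSON.
# All values here are assumptions made for this example.
entry = {
    "type": ["h-entry"],
    "properties": {
        "name": ["Example post"],
        "author": [
            {
                "type": ["h-card"],
                "properties": {
                    "name": ["Barnaby Walters"],
                    "url": ["http://waterpigs.co.uk"],
                },
                "value": "Barnaby Walters",
            }
        ],
    },
}


def author_names(item):
    """Return author names, whether each author is a nested h-card or plain text."""
    names = []
    for author in item["properties"].get("author", []):
        if isinstance(author, dict):
            # Nested microformat: read the name from its own properties.
            names.extend(author["properties"].get("name", []))
        else:
            # Plain p-author value with no nested h-card.
            names.append(author)
    return names


names = author_names(entry)
```

Because the nested h-card has the same shape as any top-level item, the same code that reads a person on a homepage can read the author of a post.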
Last week the microformats.org community celebrated its 7th birthday at a gathering hosted by Mozilla in San Francisco and recognized accomplishments, challenges, and opportunities.
Humans First: Admin Emeriti & New Admins
The microformats tagline “humans first, machines second” forms the basis of many of our principles, and in that regard, we’d like to recognize a few people and thank them for their years of volunteer service as community admins:
Both have similarly been consistent positive contributors to microformats for years, and we’re very happy they’ve stepped up as community admins.
Challenges & Opportunities
During microformats.org’s first seven years, we’ve seen many other attempts to promote structured data on the web. Some have since disappeared or been retired, one never got past an initial blog post, a couple have gained traction, and a couple were just launched in the past two weeks:
2005-2011: Google Base schema
2007-2011(?): Google Data API/Elements
2009-2009(?): Yahoo et al CommonTag.org
2010-2012+: Facebook OGP meta tags
2011-2012+: Google/MS/Y! Schema.org
2012-2012+: Twitter Cards meta tags
Each of these is a signal that there are a few people (or a company) who want to do something with structured data on the web better than what they thought was already out there.
We can view each of these as a set of challenges and questions:
What problems were these created to solve?
Are they solving similar, overlapping, or different problems than microformats?
Did their creators know about microformats?
Did they try using microformats?
Are they open efforts (or at least trying to be open), or are they vendor-specific?
We can also view each of these as feedback, whether intended or not, and opportunities to improve microformats. Open source teaches us that great ideas come from everywhere, thus we should analyze and document other efforts, seeking to answer the above questions.
If these alternative efforts are solving the same or similar problems as microformats, how can we improve microformats to meet their needs with established open standards?
If they’re addressing different problems, are those problems that microformats should be expanded to solve?
And if they’re open efforts, how can we best collaborate and produce even better solutions together?
microformats ~70% structured data domains
Even after seven years and the emergence of various alternatives, according to the Web Data Commons as of 2012, microformats still have the greatest adoption across different websites for publishing structured data in HTML:
Our continuing success is no indication that we should rest.
We should document the alternatives as they emerge, do our best to answer the questions posed, and reach out to other communities to find areas of overlap to collaborate. With greater collaboration comes greater interoperability.
As we continue to evolve and expand microformats, we should keep in mind the principles and values that brought us here. Here are a few of the core values that still distinguish microformats as a technology, effort, and community:
Humans first (machines second). microformats are designed primarily for humans, and primarily for the greatest number of humans, whether authoring, or representing (e.g. the microformats community spearheaded and got the most inclusive “gender” property ever designed into vCard 4 (RFC 6350), which hCard 1.1 and 2 use).
Mozilla front end developer / UX engineer Gordon Brander recently remarked to me, “microformats work within existing web designer/developer workflow, which makes them easy and convenient. Other solutions require learning new attributes, which is enough of a barrier to just not bother.”
When I related this to Sam Weinig of the Safari team, he made an astute observation and comparison: “essentially what you’re saying is that bolt-on solutions, even just attributes, whether for accessibility like ‘longdesc’, or for semantics like RDFa/microdata, just don’t work as well.”
Very openly licensed standards. By virtue of being Creative Commons licensed from the start, and public domain / CC0 compatible since late 2007, microformats are the most openly available standards developed world-wide. CC0 provides the maximum freedom to re-use, publish, and if you’ve got a better idea, fork, experiment, and submit suggestions. Just like open source.
This openness has proven to be particularly effective on the microformats wiki, which has 17 different translations in progress.
These independent translators from around the world, by virtue of committing their work to the public domain / CC0, perhaps know that not only are they sharing their work, but also that the work they do cannot be taken from them. Once placed into the public domain, their work remains there, always reusable at any point in the future, by anyone.
Contrast this with any form of writing/creating that is owned. If it’s owned it can be bought, and thus taken away from you. While that’s a perfectly reasonable trade to make for an income, for standards and longevity, it’s important that our work remain maximally public, as an open resource for generations to come.
And finally, to date, no other standards organization has chosen to put all their research, examples, specifications etc. in the public domain / CC0. We invite every open standards organization to do so.
Open spec history and editing.
Two key distinguishing factors that contribute to our community openness are aspects of microformats specs:
specs with open revision history – every microformats specification has an open revision history dating back to the launch of microformats.org seven years ago.
An open revision history provides a level of transparency, accountability, and provenance second to none. In comparison, the other efforts listed above lack any kind of open revision history, e.g. clearly showing date-time, who edited, and how much changed in each spec. The microformats wiki revision histories are also easily browsable and delta-changes-viewable, far more usable/accessible than web views on revision control systems (e.g. W3C’s cvs and hg repositories).
specs with edit buttons – also unprecedented and unmatched, every microformats specification is on a wiki page, editable by any account holder, should they find typos or other obvious errata / minor edits (spec editors handle larger spec edits, though anyone may copy a spec and demonstrate major edits for consideration in a copy).
The community value of allowing such open editing cannot be overstated. Many longtime contributors started interacting with the microformats community by jumping in and making minor edits or suggestions on the wiki (some, like Brian Suda, first participated by contributing edits to the original hCard specification, though we quickly followed up on IRC).
Latest in microformats
The past year has seen several interesting and useful developments in and related to microformats.
HTML5 enhanced time and data elements
In November 2011 there was a heated discussion in the W3C HTML WG and WHATWG about the HTML5 <time> element. It was first dropped, and then thanks to a lot of research contributions (from many in the microformats community) and persistence, not only was <time> saved but also enhanced in numerous ways useful for microformats. As such, every current use of anything date or time related (including durations) in microformats should use the HTML5 <time> element. For hCard and hCalendar, if you depend on H2VX for providing “add to address book” and “add to calendar” features, use “dev.h2vx.com” URLs which have support for HTML5 semantic elements, and follow @h2vx on Twitter for updates.
In addition, HTML5 now also has a new <data> element, for other types of data where the human representation is not easily/unambiguously machine-readable, or just uses a different format, e.g. geo latitude/longitude coordinates. The HTML5 <data> element is a nice upgrade from the Value Class Pattern ‘value-title’ feature we developed in microformats a few years ago.
Media Temple Server Hosting
For the first six years, the microformats.org server was sponsored by CommerceNet, for which we’re very thankful. And for the past year Rohit Khare graciously supported our server hosting.
However from an infrastructure viewpoint, the server has been on the same virtual hosting, and in some cases original software, since launch in 2005.
We’re very happy to announce that Media Temple is providing new state of the art hosting for microformats.org. After working several hours over a few weekends, we transitioned everything* to the new server and flipped the DNS switches without any downtime. The new server is noticeably faster and more responsive, which has made wiki-editing even more seamless.
Microformats 2 – Start Publishing And Parsing
microformats 2 has been in development for the past couple of years, incorporating lessons learned from many years of using microformats in the real world as well as experience with various other semantic HTML data markup technologies (e.g. microdata, RDFa). It has stabilized and is ready for trying out in real world experiments, both in pages on the web and in parser implementations.
There are already several examples in the wild publishing microformats 2 versions of hCard and hCalendar, including this very blog post. At least two parsers are in development as well, one in Ruby, and one in JS.
The time has come: if you write a tool that publishes or parses microformats, update it to support:
Test your parsers with both the examples in the spec and the ever growing list of microformats 2 supporting sites.
Stay involved with IRC, wiki, and Twitter
While our blog and mailing lists were used a lot in early days, IRC and Twitter seem to have become the first place many ask questions about microformats, and thus we’ve adapted and are responding accordingly.
Today, Google launched a new search feature: Recipe View!
The new search category enables users to discover recipes that have been marked-up following the hRecipe specification from a variety of sources with a new level of accuracy. Google have made it easy for users to find recipes, because authors are now making it easy for them to locate their data from within their web pages.
“Our intent is to make better user experience to see if we can jumpstart this ecosystem,” Menzel said. “That way when someone thinks ‘Hey, I just invented a great recipe, let me put it on my blog,’ and that person’s recipe should be a candidate.”
But Menzel insists it’s got to be easy and that Google doesn’t want to push busy webmasters to do any work that won’t result in more traffic.
“This is really a pragmatic response to the dream of the semantic web,” Menzel said. “We would love if the XML world existed — it would make search awesome, but no one is going to do it. But we need to start somewhere, and a lot of the internet is built manually by people and their time is valuable.”