From Microformats Wiki
Revision as of 22:16, 20 December 2008 by Brian (talk | contribs) (Reverted edits by ElchiDarli (Talk) to last version by Brian)

The Need for Digital Media Microformats

Author: Manu Sporny
License: Released under a Creative Commons Attribution license

The web is an amalgamation of multimedia. Text, audio, video and images dominate most of what we interact with on a daily basis. This has been the case for almost sixteen years (as of 2007)[1]. Similarly, the concept of a Semantic Web has been around for almost 8 years[2]. Computers were supposed to be smarter than they are now... so, what's the holdup?

Most of what we see on the web is intended for everyday folks like you and me. In fact, we depend on people to extract meaning out of the Internet. Most of this meaning is conveyed using text on web pages.

Text has seen more markup innovation than any other kind of content on the web. We have very rich markup for documents: bold, underline, paragraph delimiters, bulleted lists, and tables. A computer knows which text is bold, exactly where each paragraph starts and ends, and which text is italicized. If you asked a computer to show you all of the bulleted items on a web page, or all of the italicized text on a web page, it would have no problem doing so.

However, if you were to ask the same for audio or video information contained in the page, your browser would not be able to perform those tasks. To demonstrate some of these issues, here's a very short list of easy and difficult tasks for computers browsing the web:

  • EASY: Show me all of the bulleted items on a page.
  • DIFFICULT: Show me all of the songs on a page.
  • EASY: Show me all of the links on a page.
  • DIFFICULT: Show me all of the links that lead to audio samples on a page.
  • EASY: Show me all of the videos on a page.
  • DIFFICULT: Show me all of the videos on the page that are in the public domain.

So, why is text so easy for computers? Why do they struggle with audio, video, images and non-text related concepts?

Text, Audio and Video Semantics

Simply put, there is a richer markup mechanism for text than there is for audio, video and images. The HyperText Markup Language (HTML) can describe items that are bold, underlined, or bulleted. No such standardized mechanism exists for auditory or visual multimedia. Usually, people use text to describe audio, video or images. Unfortunately, that descriptive text does not mean anything to a computer.

This is where Microformats come in: they provide a simple mechanism for marking up descriptive text about multimedia. Here is a piece of text before audio markup has been applied:

I didn't really like With or Without You by U2 when the song first came out, but it grew on me after a while.

While a person reading the text would have a fairly easy time picking out the name of the band and the song, it would be a nearly impossible task for a standard computer. With Microformats, things change fairly quickly.

Here is the same piece of text after audio markup has been performed using the hAudio Microformat:

I didn't really like <div class="haudio"><span class="audio-title">With or Without You</span> by 
<div class="contributor"><span class="vcard"><span class="fn">U2</span></span></div></div> when 
the song first came out, but it grew on me after a while.

While the HTML is slightly more complicated, something very important has happened by using the markup as described above. When you ask a computer to identify the song in the sentence, it can immediately respond with "With or Without You". Even better, if you ask it which artist performed the song, it can reply "U2".
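To make this concrete, here is a minimal sketch of how a program might pull the song and artist out of the marked-up sentence. It uses only Python's standard html.parser module and relies on the class names from the example above (haudio, audio-title, contributor, fn). The names HAudioParser, parse_haudio and SNIPPET are made up for this illustration and are not part of any hAudio tooling.

```python
from html.parser import HTMLParser

# The marked-up sentence from the example above.
SNIPPET = """I didn't really like <div class="haudio"><span class="audio-title">With or
Without You</span> by <div class="contributor"><span class="vcard"><span
class="fn">U2</span></span></div></div> when the song first came out."""


class HAudioParser(HTMLParser):
    """Collect the title and contributor names from each "haudio" element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, classes) for every element that is still open
        self.items = []      # one dict per haudio element found
        self.current = None  # the haudio item being filled in, if any

    def _inside(self, class_name):
        return any(class_name in classes for _, classes in self.stack)

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append((tag, classes))
        if "haudio" in classes:
            self.current = {"title": "", "contributors": []}
            self.items.append(self.current)
        elif self.current is not None and "fn" in classes and self._inside("contributor"):
            # A new contributor name starts here.
            self.current["contributors"].append("")

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if not self._inside("haudio"):
            self.current = None

    def handle_data(self, data):
        if self.current is None:
            return
        if self._inside("audio-title"):
            self.current["title"] += data
        elif (self._inside("contributor") and self._inside("fn")
              and self.current["contributors"]):
            self.current["contributors"][-1] += data


def parse_haudio(html):
    """Return a list of {"title": ..., "contributors": [...]} dicts."""
    parser = HAudioParser()
    parser.feed(html)
    parser.close()
    # Collapse any whitespace introduced by line breaks in the source.
    return [
        {"title": " ".join(item["title"].split()),
         "contributors": [" ".join(name.split()) for name in item["contributors"]]}
        for item in parser.items
    ]


print(parse_haudio(SNIPPET))
```

The parser never has to guess which words are the song title: it only reads the text inside the elements that carry the relevant class names.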

So why is this so important? You can already pick out the song yourself, so why does your computer need to know what is on a web page?

Building a Smarter Web

One of the goals that we have for the Internet is to embody all human knowledge in an easily accessible, searchable, meaningful network of information[3]. For this to happen, we must start helping computers identify meaningful concepts in the sea of information that is the Internet. This is a noble goal, but a very general one, so let us look at specifics. Let us narrow our focus to culture, namely music and film, and examine how something as simple as audio, video, and image Microformats can transform the World Wide Web.


The Firefox, Songbird, and Miro Media Player development teams have all expressed an interest in audio and video Microformats. They have shown this interest because they understand how much Microformats can help the browsing experience.

To illustrate how the browsing experience can be improved, let us focus on the Songbird media player. The player has a great feature that can identify all of the MP3 files on a web page and play them while you are browsing the page. This is useful when you are on a page, such as Scissorkick, and want to listen to the music that is being reviewed without bringing up a separate audio player.

Songbird will even download the first part of every MP3 file and display the embedded artist and track title in the player. While this is fine for a small number of MP3s on a web page, downloading the first several seconds of a large number of MP3s can have a severe impact on the website. It also takes quite a while to download track information for more than 10 MP3s and is, in general, a waste of bandwidth since the information is already on the web page.

With the hAudio Microformat, all of this becomes a much more pleasant experience. Downloading the first few seconds of each MP3 file is unnecessary since the artist and track title information is already marked up on the website in hAudio. hAudio markup can even identify sample, download and purchase links, so no guesswork is needed by the browser.
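As a rough sketch of what "no guesswork" could look like, the following Python snippet sorts the links on a page by their rel attribute. The rel values used here (sample, enclosure, payment) are assumptions made for illustration and should be checked against the hAudio draft you are targeting. The page markup, the example.com URLs, and the names MediaLinkParser and collect_media_links are likewise hypothetical.

```python
from html.parser import HTMLParser

# Assumed rel values for sample, download and purchase links (not confirmed here).
MEDIA_RELS = ("sample", "enclosure", "payment")

# Hypothetical page fragment; the URLs are placeholders.
PAGE = """
<div class="haudio">
  <span class="audio-title">With or Without You</span>
  <a rel="sample" href="http://example.com/samples/track.mp3">Listen</a>
  <a rel="enclosure" href="http://example.com/downloads/track.mp3">Download</a>
  <a rel="payment" href="http://example.com/buy/track">Buy</a>
  <a href="http://example.com/about">About this site</a>
</div>
"""


class MediaLinkParser(HTMLParser):
    """Group link targets by the rel values listed in MEDIA_RELS."""

    def __init__(self):
        super().__init__()
        self.links = {rel: [] for rel in MEDIA_RELS}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # A rel attribute may hold several space-separated values.
        for rel in (attributes.get("rel") or "").split():
            if rel in self.links and href:
                self.links[rel].append(href)


def collect_media_links(html):
    """Return a dict mapping each media rel value to a list of URLs."""
    parser = MediaLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


print(collect_media_links(PAGE))
```

A player built this way never needs to fetch an MP3 to learn what a link is for; the plain "About this site" link is simply ignored.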

In addition to enhancing the browsing, sampling and purchasing process, the markup lets the browser detect how many songs are on a page and whether they are related to one another. For example, if all of the songs on a page have the same artist, it is a safe bet that the web page that is being viewed is about a particular artist. The same can be said for genre.
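The "same artist" check is easy to sketch once the songs on a page have been extracted. The example below assumes each song has already been reduced to a small record with a title and a list of contributors. That record shape, the function name common_artists, and the "Some Other Band" entry are all hypothetical, chosen only for this illustration.

```python
def common_artists(songs):
    """Return the sorted list of artists credited on every song in the list."""
    artist_sets = [set(song["contributors"]) for song in songs]
    if not artist_sets:
        return []
    return sorted(set.intersection(*artist_sets))


# Hypothetical records, as they might look after parsing a page.
artist_page = [
    {"title": "With or Without You", "contributors": ["U2"]},
    {"title": "One", "contributors": ["U2"]},
]
mixed_page = [
    {"title": "With or Without You", "contributors": ["U2"]},
    {"title": "Example Song", "contributors": ["Some Other Band"]},
]

print(common_artists(artist_page))  # every song shares an artist
print(common_artists(mixed_page))   # no artist is shared by all songs
```

A non-empty result suggests the page is about that artist. The same intersection could be run over genres if they were marked up.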

By marking up the audio and video metadata that already exists on a web page, we can make our browsing experience easier. Here are some other ways that marking up audio information on a web page can be helpful:

  • Creating lists of artists that you are interested in (save to Artist/Album favorites)
  • Automatically queuing video files for playback by sending all video Microformatted content to your favorite multimedia player, without needing to click on any of the links on the web page.
  • Storing a browse history of audio and video in addition to web pages and sites.
  • Integrating purchasing information into digital e-commerce sites.
  • Helping to auto-blog about music and video that you find on the Internet. For example, select music Microformatted content and click "Blog about With or Without You by U2...".


There are a number of sites on the Internet that attempt to index the music and film industry. The Internet Movie Database (IMDB), MusicBrainz, Bitmunk, and many others list meta-information on their sites: titles, artists, contributors, publishers, labels, release dates, genres, and other useful information that is currently not semantically marked up. Add to these the many web sites that allow user-generated content, such as YouTube, MetaCafe, and Vimeo.

Now, ask a search engine to show you all of the music and films that Keanu Reeves has taken part in. Chances are that you would not get good results out of even a top-notch search engine. Even his IMDB page makes no mention of his time with Dogstar or Becky (bands that he has been in).

If one were to create a specialized search engine that cataloged all audio and video Microformatted data, the previous search would return some very rich information. You would plug the artist name in as Keanu Reeves and get a plethora of hits back related to film and music.
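A toy version of such a catalog can be sketched in a few lines of Python. The idea is an index from contributor name to every audio or video item that credits that name. The record layout, the function names, and the placeholder titles and URLs below are all hypothetical; a real search engine would fill the records by crawling Microformatted pages.

```python
from collections import defaultdict


def build_contributor_index(records):
    """Map each contributor name (case-insensitively) to the items crediting it."""
    index = defaultdict(list)
    for record in records:
        for name in record["contributors"]:
            index[name.casefold()].append(record)
    return index


def search(index, name):
    """Return every cataloged item that credits the given contributor."""
    return index.get(name.casefold(), [])


# Hypothetical records gathered from two different sites (placeholder data).
records = [
    {"type": "video", "title": "Example Film",
     "contributors": ["Keanu Reeves"], "source": "http://example.com/films/1"},
    {"type": "audio", "title": "Example Song",
     "contributors": ["Dogstar", "Keanu Reeves"], "source": "http://example.org/songs/7"},
    {"type": "audio", "title": "With or Without You",
     "contributors": ["U2"], "source": "http://example.org/songs/8"},
]

index = build_contributor_index(records)
for item in search(index, "keanu reeves"):
    print(item["type"], item["title"], item["source"])
```

Because film and music records share one contributor field, a single query returns both kinds of results, which is the behavior described above.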

Simply put, both the audio and video Microformats will enable much smarter searches. All of the data is already out there, and it would take very little effort to implement Microformats on just a few of the key data sites (IMDB, Bitmunk, etc.) to enable much richer searches.

Parting Thoughts

There are many knock-on effects that the audio and video Microformats initiatives will have on the Internet. We have to crawl before we walk. We will have to implement audio and video Microformats as widely as possible before we can fully understand all of the innovative new uses they enable. The purpose of this article is to help more people understand how much of an impact digital media Microformats will have on the web.

At the moment, we can see that they will aid music and video search engines and make our media browsing experience on the web far better. If you can see something else that audio and video Microformats enable, please mention it below in the Feedback section.


[1] Berners-Lee, Tim. The WorldWideWeb Browser,
[2] Berners-Lee, Tim; Fischetti, Mark (1999). Weaving the Web. HarperSanFrancisco, chapter 12. ISBN 9780062515872.
[3] World Wide Web Consortium. Semantic Web Activity Statement,


Feedback

Please leave feedback, thoughts and comments in this section. Please sign your comments - the comments will be used to revise this document.

  • DIV is a block-level element... Andy Mabbett 00:50, 1 Aug 2007 (PDT)

This is a wiki, so why not just let everyone edit this page? You can add yourself as editor and others as contributors. If you cannot let this be the same as every other wiki page, then maybe this article is better served as a personal blog post where you can moderate the comments? Brian August 2nd 2007