[microformats-discuss] microformats vs. plain XML formats
Tantek Çelik
tantek at cs.stanford.edu
Wed Jul 13 07:48:04 PDT 2005
On 7/13/05 6:48 AM, "Joshua Porter" <porter at bokardo.com> wrote:
> On Jul 13, 2005, at 8:08 AM, Tantek Çelik wrote:
>> On 7/12/05 2:12 PM, "Joshua Porter" <porter at bokardo.com> wrote:
>>> Is it fair to say that microformats are no further along in
>>> auto-discovery than are standalone XML formats?
>> I'm not sure that "auto-discovery" as you use it means the same thing
>> that others mean by it. It is a bit of an overloaded term, and as such, a
>> specific use-case would help with understanding what you mean.
> I'm talking about auto-discovery in the sense that Safari and Firefox
> autodiscover RSS feeds and provide additional functionality to them.
> (Danny and Ryan answered this question by saying that browsers don't
> *currently* autodiscover them in this way.)
Ah. Then the question doesn't make sense as defined, since microformats are
used *in* the content, not somewhere else where a rel="alternate" link
>>> I ask because I'm confused about the "support" of microformats. I
>>> understand that microformats are "supported" by browsers in the sense
>>> that browsers read them as XHTML.
>> Yes, microformats are built from XHTML building blocks, and thus have all
>> the support that those have, which is a level of support far above
>> In addition, microformats can be added to web pages and still have those
>> web pages *validate*, something which is very important to modern web
> Not important to users, however.
Only indirectly important to users, as the likelihood of the page working in
more browsers, more devices etc. goes up.
For the sake of discussion, we take XHTML validity as a requirement. This
has already been well discussed and understood by modern web designers and
developers. If you want to explore that, there are other forums where
people would be more than happy to spend time explaining all the whys, etc.
>> "plain XML", on the other hand, cannot be added to an XHTML web page and
>> have it still validate, short of doing nasty things like CDATA/escaping
>> the markup, or putting it into comments. Both of those approaches have
>> already been covered at length, along with all the problems they have.
> My RSS feed is full of CDATA escaping, and it works well. Assuming
> this is bad, though, could you provide a pointer to the coverage?
Check xml.com articles. For more discussion on this, again, there are other
lists that have done a better job on it. A little research on "escaped XML"
or "escaped markup" should be sufficient.
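To make the escaping problem concrete: once markup is wrapped in CDATA, an
XML parser hands it back as one opaque string, so a tool looking for
semantic class names never sees it. A minimal sketch using only the Python
standard library; the class name "fn" and the sample text are purely
illustrative:

```python
import xml.etree.ElementTree as ET

# The same (illustrative) markup, once escaped in CDATA and once embedded.
ESCAPED = '<p><![CDATA[<span class="fn">Joshua Porter</span>]]></p>'
EMBEDDED = '<p><span class="fn">Joshua Porter</span></p>'


def texts_with_class(xml_text, class_name):
    """Return the text of every element whose class attribute lists class_name."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter()
            if class_name in el.get("class", "").split()]


print(texts_with_class(EMBEDDED, "fn"))  # ['Joshua Porter']
print(texts_with_class(ESCAPED, "fn"))   # []
print(ET.fromstring(ESCAPED).text)       # <span class="fn">Joshua Porter</span>
```

In the escaped version the <span> is just characters inside the <p>'s text,
so anything that consumes the page structurally has to un-escape and
re-parse it first.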
>>> But that's not really doing
>>> anything with the semantics we've added to our markup. (we might as
>>> well be writing in any arbitrary format).
>> Not at all. XHTML has numerous predefined semantics. "plain XML" has none
>> (except perhaps the xml:lang attribute).
> By this I mean that if UAs aren't doing anything with the semantic
> markup, it doesn't matter that it is semantic.
Actually, it does. Regardless of whether there is any default behavior by
the UA, the semantics are there for any number of other "UA-like" extensions
to take advantage of.
The benefits of semantic XHTML are *also* well known in the web design and
web authoring community. Another point I see no reason to argue.
>>> Taken further, it seems to me that for anything to be done with
>>> microformats, our UAs will need to be updated in some fashion.
>> No. *Could be* updated, yes.
>> *Need to be* updated? No. For example, tons have been done with
>> microformats with built-in CSS support, utilizing microformats with
>> favelets/bookmarklets, Greasemonkey plugins etc. None of which required
>> updating the browser.
> I would consider bookmarklets and Greasemonkey scripts as updating
> the UA (or at least the functionality of the UA).
That's a bit of a cop out.
No one I know would consider adding a few bookmarks to be "updating" their
My point was, we don't need to wait for any kind of "update of the UA" as
typical people think of UA updates (i.e. downloading a new browser).
> This is a good
> thing...that it is easy to add functionality relatively quickly.
>> Similarly, you can subscribe to hCalendar in existing calendaring
>> applications like Apple iCal and Mozilla Sunbird, without *any* updates
>> to those applications.
>> This is one of the advantages of basing a microformat standard on other
>> well-adopted standards. Instant interoperability.
>>> course, Technorati is supporting some of them already....but we don't
>>> want something that will be supported by only one vendor
>> Hence Technorati has from day 1 developed microformats as open standards
>> on an open wiki, open for anyone to view (and edit/contribute to).
>> Note that this is in *stark* contrast to *numerous* other "standards"
>> efforts which are typically developed either behind the closed doors of a
>> paid-membership-only committee, or on a closed mailing list, or a closed
>> wiki, or perhaps just in the closed mind of a singular blow-
> I find a lot of resistance to Mr. Winer.
It is amazing how certain individuals can generate so much self-resistance.
I'm sure there are lessons to be learned there.
But no matter, we try not to be distracted by the blow-hards.
> As I'm new to this stuff,
> I'm not taking sides. I want the best technology,
Precisely why we defined microformats in terms of a set of technology
> not an anti-person format.
No one I know here wants that either. IMHO, it is a waste of time to pursue
>>> presumably Technorati would support another XML format, too.
>> Not necessarily. A big concern on the part of Technorati (and any search
>> implementer) is data quality, signal to noise. As such, if XML formats
>> (like most formats) encourage invisible metadata, they will be of lower
>> utility to Technorati than formats that simply mark up already
> (presumably Technorati would consider supporting another widely-
> adopted XML format)
Technorati already supports parsing of RSS and Atom to quite a level of
detail for example. So the short answer is yes. But I have to admit that
is a bit off topic for this list as well. You're welcome to continue that
discussion by sending email to feedback at technorati.
>>> Thus, microformats, to me, sound like just as much work as would be a
>>> separate format (like RSS).
>> What application / site are you looking at adding microformats to?
> I've got several. My personal blog (bokardo.com). Various commercial
Good. A personal blog is a good place to start with microformat publishing
>>> Indeed, many of the microformats are
>>> based on living formats written in plaintext or xml. So some are
>>> already developed...like iCal,
>> Non-trivial to author.
> It is after one person writes a Wordpress plugin, which takes about a
> day. Remember, only a tiny fraction of RSS feeds are written *by
There are now enough CMSs and blogging systems that no one custom plugin
will make that big a dent.
Thus the cost of publishing must be lowered.
And I would bet that it would take more than a day for someone to write a
valid iCal Wordpress plugin that outputted valid iCalendar that various
clients could consume, that actually provided a UI for useful functionality
Just look at how long other iCal efforts have taken.
>> Tried and failed already. Effectively zero uptake on the Web.
> I think you'll find that after the success of RSS there will be
> a lot more interest in these types of things. Most of the standards work
> was ahead of the curve, in my opinion, and didn't have the attention
> of the average developer (or Wordpress/MT junkie) to write plugins
> for it.
Perhaps. OTOH, I predict the various "do it yourself" XML vocabularies will
deteriorate rapidly to a tower of babel problem, with everyone making up
And remember, RSS took almost 10 years to become as dominant as it has.
> For example, I co-wrote an article on Digital Web this spring
> (http://www.digital-web.com/articles/web_2_for_designers/) ...effectively
> years behind the initial thinking on these subjects, and it gets a lot of
> attention, not because it's a *new* idea, but because it is *new to them*.
> Writing Semantic Markup: Transition to XML
With all due respect, that section fails to make that case.
Generic XML effectively died on the Web under the weight of all the specs
required to make it "work" half as well as HTML, under the friction
of incompatibility with the way people already worked, and under the
syntactic vinegar of awkward (and mostly unnecessary) constructs like
namespaces. Dozens of XML formats have been tried for *years* on the *web*
and have failed. Only two have gained any adoption on the *web*: RSS and XHTML.
Also, your article talks about the limits of semantics in XHTML.
Precisely why we are doing microformats!
To extend those semantics in an interoperable, evolutionary way.
Rather than throwing out everything that works and asking people to learn a
new set of tools.
> RSS is one of the key building blocks
Hah! RSS is nothing more than a degenerate envelope format (AKA feed
It's certainly not a building block for other formats. It's a top level
container format, and that's about it.
Seriously, are there any other building blocks along that vision?
It took 10 years for the so-called building block of RSS to be practical.
Contrast that with microformats where within months we've got numerous
*actual* building blocks you can use *today*, you can mix and match *today*,
you can *publish* *today*.
Finally, with all due respect, there are *a lot* of fluff/hype pieces coming
out about the so-called "Web 2.0" (NOTE: Your article was *MUCH* better
written than the vast majority of such articles, I'm not lumping your
article in with the rest, let me be clear about that). It's an excellent
way to get headlines and attention, but so far, the visions offered have
been such a mishmash that people get more confused than productive.
Your bits about mixing and matching RESTian web services are certainly right
on, and in many ways, were an evolutionary reaction to the prescribed
methods of WSDL, SOAP etc.
Microformats are a very similar evolutionary reaction to the prescribed
methods of generic XML or RDF.
Phil Windley provided an excellent summary of this:
> I would
> assume that you've been ahead of the curve for many years. *Most*
> people are not, and are still authoring in ways that would make you
At a recent presentation to web designers at SXSW, in a room filled with
100s of people, I asked who was still using <table> tags for layout in their
work. Only 2 people raised their hands.
I think you might be surprised where today's web design community is.
>>> and OPML.
>> <sigh />
>> A totally unnecessary reinvention of <ol><li><a href>.
>> Those who ignore standards are doomed to reinvent them.
> Do you have the same opinion of RSS?
I believe Danny Ayers already provided a good statement about that.
>>> Therefore, I wonder if there are some applications for which a
>>> separate format is better suited,
>> Perhaps non-web applications?
>>> and presumably some applications
>>> for which microformats are better suited.
>> Anything that involves publishing content on the Web.
> I'm not sure I understand the "everything through XHTML" point of
Are you a web designer?
> Why not embrace the paradigm similar to RSS (and possibly
> Google sitemaps) to have semantic formats at certain conventional
That fails the embeddability principle (for all practical purposes) we put
forth as a requirement.
> One possibility is that the index page of a site is going
> to turn into a *true* index page, with a bunch of <link> tags
> pointing to the formats available for discovery. Is this not desirable?
It's more work for less benefit.
New languages = cost.
New file formats = cost.
Lots of separate files = maintenance, more cost.
Lots of separate files = you lose context. You lose embeddability.
Lots of separate files != building blocks.
Lots of separate files == data silos.
> In the same way that we know where to find the root XHTML page, we
> (our UAs) know where to find the root RSS feed. The new Google
> sitemaps format, for example, is the same way.
And completely unsuitable for mixing and matching with anything else as a
It's a data silo.
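For comparison, the feed autodiscovery Josh describes boils down to
scanning a page for <link rel="alternate"> elements with a feed MIME type
and following their href. A minimal Python sketch, assuming only the two
common RSS/Atom MIME types matter (browsers apply more rules than this):

```python
from html.parser import HTMLParser

# MIME types treated as feeds in this sketch (an assumption, not exhaustive).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate"> elements that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser routes self-closing XHTML tags (<link ... />) here too.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])


def discover_feeds(page):
    finder = FeedLinkFinder()
    finder.feed(page)
    finder.close()
    return finder.feeds


PAGE = """<html><head>
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rss" />
<link rel="stylesheet" type="text/css" href="/style.css" />
</head><body><p>Hello</p></body></html>"""

print(discover_feeds(PAGE))  # ['/index.rss']
```

All this yields is the address of a separate file; the data itself still
lives somewhere other than the page being read.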
>>> However, I'm hearing this push about microformats...which is all well
>>> and good...but I don't know, as a developer, where I should *spend my
>> What do you develop, Josh? That will probably impact where you spend your
> I was talking, of course, about whether I should spend time updating
> my sites to microformats, or should I say, provide a .ics file
> instead of an hCalendar file (or should I provide both?).
The point is that if you are already publishing event data as part of your
website, it is *easier* and *cheaper* to simply mark it up with hCalendar,
than to bother with .ics files.
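As a rough illustration of why the .ics file can come "for free": a short
Python script can read hCalendar class names out of XHTML and write
iCalendar. This is a sketch under stated assumptions, not a conforming
converter: it handles only summary, dtstart, dtend and location, expects
well-formed XHTML, takes dates from an <abbr title> already in iCalendar's
basic format, and omits things a complete .ics file needs (UID, DTSTAMP,
text escaping, line folding). The PRODID value and sample event are made up.

```python
from html.parser import HTMLParser

# Small subset of hCalendar property class names handled by this sketch.
PROPS = ("summary", "dtstart", "dtend", "location")


class HCalendarParser(HTMLParser):
    """Collect events from class="vevent" elements in well-formed XHTML."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._event = None      # properties of the vevent being read
        self._event_depth = 0   # open elements inside (and including) that vevent
        self._prop = None       # property whose text content is being collected
        self._prop_depth = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if self._event is None:
            if "vevent" in classes:
                self._event, self._event_depth = {}, 1
            return
        self._event_depth += 1
        if self._prop is not None:   # nested markup inside a property
            self._prop_depth += 1
            return
        for name in PROPS:
            if name in classes:
                if tag == "abbr" and a.get("title"):
                    # abbr pattern: machine-readable value lives in title
                    self._event[name] = a["title"]
                else:
                    self._event[name] = ""
                    self._prop, self._prop_depth = name, 1
                break

    def handle_endtag(self, tag):
        if self._event is None:
            return
        if self._prop is not None:
            self._prop_depth -= 1
            if self._prop_depth == 0:
                # collapse whitespace in the collected human-readable text
                self._event[self._prop] = " ".join(self._event[self._prop].split())
                self._prop = None
        self._event_depth -= 1
        if self._event_depth == 0:
            self.events.append(self._event)
            self._event = None

    def handle_data(self, data):
        if self._prop is not None:
            self._event[self._prop] += data


def to_ics(events):
    """Serialize events as a bare-bones iCalendar stream (CRLF line endings)."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//hcal-sketch//EN"]
    for ev in events:
        lines.append("BEGIN:VEVENT")
        lines += [f"{name.upper()}:{ev[name]}" for name in PROPS if name in ev]
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


PAGE = """<div class="vevent">
  <span class="summary">Microformats meetup</span>
  <abbr class="dtstart" title="20050713T180000">July 13, 6pm</abbr>
  <span class="location">Example venue</span>
</div>"""

parser = HCalendarParser()
parser.feed(PAGE)
parser.close()
print(to_ics(parser.events))
```

The human-readable text ("July 13, 6pm") stays on the page for visitors,
while the title attribute carries the value a calendar client needs.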
*Plus* you keep all the context of your web pages, which is lost when you
dumb down to an .ics feed.
I think of this purely economically.
What's easiest/cheapest for you to provide with the maximum benefit?
The premise here is that if you simply mark up your current event data in
your web page with hCalendar, then you get the .ics file for free via
> This is a *big deal* for developers, to *change* the way that they do
> things (look at how many people are using table-based layouts).
Right. Hence microformats tries to ask web designers/authors to change the
> the near future there will be too many formats to choose from (it
> could be that this is already happening).
It has. It is.
This is one of the reasons why, when someone proposes a new microformat, the
community and process strongly discourage it until a *real-world*
*practical* need is shown.
>>> A possible issue to address would be: should I provide OPML or XOXO?
>> If you are publishing on the web, XOXO.
>> If not, use whatever outline format is the most convenient
>> (cheapest to implement) for you and your actual use cases.
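Since XOXO is essentially the nested <ol><li><a href> structure mentioned
earlier, publishing an outline as XOXO needs nothing beyond ordinary list
markup. A hypothetical Python helper, assuming outlines are held as
(text, href, children) tuples (a representation chosen only for this
sketch) and following the XOXO convention of marking the outermost list
with class="xoxo":

```python
from html import escape


def to_xoxo(items, top=True):
    """Render (text, href_or_None, children) tuples as nested XOXO-style lists."""
    parts = ['<ol class="xoxo">' if top else "<ol>"]
    for text, href, children in items:
        label = escape(text)
        if href:
            label = f'<a href="{escape(href)}">{label}</a>'
        # nested children become a nested <ol> inside the same <li>
        sub = to_xoxo(children, top=False) if children else ""
        parts.append(f"<li>{label}{sub}</li>")
    parts.append("</ol>")
    return "".join(parts)


outline = [
    ("microformats", "http://microformats.org/", [
        ("hCalendar", None, []),
        ("XOXO", None, []),
    ]),
]
print(to_xoxo(outline))
```

The output is plain XHTML, so it validates inside an existing page and
renders as an ordinary nested list with no extra tooling.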
Thanks Josh, I appreciate the points you have made, and I think that the
discussion has hopefully helped to illuminate some of the reasons why we are
doing what we are doing.