[uf-discuss] "Must Ignore vs. Microformats"

Wed Jul 19 10:55:17 PDT 2006

On 7/19/06 10:34 AM, "Charles Iliya Krempeaux" <supercanadian at gmail.com>
wrote:

> Hello,
> 
> On 7/19/06, Tantek Çelik <tantek at cs.stanford.edu> wrote:
>> On 7/19/06 8:37 AM, "Frances Berriman" <fberriman at gmail.com> wrote:
>> 
>>> http://cafe.elharo.com/xml/must-ignore-vs-microformats
>>> 
>>> A friend of mine showed me this today.  Macroformats, over Microformats.
>> 
>> The article is terrible and about 90% incorrect.  Unfortunately this is
>> perhaps due in part to the IBM article, which, though decent overall,
>> has some errors itself, and takes a walk through transcoding to XML and back
>> which is interesting but perhaps unnecessary.
>> 
>> The author of the "macroformats" article misses all the reasons that XML has
>> failed on the Web, and all the specific design principles that have gone
>> into microformats that were developed by learning from XML's failure.  In
>> fact, he continues to push several of these reasons as actual *plusses* for
>> XML (namespaces, invalidity, etc.)
>> 
>> There will continue to be plenty of folks banging their heads against the
>> wall and trying to push "plain old xml" (POX) on the Web, and they will
>> likely continue to see the same amount of success as they have to date.
>> 
>> What we can do to be helpful:
>> 
>> 
>> 1. Dissect articles like this into a series of assertions/questions and put
>> them on the wiki, e.g.:
>> 
>> * "why would anyone write markup like this? It brings exactly nothing to the
>> table."
> 
> (Sorry to bring up a point for XML, but.... I know others will
> probably bring this up outside of here... so I might as well do it
> here....)
> 
> One "good" thing about XML, IMO, is that for certain simple markups
> based on XML, it's easier for a beginner-level or intermediate-level
> developer to write a parser for it (as compared to writing a parser
> for Microformats... since HTML is more difficult to parse).
> 
> (For example, writing a parser in C, C++, PHP, Java, C# or whatever.)

I'll be perfectly frank.

This assumes that making it easy to write a parser is important.

That assumption is wrong.

Or, to put it more clearly, making such an assumption in a vacuum (which
many XML folks do) is wrong.

It is more important to make it easier to publish than it is to make it
easier to parse.

This is why the supposed "easier to parse" aspect of XML is incredibly
misleading.  It ignores both the need to be easier to publish, and the fact
that XML, in fact, is *harder* to publish.

> One example of such a simple format based on XML is RSS.
> 
> I'd say it is pretty easy for someone to write a parser for it since
> RSS is such a simple markup.  (Although, technically, their parser
> will probably be wrong and might choke and die if some fancy things
> are done with the XML... like using namespaces, adding DTDs, etc.)

You're kidding, right?

RSS is the canonical example of an XML format gone wrong from the "purist"
standpoint, although it is the 2nd most popular XML format on the Web (after
XHTML).

Go look at any production RSS parser and understand its complexity.

It is certainly *not* pretty easy for someone to write a parser for RSS that
actually works with real RSS on the Web.

> OPML is probably another example too of a simple XML markup.

Not really.  OPML postpones the real parsing to the question of what the
attributes mean.

> And yes, I know both formats have a lot of problems.  But their
> simplicity (in that respect) helps bring on developer adoption.  (Or
> at least, helps bring on adoption by a certain kind of developer.)

With all due respect (and as a developer myself), the developers don't
matter as much as the publishers.

> Now, having said that, in other realms, Microformats are much much
> easier to parse.  (Like for in-browser technologies.  Like CSS
> styling, JavaScript manipulation, and user scripts.... like
> greasemonkey.)

Yes, the "microformats are hard to parse" misconception has been debunked
quite a bit by the creation of simple open source parsers for which this
community is to be commended.  You know who you are out there.

 http://microformats.org/wiki/implementations
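
For a rough sense of scale, here is a minimal sketch of the kind of
class-based extraction such parsers start from.  It is not taken from any
of the implementations listed above; the names ClassTextExtractor and
extract_class_text are made up for illustration, and it assumes reasonably
well-formed markup (real-world tag soup would need repair first).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text content of every element carrying a given class name.

    Hypothetical helper for illustration only; assumes non-void tags are
    properly closed.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # nesting depth inside a matching element (0 = outside)
        self.buffer = []    # text fragments of the element being collected
        self.values = []    # finished, whitespace-normalized values

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # The class attribute is a space-separated list, e.g. class="url fn".
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace so line breaks in the markup do not matter.
            self.values.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_class_text(html, class_name):
    """Return the text of each element whose class list contains class_name."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.values


sample = ('<div class="vcard"><a class="url fn" href="http://example.com/">'
          'Jane <b>Q.</b> Doe</a>, <span class="org">Example Co</span></div>')
print(extract_class_text(sample, "fn"))
print(extract_class_text(sample, "org"))
```

A real hCard parser also has to deal with nested properties, abbr/title
values, include patterns and broken HTML, which is exactly why pointing
people at the existing implementations is the better answer.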

> (I even have a PHP parser written that makes parsing Microformats and
> other kinds of semantic HTML dead easy... coming to you via LGPL
> eventually... once I improve the HTML-repairing part of it.  Gotta
> compile tidy and see if that can improve the HTML-repairing.)

Release early, release often.  Even if it is not "done", I encourage you to
release it because you might get help from folks to finish it.

> So, maybe we should address that point too.  Maybe something like...
> 
> Q: But writing parsers for Microformats is hard in language X...
> A: You don't need to write a parser in language X, here's a list of
> some parsers....

Well said.

Thanks,

Tantek