[uf-discuss] uF dumped in tag soup?

Joe Andrieu joe at andrieu.net
Mon Jun 18 04:38:45 PDT 2007

David Thompson wrote:
> Sent: Monday, June 18, 2007 3:26 AM
> To: Microformats Discuss
> Subject: Re: [uf-discuss] uF dumped in tag soup?
> Ben Wiley Sittler wrote:
> > i agree, but you should be aware that microformats are only specified
> > to work in xhtml pages (so e.g. it needs parse as xml), not in html
> > pages
> Is this actually the case? The wiki seems to be a little ambiguous on
> the topic, sometimes referring to "open microformat standards suitable
> for embedding in (X)HTML." and sometimes referring to "a 1:1
> representation [...] in semantic XHTML".
> Granted, the behaviour can't be guaranteed in the case of a
> non-compliant document, but are Microformats actually specified as
> working in valid, but non-XML-compliant, HTML markup? Surely the only
> requirement should be that the markup can be parsed unambiguously into a
> DOM tree?
> (Hi everyone, by the way.)

Hi David.

I believe the problem is that more than a few of the parsers use XSLT operating on the file itself, rather than on a DOM. Relying
on a browser to parse the (X)HTML into a DOM is convenient, but it is also expensive architecturally, especially for
server-side processing that may not have a browser in process.  XSLT is relatively fast and lightweight if you have valid XML as
input, but it is notoriously unforgiving when you don't.

So, I believe that valid HTML that is not valid XHTML is non-compliant with uF. I expect that some of the tools work if the uF
sections are XML compliant despite errors elsewhere, but I can't be certain of that.
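
To make the difference concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree as a stand-in for any strict XML toolchain (it is not one of the actual uF parsers). The two snippets and the extract_fn helper are made up for illustration. The only difference between the snippets is <br/> versus <br>, which is fine in HTML but not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets: identical except for how <br> is written.
XHTML_SNIPPET = '<div class="vcard"><span class="fn">Joe Andrieu</span><br/></div>'
HTML_SNIPPET = '<div class="vcard"><span class="fn">Joe Andrieu</span><br></div>'


def extract_fn(markup):
    """Return the text of the first class="fn" element.

    Returns None if the markup is not well-formed XML, which is how a
    strict XML-based tool would effectively treat it.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # A strict XML parser gives up here; there is no "best guess" tree.
        return None
    for element in root.iter():
        # class is a space-separated list, so split before comparing.
        if "fn" in element.get("class", "").split():
            return element.text
    return None


print(extract_fn(XHTML_SNIPPET))
print(extract_fn(HTML_SNIPPET))
```

The first call finds the name; the second returns None because the unclosed <br> makes the whole document unparseable as XML, even though the vcard itself is marked up the same way.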

As Ciaran mentions, many parsers use Tidy first, but the spec certainly doesn't state that your uF should be valid "after it is
processed by Tidy." It should be valid first. Tidy is just a nice cleaner-upper to provide liberal interpretation at parse time.
And, in fact, I think it has a few known problems.

Ciaran McNulty wrote:
> Most proxy-type services seem to run everything through Tidy anyhow.
> I can't think of any uFs I'm aware of that wouldn't 'work' in HTML,
> although some of the nesting features wouldn't work if people did
> (for instance):
> <p class="vcard"><em class="fn">Ciaran</p>
> <p>McNulty</em></p>
> (which I think is valid HTML)

FWIW, that isn't valid HTML. Overlapping containers may display in most web browsers, but they are contrary to spec.  I would expect
that most DOMs effectively duplicate the <em> and </em> tags to create two distinct <em> elements, one with "Ciaran" and one with
"McNulty".
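
The overlap in Ciaran's example can be detected mechanically. What follows is a rough sketch, assuming Python's standard html.parser as a lenient tokenizer; the NestingChecker class and find_overlaps helper are hypothetical names, not part of any uF tool. It only checks that each end tag matches the most recently opened element, and it ignores void elements such as <br> and HTML's optional end tags, so it is not a validator.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Record end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.errors = []  # (end tag seen, tag that was innermost at the time)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        top = self.stack[-1] if self.stack else None
        if top == tag:
            self.stack.pop()
            return
        self.errors.append((tag, top))
        # Drop the nearest matching open tag so later checks stay meaningful.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                del self.stack[i]
                break


def find_overlaps(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


EXAMPLE = '<p class="vcard"><em class="fn">Ciaran</p>\n<p>McNulty</em></p>'
print(find_overlaps(EXAMPLE))
```

For the quoted example this reports two mismatches: the first </p> arrives while <em> is still open, and the </em> arrives while the second <p> is still open. A properly nested version, with the <em> closed inside a single <p>, reports none.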



Joe Andrieu
SwitchBook Software
joe at switchbook.com
+1 (805) 705-8651 
