[uf-discuss] uF dumped in tag soup?

Mon Jun 18 07:09:06 PDT 2007

David Thompson
> Sent: Monday, June 18, 2007 6:41 AM
> To: Microformats Discuss
> Subject: Re: [uf-discuss] uF dumped in tag soup?
> 
> 
> Joe Andrieu wrote:
> > I believe that the problem is that more than a few of the parsers use
> > XSLT operating on the file itself, rather than a DOM. Relying on a
> > browser to parse the (X)HTML into a DOM is convenient, but it is also
> > expensive architecturally, especially when doing server-side
> > processing that may not have a browser in process.  XSLT is relatively
> > fast and lightweight, if you have valid XML as input and it is
> > notoriously unforgiving.
> >
> > So, I believe that valid HTML that is not valid XHTML is non-compliant
> > with uF. I expect that some of the tools work if the uF sections are
> > XML compliant despite errors elsewhere, but I can't be certain of
> > that.
>
> Maybe I'm misunderstanding here, but isn't this approach contrary to the
> point of Microformats (namely, to make data easy to publish on the web)?
> Given that an overwhelming proportion of the pages around on the web are
> either served up as HTML (valid or otherwise) or invalid XHTML,
> restricting Microformats to those pages which are valid XHTML for the
> sake of easy parsing (as an XSLT-only parser surely would) seems to
> directly contravene the "humans first, machines second" principle.

That is a good point.  We should be supporting valid HTML, and I think we do, generally. A quick review of the website suggests we
mean to include HTML 4.01 when we say (X)HTML.
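
To make the gap concrete, here is a minimal Python sketch (not any actual uF parser) of how a fragment that is fine as HTML 4.01 can be rejected by a strict XML parser, which is roughly what an XSLT-only tool would run into, while a tolerant HTML parser still finds the properties. The fragment, the names HTML_401, parses_as_xml and ClassCollector, and the choice of the "fn" and "org" classes are all assumptions made up for illustration; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical hCard-like fragment. HTML 4.01 allows omitting </li> and
# forbids a closing tag on <br>, so this is not well-formed XML.
HTML_401 = """<ul class="vcard">
<li class="fn">Example Person
<li class="org">Example Org<br>
</ul>"""


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ClassCollector(HTMLParser):
    """Tolerant pass: record the first text seen after an element
    whose class attribute is exactly "fn" or "org"."""

    def __init__(self):
        super().__init__()
        self.current = None  # property currently being collected, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        # Any new start tag ends the property we were collecting.
        self.current = cls if cls in ("fn", "org") else None

    def handle_data(self, data):
        text = data.strip()
        if self.current and text:
            self.found.setdefault(self.current, text)


collector = ClassCollector()
collector.feed(HTML_401)
print(parses_as_xml(HTML_401))  # strict XML view of the fragment
print(collector.found)          # tolerant HTML view of the fragment
```

The collector is deliberately naive (no nesting, no multi-valued class attributes), so it only shows the difference in parser strictness and says nothing about how a real uF parser should walk the tree.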

However, supporting invalid HTML just because it happens to work in browsers is a harder challenge. Since we can't control
how each browser will resolve malformed HTML into a DOM, we can't actually write code for it unless we special-case every possible
browser, which is not really what uF is about. That's more of an implementation detail than a standards issue.

If people want uF to work, I think it is reasonable to expect them to also write valid HTML or XHTML. Restricting to valid HTML and
XHTML is, IMO, the only reasonable focus for our efforts; otherwise we have very few bounds by which to decide what "bad" HTML we
will support.

However, it seems the possible distinction between support for HTML and XHTML is poorly documented. XOXO, for example, claims [1]:
--
XOXO may be published in two forms, valid XHTML, and simple well-formed XML.
--

And yet, on that same page, it suggests (X)HTML, which is probably just sloppy editing, as most of the document seems to be clear
that XML rather than HTML is the foundation.

[1] http://microformats.org/wiki/xoxo#Publishing_XOXO

Is XOXO really invalid if it is HTML and not XHTML? Are there any uF for which that is true?

-j

--
Joe Andrieu
SwitchBook Software
http://www.switchbook.com
joe at switchbook.com
+1 (805) 705-8651