[uf-discuss] stats on well formed XHTML

Thu Jan 17 07:08:06 PST 2008

On 17 Jan 2008, at 01:44, Kevin Burton wrote:

>>> Specifically, the probability that a naive non-XML parser can make
>>> while indexing the content.
>>
>> I'm not sure what you mean here, but I'd recommend against using an
>> XML parser against web content, and instead using something like the
>> HTML5 parsing algorithm [#html5-parsing].
>
> Yes... I'm just trying to avoid using a full HTML parser (DOM or not)
> to avoid garbage generation and processor overhead.
>
> However, I think I'm losing that battle.

Once you start dealing with the joy of DOCTYPEs and the like, it
becomes rather questionable whether XML parsers really are much
simpler than HTML ones.
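
As a rough illustration of why a strict XML parser is a poor fit for
arbitrary web content, here is a minimal Python sketch using only the
standard library. The sample markup and the helper names (SAMPLE,
xml_parses, TagCollector, html_start_tags) are made up for this example,
and html.parser is merely a lenient tokenizer, not an implementation of
the HTML5 parsing algorithm, so treat it as a stand-in for "something
more forgiving than XML".

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical tag soup of the kind found on real pages: an unquoted
# attribute, a void element, an HTML-only entity and an unclosed tag.
SAMPLE = '<p class=vcard>Hello<br>world &nbsp; <b>unclosed'


def xml_parses(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record start tags as the lenient tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    """Return the start tags seen by html.parser, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("XML parser accepts sample:", xml_parses(SAMPLE))
    print("HTML tokenizer saw tags:", html_start_tags(SAMPLE))
```

The XML parser rejects the sample outright, while the HTML tokenizer
still reports the tags it can find. An indexer built on the former would
need its own recovery logic for such pages, which is where the supposed
simplicity tends to evaporate.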

--
Geoffrey Sneddon