[uf-discuss] stats on well formed XHTML
ryan
ryan at theryanking.com
Wed Jan 16 15:04:38 PST 2008
On Jan 16, 2008, at 12:41 AM, Kevin Burton wrote:
> Has anyone done any large scale audits of XHTML in the wild to
> determine the percentage that parse correctly?
Yes, Ian Hickson at Google did a survey of about 1B pages and found
that over 90% had *well-formedness* errors. I can't find a reference
offhand, but it may be buried somewhere in [#webstats].
> I'm thinking about deploying one in Spinn3r but I'd rather focus on
> other tasks if this has already been done.
I'd suggest working on other tasks. :)
> I'm curious about the assumptions one could make when assuming that
> XHTML is well formed.
You know what they say about assumptions.
> Specifically, the probability that a naive non-XML parser can make
> while indexing the content.
I'm not sure what you mean here, but I'd recommend against using an
XML parser on web content. Instead, use something like the
HTML5 parsing algorithm [#html5-parsing].
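
A minimal sketch of the difference, assuming Python's standard library.
A strict XML parser rejects typical tag soup outright, while a lenient
HTML parser still recovers the text. Note that `html.parser` is only a
tolerant stand-in here, not an implementation of the HTML5 parsing
algorithm, and the sample markup and helper names are made up for
illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: tag soup with an unclosed <p>, <b>, and a bare <br>.
SAMPLE = "<html><body><p>Hello <b>world<br>again</body></html>"


def xml_parses(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Collects text nodes; mismatched or unclosed tags are tolerated."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(markup):
    """Extract text content without requiring well-formed markup."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print("strict XML parse ok:", xml_parses(SAMPLE))
    print("lenient text:", repr(lenient_text(SAMPLE)))
```

An indexer built on the strict path would drop any page like SAMPLE
entirely, which is the practical argument for the lenient path.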
-ryan
[webstats]: http://code.google.com/webstats/
[html5-parsing]: http://whatwg.org/specs/web-apps/current-work/#parsing