Dan Libby <danda at videntity.org> wrote: 
> Finally, having gone down this path, I think that these formats can be a bit 
> difficult to parse given the varied nature of (often very broken) web pages 
> in the wild. I found that using HTML Tidy was necessary before I could even 
> begin to parse most documents.  This has likely already been discussed, but 
> I think that some standard parsing classes in various languages could help 
> pave the way for other implementors.  If such exists for PHP, I would 
> appreciate a pointer.  If not, perhaps I can refactor my code so that it is 
> generic enough to be useful for others.

I don't think that we should place too much emphasis on fixing other people's bad code.

Browsers have had to deal with bad code for a long time now, and only recently, with the move to XHTML and XML, has a shift towards good code begun.

If XHTML is ever to make its way to XML, then people are going to have to start writing good code so that it translates across as they intended.

If their code still can't be made sense of after it has been passed through HTML Tidy, the authors should be told so and asked to come back once it can be.

If we do anything less than ask for understandable code, we would be actively supporting the spread of bad code.

[Rest of coding standard rant put on hold]

Paul Wilkins

