parsing-microformats

2005-10-27T12:08:37Z

MaxVoelkel: Added CyberNeko link http://people.apache.org/~andyc/neko/doc/html/

= Microformat Parsing =

Microformat parsing mechanisms that depend on documents having even minimal XML properties, such as well-formedness, may fail when consuming non-well-formed content. [http://tidy.sourceforge.net/ Tidy] or, even better, [http://people.apache.org/~andyc/neko/doc/html/ CyberNeko] may be a useful workaround.
In particular, [http://suda.co.uk/projects/X2V/ Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes] use XSLT and "tidy" any non-well-formed input before processing it.
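The "tidy first, then parse" approach can be sketched with the Python standard library alone. This is only an illustration of the idea, not how Tidy, CyberNeko, or X2V work internally: the class <code>LenientTreeBuilder</code>, the function <code>parse_leniently</code>, and the <code>root</code> wrapper element are hypothetical names, and the repair rules are far simpler than those of a real tidying tool.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML (partial list, an assumption).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class LenientTreeBuilder(HTMLParser):
    """Builds an ElementTree from tag soup, closing unclosed elements."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):  # text after a closed child belongs to its tail
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def parse_leniently(markup):
    """Try strict XML parsing first; fall back to the lenient builder."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        builder = LenientTreeBuilder()
        builder.feed(markup)
        builder.close()
        return builder.root  # note: wrapped in the hypothetical <root> element


soup = '<div class="vcard"><span class="fn">Jane Doe<br><p>unclosed</div>'
tree = parse_leniently(soup)
print(ET.tostring(tree, encoding="unicode"))
```

The serialized result is well-formed and can be handed to XML tooling. Note that the fallback path returns the wrapper element rather than the document's own root, so callers should search with <code>iter()</code> or <code>.//</code> paths rather than rely on the root tag.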

Most microformats tend to be agnostic about details such as the exact element type used; properties are typically identified by class name rather than by element name.
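One consequence for parsers is that matching should key on class tokens, not element names. A minimal sketch, assuming well-formed input and using hypothetical helper names (<code>elements_with_class</code>, <code>text_of</code>) and made-up sample markup:

```python
import xml.etree.ElementTree as ET


def elements_with_class(root, class_name):
    """Return every element whose class attribute contains class_name as a token."""
    return [
        element
        for element in root.iter()
        if class_name in element.get("class", "").split()
    ]


def text_of(element):
    """Concatenate an element's text content and collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


# Two made-up hCard fragments that mark up the same name with different elements.
SAMPLE_A = '<div class="vcard"><span class="fn">Jane Doe</span></div>'
SAMPLE_B = ('<ul><li class="vcard contact">'
            '<a class="url fn" href="http://example.com/">Jane Doe</a>'
            '</li></ul>')

for sample in (SAMPLE_A, SAMPLE_B):
    for element in elements_with_class(ET.fromstring(sample), "fn"):
        print(element.tag, text_of(element))
```

Splitting the attribute on whitespace matters because <code>class</code> may carry several tokens, as in the second sample.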

Developers can use tools like XPath, which assume well-formedness, on well-formed content (either found as such on the web or produced by running Tidy). Mark Pilgrim's [http://sourceforge.net/projects/feedparser/ Universal Feed Parser] suggests that it may be possible to sanitize user HTML to the extent that it is suitable for later processing as XML.
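As a rough illustration of XPath-style querying on already well-formed content, the sketch below uses the limited XPath subset in Python's <code>xml.etree.ElementTree</code>. The sample document and the function name <code>formatted_names</code> are assumptions made for the example. The exact-match predicate <code>[@class='vcard']</code> does not handle multi-valued class attributes; a full XPath 1.0 engine would need a token-aware test for that.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample: two hCards using different element types.
WELL_FORMED = """<html><body>
  <div class="vcard"><span class="fn">Jane Doe</span></div>
  <p class="vcard"><b class="fn">John Roe</b></p>
</body></html>"""


def formatted_names(document):
    """Collect the text of each 'fn' element found inside a 'vcard' element."""
    root = ET.fromstring(document)  # raises ParseError if not well-formed
    names = []
    for card in root.findall(".//*[@class='vcard']"):
        name = card.find(".//*[@class='fn']")
        if name is not None:
            names.append("".join(name.itertext()).strip())
    return names


print(formatted_names(WELL_FORMED))
```

Because <code>ET.fromstring</code> raises <code>ParseError</code> on non-well-formed input, this only works on content that is already well-formed or has been tidied first.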

==== See Also ====

* [[xmdp-brainstorming]]
