parsing-microformats: Difference between revisions

From Microformats Wiki
(linked to tidy)
m (Added CyberNeko link http://people.apache.org/~andyc/neko/doc/html/)
Line 1: Line 1:
= Microformat Parsing =


Microformat parsing mechanisms that depend on documents having even minimal xml properties like well-formedness may fail when consuming non-well-formed content.  [http://tidy.sourceforge.net/ Tidy] may be a useful work around.
Microformat parsing mechanisms that depend on documents having even minimal xml properties like well-formedness may fail when consuming non-well-formed content.  [http://tidy.sourceforge.net/ Tidy] or even better [http://people.apache.org/~andyc/neko/doc/html/ CyberNeko] may be a useful work around.
In particular  [http://suda.co.uk/projects/X2V/ Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes] use XSLT, and "tidy" any non-well-formed input before processing it.



Revision as of 12:08, 27 October 2005

Microformat Parsing

Microformat parsing mechanisms that depend on documents having even minimal XML properties, such as well-formedness, may fail when consuming non-well-formed content. Tidy or, better still, CyberNeko may be a useful workaround. In particular, Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes use XSLT and "tidy" any non-well-formed input before processing it.
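The failure mode described above can be sketched with the Python standard library. This is an illustration only: it uses neither Tidy nor CyberNeko, and the sample markup and the names `SOUP`, `is_well_formed` and `ClassCollector` are hypothetical. A strict XML parser rejects the tag soup, while a tolerant HTML parser can still read the class names in it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical tag-soup input: <br> is never closed and <p> is left open.
SOUP = '<div class="vcard"><p class="fn">Jane Doe<br></div>'
# The same content written as well-formed XML.
CLEAN = '<div class="vcard"><p class="fn">Jane Doe<br/></p></div>'


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ClassCollector(HTMLParser):
    """Tolerant parser that records class names without requiring well-formedness."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.extend(value.split())


collector = ClassCollector()
collector.feed(SOUP)
collector.close()

print(is_well_formed(SOUP))   # strict XML parsing fails on the tag soup
print(is_well_formed(CLEAN))  # the cleaned-up version parses
print(collector.classes)      # class names seen by the tolerant parser
```

A tool such as Tidy sits between these two extremes: it turns input like `SOUP` into something like `CLEAN`, so that XML tooling can be used afterwards.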

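Most microformats tend to be agnostic about details such as the exact element type used: properties are identified by class names, so the same property may appear on, say, a span, a div or a link.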
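A minimal sketch of what element-type agnosticism means for a parser, assuming well-formed input and using the hCard class names `vcard`, `fn` and `url`. The helper names `find_by_class` and `property_text` and both sample fragments are hypothetical. The lookup matches on class tokens only and never inspects the element name.

```python
import xml.etree.ElementTree as ET


def find_by_class(root, class_name):
    """Yield every element whose class attribute contains class_name as a token."""
    for element in root.iter():
        if class_name in element.get("class", "").split():
            yield element


def property_text(markup, class_name):
    """Return the text content of each element carrying class_name."""
    root = ET.fromstring(markup)
    return ["".join(el.itertext()).strip() for el in find_by_class(root, class_name)]


# The same "fn" property marked up on two different element types.
AS_SPAN = '<div class="vcard"><span class="fn">Jane Doe</span></div>'
AS_LINK = '<li class="vcard"><a class="fn url" href="http://example.org/">Jane Doe</a></li>'

print(property_text(AS_SPAN, "fn"))
print(property_text(AS_LINK, "fn"))
```

Splitting the class attribute on whitespace matters here, because an element may carry several class names at once, as the link in `AS_LINK` does.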

Developers can use tools like XPath, which assume well-formedness, on well-formed content (either content that is already well-formed on the web or content that has been run through Tidy). Mark Pilgrim's Universal Feed Parser suggests that it may be possible to sanitize user HTML to the extent that it is suitable for later processing as XML.
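As a rough illustration of XPath-style selection over well-formed content, the sketch below uses the limited XPath subset in Python's `xml.etree.ElementTree`. The sample page is hypothetical, and the predicate `[@class='fn']` is an exact string comparison, so this simplified query would miss elements that carry more than one class name.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page, e.g. the output of a clean-up tool.
PAGE = """
<html>
  <body>
    <div class="vcard">
      <span class="fn">Jane Doe</span>
      <a class="url" href="http://example.org/">home page</a>
    </div>
    <p class="fn">Not inside an hCard</p>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Select "fn" and "url" elements, but only those nested inside a "vcard".
names = [el.text for el in root.findall(".//*[@class='vcard']//*[@class='fn']")]
urls = [el.get("href") for el in root.findall(".//*[@class='vcard']//*[@class='url']")]

print(names)
print(urls)
```

A full XPath 1.0 engine could match class tokens with an expression built on `contains()` and `normalize-space()`; that is outside the subset ElementTree supports, which is why the sketch sticks to single-class attributes.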

See Also