Revision as of 00:18, 12 May 2007

Parsing Microformats

Microformat parsing mechanisms that depend on documents having even minimal XML properties, such as well-formedness, may fail when consuming non-well-formed content. Tidy (http://tidy.sourceforge.net/) or, even better, CyberNeko (http://people.apache.org/~andyc/neko/doc/html/) may be a useful workaround. In particular, X2V (http://suda.co.uk/projects/X2V/) uses XSLT, and uses Tidy to clean any non-well-formed input before processing it.

Parsing class values

When parsing class values, care must be taken:

Class attributes may contain multiple class names, e.g. class="foo vcard bar".
Class attributes may contain class names that merely contain the class name used by a microformat as a substring, e.g. class="foovcardbar", class="foovcard", class="vcardbar". These must not be treated as matches; only a whole class name counts.
Multiple class names are separated by one or more whitespace characters.
Class names are case sensitive.

See http://www.w3.org/TR/html401/struct/global.html#h-7.5.2.
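
The rules above can be sketched in JavaScript roughly as follows. This is a minimal sketch, not a reference implementation: the helper names splitClassNames and hasClassName are hypothetical, and the set of whitespace characters (space, tab, line feed, form feed, carriage return) is an assumption.

```javascript
// Split a class attribute value into its individual class names.
// Assumption: names are separated by runs of space, tab, LF, FF, or CR.
function splitClassNames(classAttr) {
  if (!classAttr) {
    return [];
  }
  var parts = classAttr.split(/[ \t\n\f\r]+/);
  var names = [];
  for (var i = 0; i < parts.length; i++) {
    // Leading or trailing whitespace produces empty strings; skip them.
    if (parts[i] !== "") {
      names.push(parts[i]);
    }
  }
  return names;
}

// True only when `name` appears as a whole, case-sensitive class name.
function hasClassName(classAttr, name) {
  return splitClassNames(classAttr).indexOf(name) !== -1;
}
```

With these helpers, hasClassName("foo vcard bar", "vcard") is true, while hasClassName("foovcardbar", "vcard") and hasClassName("foo VCARD", "vcard") are both false.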

JavaScript example

The Ultimate getElementsByClassName JavaScript function may be useful. It is called as a plain function with a root node, a tag name, and a class name, rather than as a method of document. Then you can do:

var adrs = getElementsByClassName(document, "*", "adr");

or even:

var cities = getElementsByClassName(document, "*", "locality");
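
If that function is not available, a traversal along the same lines can be sketched directly. This is a minimal sketch and not the Ultimate getElementsByClassName implementation: the name collectByClassName is hypothetical, and it assumes each node exposes a string className and an array-like childNodes, as HTML DOM elements do.

```javascript
// Collect every node under `root` (inclusive) whose className contains
// `name` as a whole, case-sensitive, whitespace-separated token.
// Hypothetical helper; assumes nodes expose `className` and `childNodes`.
function collectByClassName(root, name) {
  var matches = [];
  var stack = [root];
  while (stack.length > 0) {
    var node = stack.pop();
    // Text nodes and the document node have no string className.
    var cls = typeof node.className === "string" ? node.className : "";
    if (cls.split(/[ \t\n\f\r]+/).indexOf(name) !== -1) {
      matches.push(node);
    }
    var kids = node.childNodes || [];
    // Push children in reverse so nodes are visited in document order.
    for (var i = kids.length - 1; i >= 0; i--) {
      stack.push(kids[i]);
    }
  }
  return matches;
}
```

In a browser, collectByClassName(document, "adr") would then play the same role as the call shown above, without the tag name filter.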

XSLT example

<xsl:if test="contains(
   concat(
       ' ',
       normalize-space(@class),
       ' '
   ),
   ' vcard '
 )" > ...

An XPath generator can help you generate those long, ugly XPath queries. [link broken as of 8 August 2006]
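
In the absence of that tool, such an expression can be generated with a few lines of JavaScript. This is only a sketch: classTokenXPath is a hypothetical helper, not the generator referred to above, and it assumes the class name contains no quotes or whitespace. It relies on the XPath 1.0 normalize-space function to collapse runs of whitespace in the attribute value.

```javascript
// Build an XPath 1.0 boolean expression that is true when @class contains
// `name` as a whole whitespace-separated token.
// Hypothetical helper; rejects names that would break the quoting.
function classTokenXPath(name) {
  if (!/^[^\s'"]+$/.test(name)) {
    throw new Error("unsupported class name: " + name);
  }
  return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
}
```

For example, classTokenXPath("vcard") returns the string contains(concat(' ', normalize-space(@class), ' '), ' vcard '), which can be placed in a test attribute or inside a predicate such as //*[...].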

Parsing rel/rev values

Parsing rel and rev values is similar to parsing class values, with the following differences:

rel and rev values should be separated by a single space.
rel and rev values are case insensitive.

See http://www.w3.org/TR/html401/types.html#type-links.
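
These rules can be sketched in JavaScript for comparison with class parsing. The name hasLinkType is hypothetical, and the sketch tolerates runs of whitespace between values, which is an assumption beyond the single space described above.

```javascript
// True when the rel (or rev) attribute value contains `linkType` as a
// whole token, compared case-insensitively.
// Hypothetical helper; accepting more than one whitespace character
// between values is an assumption.
function hasLinkType(relAttr, linkType) {
  if (!relAttr || !linkType) {
    return false;
  }
  var wanted = linkType.toLowerCase();
  var tokens = relAttr.toLowerCase().split(/[ \t\n\f\r]+/);
  return tokens.indexOf(wanted) !== -1;
}
```

Here hasLinkType("Tag nofollow", "tag") is true because the comparison ignores case, while hasLinkType("tagged", "tag") is false because only whole values count.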

