parsing-microformats

From Microformats Wiki

Parsing Microformats

Microformat parsers that depend on a document having even minimal XML properties, such as well-formedness, may fail when they consume non-well-formed content. Tidy or, better still, CyberNeko can serve as a workaround by cleaning the input first. X2V, for example, uses XSLT and runs Tidy to clean any non-well-formed input before processing it.

Parsing class values

When parsing class values, take care with the following:

  1. A class attribute may contain multiple class names, e.g. class="foo vcard bar".
  2. A class attribute may contain class names that merely include a microformat class name as a substring, e.g. class="foovcardbar", class="foovcard", class="vcardbar". These must not be treated as matches.
  3. Multiple class names are separated by one or more whitespace characters.
  4. Class names are case sensitive.

See http://www.w3.org/TR/html401/struct/global.html#h-7.5.2.
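
The four rules above can be sketched as a small JavaScript helper. This is only an illustration under stated assumptions: the name hasClassName is hypothetical, and the separator set is assumed to be space, tab, line feed, form feed, and carriage return.

```javascript
// Hypothetical helper, not part of any microformats library.
// Splits a class attribute value on white space and looks for an exact,
// case-sensitive token, so "foovcard" and "VCARD" do not match "vcard".
function hasClassName(classAttr, name) {
  if (!classAttr || !name) {
    return false; // missing attribute or empty name never matches
  }
  // Assumed separators: space, tab, line feed, form feed, carriage return.
  var tokens = classAttr.split(/[ \t\n\f\r]+/);
  return tokens.indexOf(name) !== -1;
}
```

With this helper, hasClassName("foo vcard bar", "vcard") is true, while the values "foovcard" and "VCARD" do not match "vcard".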

JavaScript example

The Ultimate getElementsByClassName JavaScript function may be useful. It is a standalone function that takes a root element, a tag name, and a class name, so once it is loaded you can do:

var adrs = getElementsByClassName(document, "*", "adr");

or even:

var cities = getElementsByClassName(document, "*", "locality");

XSLT example

<xsl:if test="contains(
   concat(' ', normalize-space(@class), ' '),
   ' vcard '
   )"> ... </xsl:if>

An XPath generator can help you produce these long, unwieldy XPath queries. [link broken as of 8 August 2006]
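
Expressions like the one in the XSLT example can also be generated with a few lines of JavaScript. The sketch below is not the generator mentioned above. The names classPredicate and classPath are hypothetical, and class names are assumed to contain no quotes or white space.

```javascript
// Hypothetical helper: builds the XPath test from the XSLT example
// for an arbitrary class name.
function classPredicate(className) {
  // Assumption: class names never contain quotes or white space.
  if (!/^[^\s'"]+$/.test(className)) {
    throw new Error("unsupported class name: " + className);
  }
  return "contains(concat(' ', normalize-space(@class), ' '), ' " +
    className + " ')";
}

// Hypothetical helper: builds a path that selects elements carrying the
// given class names, each nested somewhere inside the previous one.
function classPath(classNames) {
  return classNames.map(function (name) {
    return "//*[" + classPredicate(name) + "]";
  }).join("");
}
```

For instance, classPath(["vcard", "adr"]) yields a path that selects any element with class adr nested anywhere inside an element with class vcard.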

XQuery example

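Also using XPath:

<div style="background-color:yellow;">
{
  (: doc() must be given the URI of the document to query :)
  for $a in doc()//div[@class='vcard']
  let $b := $a/div[@class='fn org']
  let $c := $a/div[@class='adr']
  return ($b, $c)
}
</div>

These predicates compare the whole class attribute, so they match only div elements whose class value is exactly 'vcard', 'fn org', or 'adr'.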

For example, this could be used against http://technorati.com/about/contact.html. See Firefox extensions for getting XQuery in Firefox.

Note that simple XPath expressions can also be used.

Parsing rel/rev values

Parsing rel and rev values is similar to parsing class values except for the following differences:

  1. rel and rev values should be separated by one space.
  2. rel and rev values are case insensitive.

See http://www.w3.org/TR/html401/types.html#type-links.
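
The differences from class parsing can be sketched in JavaScript as follows. The name hasRelValue is hypothetical, and the helper is deliberately lenient: it accepts any run of white space between values rather than requiring exactly one space.

```javascript
// Hypothetical helper, not part of any microformats library.
// Compares rel/rev values case-insensitively, token by token.
function hasRelValue(relAttr, value) {
  if (!relAttr || !value) {
    return false; // missing attribute or empty value never matches
  }
  var wanted = value.toLowerCase();
  // Assumption: tolerate one or more separators between values.
  var tokens = relAttr.toLowerCase().split(/[ \t\n\f\r]+/);
  return tokens.indexOf(wanted) !== -1;
}
```

Here hasRelValue("Tag nofollow", "tag") is true because the comparison ignores case, while hasRelValue("tagged", "tag") is false because only whole values match.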

See Also