Difference between revisions of "parsing-microformats"

From Microformats Wiki
(some editorial and formatting cleanup)
 
=== XSLT example ===
 
 
<code>
 
&lt;xsl:if test="contains(
 
    concat (
 
        ' ',
 
        concat(normalize-space(@class),' ')
 
    ),
 
    ' <strong>vcard</strong> '
 
  )" &gt; ...
 
</code>
 
 
The [http://balloon.hobix.com/xpath-generator xpath generator] may help you generate those long, ugly XPath queries. [link broken as of 8 August 2006]
 
 
== Parsing rel/rev values ==
 
 
Parsing rel and rev values is similar to parsing class values except for the following differences:
 
 
# rel and rev values should be separated by one space.
 
# rel and rev values are case insensitive.
 
 
See http://www.w3.org/TR/html401/types.html#type-links.
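
The two differences above can be sketched in JavaScript. This is only an illustration: the helper name hasRelValue is hypothetical and not part of any microformats library, and it tolerates runs of whitespace between values (an assumption, more lenient than the single space described above).

```javascript
// Hypothetical helper: returns true if the rel (or rev) attribute value
// `relAttr` contains `value` as a whole link type.
function hasRelValue(relAttr, value) {
  if (!relAttr || !value) {
    return false;
  }
  // rel and rev values are case insensitive, so compare in lower case.
  var wanted = value.toLowerCase();
  // Assumption: split on any run of whitespace, not just a single space.
  var tokens = relAttr.toLowerCase().split(/\s+/);
  return tokens.indexOf(wanted) !== -1;
}
```

For example, hasRelValue("Tag nofollow", "tag") is true, while hasRelValue("hashtag", "tag") is false because whole values are compared rather than substrings.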
 
 
== See Also ==
 
 
* [[xmdp-brainstorming]]
 

Revision as of 13:55, 23 June 2007

Parsing Microformats

Microformat parsing mechanisms that depend on documents having even minimal XML properties, such as well-formedness, may fail when consuming non-well-formed content. Tidy or, even better, CyberNeko may be a useful workaround. In particular, X2V uses XSLT and runs Tidy to clean any non-well-formed input before processing it.

Parsing class values

Care must be taken when parsing class values:

  1. Class attributes may contain multiple class names, e.g. class="foo vcard bar".
  2. Class attributes may contain class names that merely contain the class name used by a microformat, e.g. class="foovcardbar", class="foovcard", class="vcardbar". These must not be treated as matches.
  3. Multiple class names are separated by one or more whitespace characters.
  4. Class names are case sensitive.

See http://www.w3.org/TR/html401/struct/global.html#h-7.5.2.
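
The rules above can be sketched as a small JavaScript check. This is a minimal illustration, not a definitive implementation: the helper name hasClassName is hypothetical, and the whitespace definition is assumed to be whatever the regular expression \s matches.

```javascript
// Hypothetical helper: returns true if the class attribute value
// `classAttr` contains `name` as a whole class name.
function hasClassName(classAttr, name) {
  if (!classAttr || !name) {
    return false;
  }
  // Class names are separated by one or more whitespace characters
  // (assumption: "whitespace" means anything matched by \s).
  var tokens = classAttr.split(/\s+/);
  for (var i = 0; i < tokens.length; i++) {
    // Whole-token, case-sensitive comparison, so "foovcard" and
    // "VCard" do not match "vcard".
    if (tokens[i] === name) {
      return true;
    }
  }
  return false;
}
```

For example, hasClassName("foo vcard bar", "vcard") is true, while hasClassName("foovcardbar", "vcard") and hasClassName("VCard", "vcard") are both false.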

JavaScript example

The Ultimate getElementsByClassName JavaScript function may be useful. It is a standalone function that takes a root element, a tag name, and a class name, so you can do:

var adrs = getElementsByClassName(document, "*", "adr");

or even:

var cities = getElementsByClassName(document, "*", "locality");

XSLT example