parsing-microformats: Difference between revisions

From Microformats Wiki

Revision as of 13:29, 20 August 2008

Parsing Microformats

Microformat parsers that depend on documents having even minimal XML properties, such as well-formedness, may fail when consuming non-well-formed content. Tidy or, better still, CyberNeko may be a useful workaround. In particular, X2V uses XSLT, and runs Tidy to clean any non-well-formed input before processing it.

Parsing class values

When parsing class values, care must be taken:

  1. Class attributes may contain multiple class names, e.g. class="foo vcard bar".
  2. Class attributes may contain class names that merely contain the class name used by a microformat, e.g. class="foovcardbar", class="foovcard", class="vcardbar". These must not be treated as matches.
  3. Multiple class names are separated by one or more whitespace characters.
  4. Class names are case-sensitive.

See http://www.w3.org/TR/html401/struct/global.html#h-7.5.2.
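
The four rules above can be sketched as a small test on the raw attribute value. This is only an illustration: hasClassName is a hypothetical helper, not part of any microformats library, and it assumes the attribute value is already available as a string.

```javascript
// Hypothetical helper for illustration only.
// Returns true when `name` is one of the whitespace-separated
// class names in the raw value of a class attribute.
function hasClassName(classAttr, name) {
  if (typeof classAttr !== "string" || name === "") {
    return false;
  }
  // Rule 3: names are separated by one or more whitespace characters.
  const names = classAttr.split(/[ \t\r\n\f]+/).filter(Boolean);
  // Rules 2 and 4: compare whole names, case-sensitively,
  // so "foovcardbar" and "VCARD" do not match "vcard".
  return names.includes(name);
}

console.log(hasClassName("foo vcard bar", "vcard")); // true
console.log(hasClassName("foovcardbar", "vcard"));   // false
```

Splitting into whole names first, rather than searching for a substring, is what keeps class="foovcard" from being mistaken for a vcard.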

JavaScript example

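The Ultimate getElementsByClassName JavaScript function may be useful. It is a standalone function rather than a method of document, so, assuming its original three-argument form (root element, tag name, class name), you can do:

var adrs = getElementsByClassName(document, "*", "adr");

or even:

var cities = getElementsByClassName(document, "*", "locality");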

XSLT example

<xsl:if test="contains(
   concat (' ', normalize-space(@class),' '),
   ' vcard '
   )" > ...

There is also an XPath generator (http://balloon.hobix.com/xpath-generator) to help you generate those long, ugly XPath queries. [link broken as of 8 August 2006]
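
The class test from the XSLT example can be generated for any class name. The following is a minimal sketch, not the generator mentioned above: classXPath and its parameters are hypothetical names chosen for illustration, and it assumes the class name is a single token with no quotes or whitespace.

```javascript
// Hypothetical helper for illustration; the name and signature are assumptions.
// Builds an XPath 1.0 expression that selects elements whose class
// attribute contains `className` as a whole whitespace-separated name.
function classXPath(className, tagName = "*") {
  // Reject anything that could break out of the quoted string literal.
  if (!/^[^\s'"]+$/.test(className)) {
    throw new Error("class name must be a single token without quotes");
  }
  // Padding both sides with spaces turns the substring test
  // into a whole-name test, as in the XSLT example.
  return "//" + tagName +
    "[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
}

console.log(classXPath("vcard"));
// //*[contains(concat(' ', normalize-space(@class), ' '), ' vcard ')]
```

In a browser that supports XPath, the resulting string could be passed to document.evaluate; other environments would need their own XPath engine.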

XQuery example

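XQuery builds on XPath, so the same kind of path expressions can select microformat elements:

<div style="background-color:yellow;">
{
for $a in doc()//div[@class='vcard']
let $b := $a/div[@class='fn org']
let $c := $a/div[@class='adr']
return ($b, $c, <br />)
}
</div>

Note that the predicates here compare the whole class attribute, so they match only when it is exactly 'vcard', 'fn org' or 'adr'. Handling multiple class names requires the contains(concat(...)) test shown in the XSLT example.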

For example, this could be run against http://technorati.com/about/contact.html. See Firefox extensions (XqUSEme) for a way to run XQuery in Firefox.

Note that simple XPath expressions can also be used.

Parsing rel/rev values

Parsing rel and rev values is similar to parsing class values, with the following differences:

  1. Multiple rel and rev values should be separated by a single space.
  2. rel and rev values are case-insensitive.

See http://www.w3.org/TR/html401/types.html#type-links.
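
The difference from class parsing can be sketched as follows. hasRelValue is a hypothetical helper used only for illustration. As an assumption beyond the text, it tolerates runs of whitespace between values rather than requiring exactly one space, on the view that a parser should be lenient about input it did not author.

```javascript
// Hypothetical helper for illustration only.
// Returns true when `value` is one of the space-separated values
// in the raw value of a rel or rev attribute, ignoring case.
function hasRelValue(relAttr, value) {
  if (typeof relAttr !== "string" || value === "") {
    return false;
  }
  const wanted = value.toLowerCase();
  return relAttr
    .split(/[ \t\r\n\f]+/) // assumption: accept any whitespace run as a separator
    .filter(Boolean)
    .some((v) => v.toLowerCase() === wanted); // case-insensitive, whole-value match
}

console.log(hasRelValue("Tag nofollow", "tag")); // true
console.log(hasRelValue("nofollow", "follow"));  // false
```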

See Also