parsing-microformats: Difference between revisions
BrettZamir (talk | contribs) m (→XQuery example) |
BrettZamir (talk | contribs) |
Revision as of 13:30, 20 August 2008
Parsing Microformats
Microformat parsing mechanisms that depend on documents having even minimal XML properties, such as well-formedness, may fail when consuming non-well-formed content. Tidy or, even better, CyberNeko may be a useful workaround. In particular, X2V uses XSLT, and runs Tidy to clean any non-well-formed input before processing it.
Parsing class values
When parsing class values, care must be taken:
- Class attributes may contain multiple class names, e.g.:
class="foo vcard bar"
- Class attributes may contain class names that merely contain the class name used by a microformat, e.g.:
class="foovcardbar"
class="foovcard"
class="vcardbar"
- Multiple class names are separated by one or more whitespace characters.
- Class names are case sensitive.
See http://www.w3.org/TR/html401/struct/global.html#h-7.5.2.
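The rules above can be sketched as a small token-matching function. This is only an illustration: hasClassName is a hypothetical helper, not part of any microformats library, and the set of characters it treats as whitespace (space, tab, line feed, form feed, carriage return) is an assumption.

```javascript
// Hypothetical helper, not taken from any microformats library.
// Splits a class attribute value into tokens and looks for an exact,
// case-sensitive match, so "foovcardbar" does not count as "vcard".
function hasClassName(classAttr, name) {
  if (typeof classAttr !== "string") {
    return false; // attribute missing
  }
  // Assumed separator set: space, tab, LF, FF, CR (one or more).
  var tokens = classAttr.split(/[ \t\n\f\r]+/).filter(Boolean);
  return tokens.indexOf(name) !== -1;
}

console.log(hasClassName("foo vcard bar", "vcard")); // true
console.log(hasClassName("foovcardbar", "vcard"));   // false
console.log(hasClassName("foo VCARD", "vcard"));     // false
```

A plain substring test such as classAttr.indexOf("vcard") would wrongly accept the second case, which is why the value is tokenized first.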
JavaScript example
The Ultimate getElementsByClassName JavaScript function may be useful. It is a standalone function that takes a root element, a tag name, and a class name, so you can do:
var adrs = getElementsByClassName(document, "*", "adr");
or even:
var cities = getElementsByClassName(document, "*", "locality");
XSLT example
<xsl:if test="contains(
concat (' ', normalize-space(@class),' '),
' vcard '
)" > ...
XPath generator: a tool to help you generate those long, ugly XPath queries. [link broken as of 8 August 2006]
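Since the generator link is broken, here is a minimal sketch of how such an expression could be built by hand. The function names classTest and classXPath are hypothetical; the output simply reproduces the concat/normalize-space idiom from the XSLT example for an arbitrary class name.

```javascript
// Hypothetical helper: returns the XPath 1.0 test used in the XSLT
// example above, for a given class name. Padding both the attribute
// value and the name with spaces makes contains() match whole tokens.
function classTest(name) {
  // Reject empty names, whitespace, and quotes so that the generated
  // expression stays a valid XPath string literal.
  if (!/^[^\s'"]+$/.test(name)) {
    throw new Error("class name must be a single token without quotes");
  }
  return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
}

// Hypothetical helper: selects any element carrying the class name.
function classXPath(name) {
  return "//*[" + classTest(name) + "]";
}

console.log(classXPath("vcard"));
// //*[contains(concat(' ', normalize-space(@class), ' '), ' vcard ')]
```

The resulting string could be passed to whatever XPath engine is available; which engine to use is left open here.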
XQuery example
Also using XPath...
<div style="background-color:yellow;">
{
for $a in doc()//div[@class='vcard']
let $b := $a/div[@class='fn org']
let $c := $a/div[@class='adr']
return ($b, $c, <br />)
}
</div>
For example, this could be used against http://technorati.com/about/contact.html. See Firefox extensions for getting XQuery in Firefox.
Simple XPath expressions can also be used.
Parsing rel/rev values
Parsing rel and rev values is similar to parsing class values except for the following differences:
- rel and rev values should be separated by one space.
- rel and rev values are case insensitive.
See http://www.w3.org/TR/html401/types.html#type-links.
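The same token-matching approach, adjusted for case insensitivity, might look like the sketch below. hasRelValue is a hypothetical helper. It is deliberately lenient and accepts runs of whitespace between values, even though values should be separated by a single space.

```javascript
// Hypothetical helper, not taken from any microformats library.
// Tokenizes a rel or rev attribute value and compares tokens
// case-insensitively, unlike class names.
function hasRelValue(relAttr, value) {
  if (typeof relAttr !== "string") {
    return false; // attribute missing
  }
  var wanted = value.toLowerCase();
  return relAttr
    .split(/[ \t\n\f\r]+/) // lenient: tolerate more than one separator
    .filter(Boolean)
    .some(function (token) {
      return token.toLowerCase() === wanted;
    });
}

console.log(hasRelValue("Tag nofollow", "tag")); // true
console.log(hasRelValue("tagged", "tag"));       // false
```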