[uf-discuss] Automated microformat parsing using XPath

Wed Aug 9 14:24:04 PDT 2006

There are several things to look out for. I'll answer a few, then
suggest we move this to the mf-dev list if there are more specific
questions.

1) The portion of the XPath contains(@class, 'description') only checks
whether that string is CONTAINED anywhere in @class, so it will also
match 'descriptions' (plural) or any other class name with
'description' inside it. You will need to expand it to something like:

contains(concat(' ', normalize-space(@class), ' '), ' description ')

This pads both sides of the class value with spaces and then searches
for the term, also padded with spaces.
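
As a rough sketch of the padded test above, the predicate can be built
by a small helper so that every property lookup uses whole-token
matching. The names classTokenPredicate and hasClassToken are made up
for illustration; hasClassToken is only a plain-JavaScript approximation
of what the XPath does, useful for checking the logic without a DOM.

```javascript
// Build an XPath predicate that matches a class name as a whole
// space-separated token rather than as a substring.
// (Hypothetical helper; className is assumed to contain no quotes.)
function classTokenPredicate(className) {
  return "contains(concat(' ', normalize-space(@class), ' '), ' " +
    className + " ')";
}

// Approximate plain-JavaScript equivalent of the same test:
// collapse whitespace, pad with spaces, then look for the padded token.
function hasClassToken(classAttr, className) {
  const normalized = classAttr.trim().split(/\s+/).join(" ");
  const padded = " " + normalized + " ";
  return padded.includes(" " + className + " ");
}

// The expression from the original question, rewritten with token matching.
const veventDescriptionXPath =
  "//*[" + classTokenPredicate("vevent") + "]" +
  "//*[" + classTokenPredicate("description") + "]";
```

The resulting string can be passed to document.evaluate() in place of
the simpler contains(@class, ...) form.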

2) Depending on both the microformat property (URL, UID, etc.) and the
element it is on, you will look in different places:
if name() = 'a' and @class contains the token 'url' then
  // look on the @href
end if

You will also need to consider data that is found on ABBR elements.
If a microformat property is on an ABBR element, then the value is
extracted from the @title.
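
The two cases above could be folded into one function that is called
on whatever node the XPath returns. This is only a sketch: the name
propertyValue is hypothetical, and treating exactly 'url' and 'uid' as
the properties that read from @href is an assumption based on the
examples in this thread, not a complete list.

```javascript
// Decide where a property's value lives, based on the element it is on.
// `node` only needs localName, getAttribute() and textContent, so a DOM
// Element works, and so does a plain stand-in object.
function propertyValue(node, propertyName) {
  const tag = (node.localName || "").toLowerCase();

  // Assumption: only these properties take their value from @href.
  const readsHref = propertyName === "url" || propertyName === "uid";
  if (tag === "a" && readsHref && node.getAttribute("href") !== null) {
    return node.getAttribute("href");
  }

  // A property on an ABBR element takes its value from @title.
  if (tag === "abbr" && node.getAttribute("title") !== null) {
    return node.getAttribute("title");
  }

  // Otherwise fall back to the element's text.
  return node.textContent;
}
```

With this in place the per-property code stays the same shape as
before, for example self.UID = propertyValue(node, "uid"), and the
special cases live in one spot.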

We have a repository of XSLT code, which has many working XPaths already
written; feel free to browse them at http://hg.microformats.org/

If you are not already part of the mf-dev list, an administrator will
have to add you.

-brian

Matt Augustine wrote:
> I have written simple parsers for hCard and hCal in javascript that use
> XPath to parse the microformat properties from an arbitrary xhtml
> document.  In general, for each known property I have code like this:
>
> node = document.evaluate("//*[contains(@class,
> 'vevent')]//*[contains(@class, 'description')]", hCalXmlNode, null, 0
> /*XPathResult.ANY_TYPE*/, null).iterateNext();
>
> if (node) {self.Description = node.textContent;}
>
> This works great in most cases, but I'm having trouble with the case
> where the exact location of the data (which attribute, inner element
> etc.) is unknown.  For example, UIDs might be represented as:
>
> <a rel="contact friend" class="url uid fn"
> href="http://beta.plazes.com/plaze/cd21e1717f61ba9cf9df9006038da172/">fiahless</a>
>
> How would I parse the value without special casing to look in the href
> attribute if the containing element is an <a>?  An XPath expression like
> the one above would yield "fiahless" instead of
> "http://beta.plazes.com/plaze/cd21e1717f61ba9cf9df9006038da172/".
>
>
> Matt Augustine