microformats2-parsing

From Microformats Wiki
Revision as of 01:35, 16 October 2012 by Tantek (talk | contribs) (→‎see also: clarify brainstorming)

<entry-title>microformats2 parsing</entry-title>

One goal of microformats2 is to greatly simplify the parsing of microformats, in particular by making parsing independent of any one vocabulary.

parsing algorithm

To parse an element for microformats:

  • parse element class for root class name(s) "h-*" (and backcompat)
    • if found, start parsing a new microformat
      • parse contained elements for properties (depth first, document order)
        • parse an element for microformats (recurse)
      • imply properties (see below)
  • parse element class for properties (p-, dt-, u-, e-)
  • add properties found (with any nested microformats) to current microformat
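
The steps above can be sketched in Python roughly as follows. This is only an illustrative sketch under several assumptions that are not part of this page: the input is a well-formed XML fragment (so the standard library's `xml.etree.ElementTree` can read it), every property value is taken from the element's text regardless of prefix, backcompat class names are ignored, and the function names (`parse`, `parse_element`, and so on) and the output shape are hypothetical. The "imply properties" step is left as a comment.

```python
import xml.etree.ElementTree as ET

PROPERTY_PREFIXES = ("p-", "dt-", "u-", "e-")


def class_names(el):
    """Return the list of class names on an element."""
    return el.get("class", "").split()


def root_types(el):
    """Root class names ("h-*") found on the element."""
    return [c for c in class_names(el) if c.startswith("h-")]


def property_names(el):
    """Property names with their p-/dt-/u-/e- prefix removed."""
    return [c.split("-", 1)[1] for c in class_names(el)
            if c.startswith(PROPERTY_PREFIXES)]


def text_of(el):
    """Text content of the element with runs of whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


def parse_element(el, current, top_level):
    """Parse one element; `current` is the enclosing microformat or None."""
    nested = None
    types = root_types(el)
    if types:
        # Root class name found: start parsing a new microformat.
        nested = {"type": types, "properties": {}}
        for child in el:  # depth first, document order
            parse_element(child, nested, top_level)
        # Implied properties would be added here (see the next section).

    names = property_names(el)
    if current is not None and names:
        # Add the property (with any nested microformat) to the current one.
        value = nested if nested is not None else text_of(el)
        for name in names:
            current["properties"].setdefault(name, []).append(value)
    elif nested is not None:
        # Assumption of this sketch: a microformat that is not also a
        # property is collected in the top-level result list.
        top_level.append(nested)

    if nested is None:
        # Not a root itself: keep looking inside for properties and roots.
        for child in el:
            parse_element(child, current, top_level)


def parse(markup):
    """Parse a well-formed markup fragment and return the microformats found."""
    items = []
    parse_element(ET.fromstring(markup), None, items)
    return items


if __name__ == "__main__":
    sample = ('<div class="h-card"><span class="p-name">Jane Doe</span>'
              '<div class="p-org h-card"><span class="p-name">Example Corp'
              '</span></div></div>')
    print(parse(sample))
```

Note that nothing in the sketch refers to a specific vocabulary such as h-card: only the "h-" and property prefixes drive the parse.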

parsing for implied properties

To imply properties: (where h-* is the root microformat element being parsed)

  • if no explicit "name" property,
  • then imply by:
    • if img.h-* then use its alt attribute for name
    • else if .h-*>img:only-node then use that img alt for name
    • else if .h-*>:only-node>img:only-node then use that img alt for name
    • else use the innertext of the .h-* for name
    • drop leading & trailing white-space from name, including nbsp
  • if no explicit "photo" property,
  • then imply by:
    • if img.h-*[src] then use src for photo
    • else if .h-*>img[src]:only-of-type then use that img src for photo
    • else if .h-*>:only-child>img[src]:only-of-type then use that img src for photo
  • if no explicit "url" property,
  • then imply by:
    • if a.h-*[href] then use href for url
    • else if .h-*>a[href]:only-of-type then use that a href for url
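
The implied-property rules above might be sketched in Python as follows. As before, this is a hedged illustration and not a reference implementation: it assumes a well-formed XML fragment read with `xml.etree.ElementTree`, and the helper names (`only_node`, `only_child`, `only_of_type`, `imply_properties`) are hypothetical. In particular, ":only-node" is interpreted here as "the single child element, with no non-blank text beside it", which is an assumption about what this page intends.

```python
import xml.etree.ElementTree as ET


def only_child(parent):
    """The single child element of parent, or None (like :only-child)."""
    children = list(parent)
    return children[0] if len(children) == 1 else None


def only_node(parent):
    """Like only_child, but also requires no non-blank text around the child.

    This reading of ":only-node" is an assumption of the sketch.
    """
    child = only_child(parent)
    if child is None:
        return None
    if (parent.text or "").strip() or (child.tail or "").strip():
        return None
    return child


def only_of_type(parent, tag, attr):
    """Child matching tag[attr] that is the only child with that tag."""
    same_tag = [c for c in parent if c.tag == tag]
    if len(same_tag) == 1 and same_tag[0].get(attr) is not None:
        return same_tag[0]
    return None


def imply_name(root):
    if root.tag == "img":
        name = root.get("alt", "")
    else:
        child = only_node(root)
        grandchild = only_node(child) if child is not None else None
        if child is not None and child.tag == "img":
            name = child.get("alt", "")
        elif grandchild is not None and grandchild.tag == "img":
            name = grandchild.get("alt", "")
        else:
            name = "".join(root.itertext())  # inner text of the root
    # str.strip() also removes no-break spaces (U+00A0).
    return name.strip()


def imply_photo(root):
    if root.tag == "img" and root.get("src") is not None:
        return root.get("src")
    img = only_of_type(root, "img", "src")
    if img is None:
        child = only_child(root)
        if child is not None:
            img = only_of_type(child, "img", "src")
    return img.get("src") if img is not None else None


def imply_url(root):
    if root.tag == "a" and root.get("href") is not None:
        return root.get("href")
    a = only_of_type(root, "a", "href")
    return a.get("href") if a is not None else None


def imply_properties(root, properties):
    """Fill in name, photo and url only where no explicit value exists."""
    rules = (("name", imply_name), ("photo", imply_photo), ("url", imply_url))
    for prop, rule in rules:
        if prop not in properties:
            value = rule(root)
            if value is not None:
                properties[prop] = [value]
    return properties


if __name__ == "__main__":
    card = ET.fromstring('<a class="h-card" href="http://example.com/">'
                         '<img src="me.png" alt="Jane Doe"/></a>')
    print(imply_properties(card, {}))
```

Explicit properties always win in this sketch: `imply_properties` only adds a key that is missing from the dictionary it is given.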

see also