From Microformats Wiki
Revision as of 01:37, 16 October 2012

microformats2 parsing

One of the goals of microformats2 is to greatly simplify the parsing of microformats, in particular by making parsing independent of any one vocabulary.

parsing algorithm

parse a document for microformats

To parse a document for microformats:

  • start with an empty JSON items array
  • parse the root element for microformats
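The document-level step can be sketched in Python as follows. This sketch makes several assumptions. It uses a hypothetical minimal `Element` class instead of a real DOM. It assumes each item has the shape `{"type": [...], "properties": {...}}`, which this page does not specify. Its `parse_element` is only a stand-in for the element step described next: it records the outermost h-* roots and does not collect their properties.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Element:
    """Hypothetical minimal DOM node: children are Elements or text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def parse_element(el, items):
    # Stand-in for "parse an element for microformats": records each
    # outermost h-* root (backcompat names ignored) without its properties.
    roots = [c for c in el.attrs.get("class", "").split() if c.startswith("h-")]
    if roots:
        items.append({"type": roots, "properties": {}})
        return
    for child in el.children:
        if isinstance(child, Element):
            parse_element(child, items)

def parse_document(document_element):
    items = []                              # start with an empty JSON items array
    parse_element(document_element, items)  # parse the root element for microformats
    return json.dumps({"items": items})

doc = Element("html", children=[Element("body", children=[
    Element("div", {"class": "h-card"}, ["Alice"]),
    Element("p", children=["no microformats here"]),
])])
print(parse_document(doc))
# prints {"items": [{"type": ["h-card"], "properties": {}}]}
```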

parse an element for microformats

To parse an element for microformats:

  • parse the element's class attribute for root class name(s) "h-*" (and backcompat)
    • if found, start parsing a new microformat
      • parse contained elements for properties (depth-first, in document order)
        • parse an element for microformats (recurse)
      • imply properties (see below)
  • parse the element's class attribute for property class names (p-*, dt-*, u-*, e-*)
  • add the properties found (with any nested microformats) to the current microformat
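The recursive element step can be sketched in Python. This is a minimal sketch and rests on assumptions that are not part of the algorithm above:

* `Element`, `add_properties` and `simple_value` are hypothetical names.
* Property values are stored in lists.
* The value rules in `simple_value` are invented for illustration, since this page does not define how a property value is extracted. In particular, e-* is reduced to plain text.
* Backcompat root names and implied properties are omitted.
* A nested root that carries no property class is simply dropped.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """Hypothetical minimal DOM node: children are Elements or text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    @property
    def classes(self):
        return self.attrs.get("class", "").split()

PROPERTY_PREFIXES = ("p-", "dt-", "u-", "e-")

def element_children(el):
    return [c for c in el.children if isinstance(c, Element)]

def text_of(node):
    if isinstance(node, str):
        return node
    return "".join(text_of(c) for c in node.children)

def simple_value(el, prefix):
    # Hypothetical value rules: the algorithm above does not define them.
    if prefix == "u-":
        return el.attrs.get("href") or el.attrs.get("src") or text_of(el).strip()
    if prefix == "dt-":
        return el.attrs.get("datetime") or text_of(el).strip()
    return text_of(el).strip()  # p-, and e- simplified to plain text

def add_properties(el, current, nested):
    """Add el's p-/dt-/u-/e- properties to the current microformat's properties."""
    if current is None:
        return  # not inside any microformat
    for cls in el.classes:
        for prefix in PROPERTY_PREFIXES:
            if cls.startswith(prefix) and len(cls) > len(prefix):
                value = nested if nested is not None else simple_value(el, prefix)
                current.setdefault(cls[len(prefix):], []).append(value)
                break

def parse_element(el, current, items):
    """current: properties dict of the enclosing microformat, or None.
    A nested root with no property class is dropped in this sketch."""
    roots = [c for c in el.classes if c.startswith("h-")]  # backcompat omitted
    if roots:
        # Root found: start a new microformat and parse its contents
        # depth-first, in document order, recursing into each child.
        mf = {"type": roots, "properties": {}}
        for child in element_children(el):
            parse_element(child, mf["properties"], items)
        # Implied properties (name, photo, url) would be applied to mf here.
        add_properties(el, current, mf)  # nested microformat as a property value
        if current is None:
            items.append(mf)  # top-level microformat
    else:
        add_properties(el, current, None)
        for child in element_children(el):
            parse_element(child, current, items)

card = Element("div", {"class": "h-card"}, [
    Element("a", {"class": "p-name u-url", "href": "https://example.com/"}, ["Alice"]),
    Element("div", {"class": "p-org h-card"}, [Element("span", {"class": "p-name"}, ["ACME"])]),
])
items = []
parse_element(Element("body", {}, [card]), None, items)
```

In this example the inner h-card becomes the value of the outer card's "org" property, because it carries the p-org class. It is therefore not added to the top-level items.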

parsing for implied properties

To imply properties (where .h-* is the root microformat element being parsed):

  • if there is no explicit "name" property, then imply it by:
    • if img.h-* then use its alt attribute for name
    • else if .h-*>img:only-node then use that img's alt for name
    • else if .h-*>:only-node>img:only-node then use that img's alt for name
    • else use the innertext of the .h-* for name
    • drop leading and trailing whitespace from name, including nbsp
  • if there is no explicit "photo" property, then imply it by:
    • if img.h-*[src] then use src for photo
    • else if .h-*>img[src]:only-of-type then use that img's src for photo
    • else if .h-*>:only-child>img[src]:only-of-type then use that img's src for photo
  • if there is no explicit "url" property, then imply it by:
    • if a.h-*[href] then use href for url
    • else if .h-*>a[href]:only-of-type then use that a's href for url
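The implication rules for name, photo and url can be sketched together in Python. The sketch uses the same hypothetical `Element` class, and all helper names are illustrative. Two of its choices are assumptions rather than statements from this page:

* ":only-node" is read as "the only child node, ignoring whitespace-only text". This is not a standard CSS selector, and the page does not define it.
* Implied values are stored as one-element lists.

:only-child and :only-of-type follow their CSS meanings, which count only element siblings.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """Hypothetical minimal DOM node: children are Elements or text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

WHITESPACE = " \t\n\r\f\v\u00a0"  # includes nbsp

def element_children(el):
    return [c for c in el.children if isinstance(c, Element)]

def only_node(el):
    # Assumed reading of ":only-node": the single child node, ignoring
    # whitespace-only text; None if there are zero or several.
    nodes = [c for c in el.children
             if not (isinstance(c, str) and not c.strip(WHITESPACE))]
    return nodes[0] if len(nodes) == 1 else None

def only_img_child(el):
    # img child that is :only-of-type among el's element children, else None.
    imgs = [c for c in element_children(el) if c.tag == "img"]
    return imgs[0] if len(imgs) == 1 else None

def inner_text(node):
    if isinstance(node, str):
        return node
    return "".join(inner_text(c) for c in node.children)

def is_img(node):
    return isinstance(node, Element) and node.tag == "img"

def imply_name(root, props):
    child = only_node(root)
    grandchild = only_node(child) if isinstance(child, Element) else None
    if is_img(root):                        # img.h-*
        name = root.attrs.get("alt", "")
    elif is_img(child):                     # .h-*>img:only-node
        name = child.attrs.get("alt", "")
    elif is_img(grandchild):                # .h-*>:only-node>img:only-node
        name = grandchild.attrs.get("alt", "")
    else:                                   # innertext of the .h-*
        name = inner_text(root)
    props["name"] = [name.strip(WHITESPACE)]

def imply_photo(root, props):
    candidates = [root if root.tag == "img" else None,  # img.h-*[src]
                  only_img_child(root)]                 # .h-*>img[src]:only-of-type
    kids = element_children(root)
    if len(kids) == 1:                                  # .h-*>:only-child>img[src]:only-of-type
        candidates.append(only_img_child(kids[0]))
    for img in candidates:                              # first match wins, like else-if
        if img is not None and "src" in img.attrs:
            props["photo"] = [img.attrs["src"]]
            return

def imply_url(root, props):
    if root.tag == "a" and "href" in root.attrs:        # a.h-*[href]
        props["url"] = [root.attrs["href"]]
        return
    links = [c for c in element_children(root) if c.tag == "a"]
    if len(links) == 1 and "href" in links[0].attrs:    # .h-*>a[href]:only-of-type
        props["url"] = [links[0].attrs["href"]]

def imply_properties(root, props):
    # Each property is implied only when no explicit value was parsed.
    if "name" not in props:
        imply_name(root, props)
    if "photo" not in props:
        imply_photo(root, props)
    if "url" not in props:
        imply_url(root, props)

card = Element("a", {"class": "h-card", "href": "https://example.com/"},
               [Element("img", {"src": "alice.jpg", "alt": "\u00a0Alice "})])
props = {}
imply_properties(card, props)
```

In this example all three properties are implied from a single linked image. The name comes from the img's alt, with the nbsp trimmed. The photo comes from its src, and the url comes from the a.h-card's href.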

see also