microformats2-parsing: Difference between revisions

Revision as of 01:37, 16 October 2012

<entry-title>microformats2 parsing</entry-title>

One of the goals of microformats2 is to greatly simplify parsing of microformats, in particular, by making parsing independent of any one vocabulary.

To parse a document for microformats:

start with an empty JSON items array
parse the root element for microformats

To parse an element for microformats:

parse element class for root class name(s) "h-*" (and backcompat)
- if found, start parsing a new microformat
  - parse contained elements for properties (depth first, doc order)
    - parse an element for microformats (recurse)
  - imply properties (see below)
parse element class for properties (p-,dt-,u-,e-)
add properties found (with any nested microformats) to current microformat
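
The element-parsing steps above can be sketched roughly as follows. This is a minimal illustration, not a reference parser. It makes several assumptions: the markup is well-formed enough for Python's `xml.etree.ElementTree`, every property value is taken to be the element's text content, property values are collected in lists, and a nested microformat without a property class is appended to the top-level items. Backcompat root names and the imply step are left out, and the names `split_classes`, `parse_element` and `parse_document` are hypothetical.

```python
import xml.etree.ElementTree as ET

PROPERTY_PREFIXES = ("p-", "dt-", "u-", "e-")


def split_classes(element):
    """Return (root class names, property class names) from the class attribute."""
    names = element.get("class", "").split()
    roots = [n for n in names if n.startswith("h-")]
    props = [n for n in names if n.startswith(PROPERTY_PREFIXES)]
    return roots, props


def parse_element(element, current, items):
    """Parse one element; `current` is the enclosing microformat dict or None."""
    roots, props = split_classes(element)
    if roots:
        # Root class name found: start a new microformat.
        value = {"type": roots, "properties": {}}
        inner = value
    else:
        # Assumption: a plain property's value is its text content.
        value = "".join(element.itertext()).strip()
        inner = current
    if current is not None:
        # Add properties found (with any nested microformat) to the current one.
        for prop in props:
            name = prop.split("-", 1)[1]
            current["properties"].setdefault(name, []).append(value)
    if roots and (current is None or not props):
        # Assumption: microformats that are not property values go to the items.
        items.append(value)
    # Depth first, document order.
    for child in element:
        parse_element(child, inner, items)


def parse_document(markup):
    """Start with an empty items array, then parse the root element."""
    items = []
    parse_element(ET.fromstring(markup), None, items)
    return items
```

Calling `parse_document` on a small `h-card` with a nested `p-org h-card` yields one top-level item whose `org` property holds the nested microformat.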

To imply properties: (where h-* is the root microformat element being parsed)

if no explicit "name" property,
then imply by:
- if img.h-* then use its alt attribute for name
- else if .h-*>img:only-node then use that img alt for name
- else if .h-*>:only-node>img:only-node then use that img alt for name
- else use the innertext of the .h-* for name
- drop leading & trailing white-space from name, including nbsp
if no explicit "photo" property,
then imply by:
- if img.h-*[src] then use src for photo
- else if .h-*>img[src]:only-of-type then use that img src for photo
- else if .h-*>:only-child>img[src]:only-of-type then use that img src for photo
if no explicit "url" property,
then imply by:
- if a.h-*[href] then use href for url
- else if .h-*>a[href]:only-of-type then use that a[href] for url
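
The implied-property rules above could be sketched like this. It is a rough illustration under stated assumptions: the root element is an `xml.etree.ElementTree` element, `:only-node` is read as "the sole child, with no non-whitespace text beside it", values are stored in lists, and the helper names (`only_node`, `only_of_type`, `imply_name`, `imply_photo`, `imply_url`, `imply_properties`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def only_node(parent):
    """Return the sole child of `parent` if no other element or text sits beside it."""
    children = list(parent)
    if len(children) != 1:
        return None
    child = children[0]
    if (parent.text or "").strip() or (child.tail or "").strip():
        return None
    return child


def only_of_type(parent, tag, attribute):
    """Return the single `tag` child of `parent`, provided it carries `attribute`."""
    matches = [c for c in parent if c.tag == tag]
    if len(matches) == 1 and matches[0].get(attribute) is not None:
        return matches[0]
    return None


def imply_name(root):
    child = only_node(root)
    grandchild = only_node(child) if child is not None else None
    if root.tag == "img":
        name = root.get("alt", "")
    elif child is not None and child.tag == "img":
        name = child.get("alt", "")
    elif grandchild is not None and grandchild.tag == "img":
        name = grandchild.get("alt", "")
    else:
        name = "".join(root.itertext())
    # str.strip() removes Unicode white-space, which includes U+00A0 (nbsp).
    return name.strip()


def imply_photo(root):
    if root.tag == "img" and root.get("src") is not None:
        return root.get("src")
    img = only_of_type(root, "img", "src")
    if img is None and len(root) == 1:
        # .h-*>:only-child>img[src]:only-of-type
        img = only_of_type(root[0], "img", "src")
    return img.get("src") if img is not None else None


def imply_url(root):
    if root.tag == "a" and root.get("href") is not None:
        return root.get("href")
    link = only_of_type(root, "a", "href")
    return link.get("href") if link is not None else None


def imply_properties(root, properties):
    """Fill in name, photo and url only where no explicit property exists."""
    if "name" not in properties:
        properties["name"] = [imply_name(root)]
    for key, imply in (("photo", imply_photo), ("url", imply_url)):
        if key not in properties:
            value = imply(root)
            if value is not None:
                properties[key] = [value]
    return properties
```

For `<a class="h-card" href="..."><img src="..." alt="..."/></a>` with no explicit properties, this fills `name` from the `alt`, `photo` from the `src`, and `url` from the `href`.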

@@ Line 4: / Line 4: @@
 == parsing algorithm ==
+=== parse a document for microformats ===
+To parse a document for microformats:
+* start with an empty JSON items array
+* parse the root element for microformats
+=== parse an element for microformats ===
 To parse an element for microformats:
 * parse element class for root class name(s) "h-*" (and backcompat)