microformats2-parsing: Difference between revisions
Revision as of 01:37, 16 October 2012
<entry-title>microformats2 parsing</entry-title>
One of the goals of microformats2 is to greatly simplify the parsing of microformats, in particular by making parsing independent of any one vocabulary.
parsing algorithm
parse a document for microformats
To parse a document for microformats:
- start with an empty JSON items array
- parse the root element for microformats
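The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the markup is well-formed enough for `xml.etree.ElementTree`, the result is wrapped as `{"items": [...]}` with `type` and `properties` keys, and the helper names `root_class_names` and `parse_document` are hypothetical. The loop is a flat stand-in for the element-level algorithm, so nested microformats are simply listed one after another.

```python
import json
import xml.etree.ElementTree as ET


def root_class_names(element):
    """Return the h-* class names on an element, in source order."""
    return [c for c in element.get("class", "").split() if c.startswith("h-")]


def parse_document(markup):
    """Parse a well-formed document string and return {"items": [...]}."""
    items = []                      # start with an empty JSON items array
    root = ET.fromstring(markup)    # the root element of the document
    # Stand-in for "parse the root element for microformats": visit every
    # element in document order and record each root class name found.
    for element in root.iter():
        types = root_class_names(element)
        if types:
            items.append({"type": types, "properties": {}})
    return {"items": items}


doc = (
    '<html><body>'
    '<div class="h-card">Ann</div>'
    '<p class="h-entry note">Hi</p>'
    '</body></html>'
)
print(json.dumps(parse_document(doc), indent=2))
```

A document with no "h-*" class names yields an empty items array rather than an error, which keeps the output shape stable for consumers.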
parse an element for microformats
To parse an element for microformats:
- parse element class for root class name(s) "h-*" (and backcompat)
  - if found, start parsing a new microformat
    - parse contained elements for properties (depth first, doc order)
      - parse an element for microformats (recurse)
    - imply properties (see below)
- parse element class for properties (p-,dt-,u-,e-)
  - add properties found (with any nested microformats) to current microformat
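One possible reading of the element algorithm above, as a minimal Python sketch. It assumes well-formed markup parsed with `xml.etree.ElementTree`, treats every property value as plain text regardless of prefix, and omits both backcompat class names and the implied-properties step. The function `parse_element` and the `type`/`properties` dictionary shape are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

PROPERTY_PREFIXES = ("p-", "dt-", "u-", "e-")


def parse_element(element, items, current=None):
    """Parse one element; `current` is the enclosing microformat, if any."""
    names = element.get("class", "").split()
    roots = [c for c in names if c.startswith("h-")]
    props = [c for c in names if c.startswith(PROPERTY_PREFIXES)]

    if roots:
        # Root class name found: start parsing a new microformat.
        value = {"type": roots, "properties": {}}
        inner = value
    else:
        # Simplification: the value is the element's text, for any prefix.
        value = "".join(element.itertext()).strip()
        inner = current

    if props and current is not None:
        # Add properties found (with any nested microformat) to the
        # current microformat, dropping the prefix from the name.
        for name in props:
            key = name.split("-", 1)[1]
            current["properties"].setdefault(key, []).append(value)
    elif roots:
        # A microformat that is not a property value is reported at the
        # top level in this sketch.
        items.append(value)

    # Parse contained elements depth first, in document order (recurse).
    for child in element:
        parse_element(child, items, inner)


markup = (
    '<div class="h-card">'
    '<span class="p-name">Ann</span>'
    '<div class="p-org h-card"><span class="p-name">Example Co</span></div>'
    '</div>'
)
items = []
parse_element(ET.fromstring(markup), items)
print(items)
```

Passing the enclosing microformat down as `current` is what lets a nested "h-*" element become the value of a property on its parent instead of a separate top-level item.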
parsing for implied properties
To imply properties: (where h-* is the root microformat element being parsed)
- if no explicit "name" property,
  - then imply by:
    - if img.h-* then use its alt attribute for name
    - else if .h-*>img:only-node then use that img alt for name
    - else if .h-*>:only-node>img:only-node then use that img alt for name
    - else use the innertext of the .h-* for name
    - drop leading & trailing white-space from name, including nbsp
- if no explicit "photo" property,
  - then imply by:
    - if img.h-*[src] then use src for photo
    - else if .h-*>img[src]:only-of-type then use that img src for photo
    - else if .h-*>:only-child>img[src]:only-of-type then use that img src for photo
- if no explicit "url" property,
  - then imply by:
    - if a.h-*[href] then use href for url
    - else if .h-*>a[href]:only-of-type then use that href for url
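The implied-property rules above could be sketched in Python like this. Several points are assumptions of the sketch, not statements of the page: `:only-node` is read as "the sole child node, ignoring whitespace-only text", an img without an alt attribute falls through to the innertext rule, relative URLs are left unresolved, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def only_node(parent):
    """Return the sole child if it is the only node in parent, else None."""
    if len(parent) != 1:
        return None
    child = parent[0]
    # Assumption: whitespace-only text around the child does not count.
    if (parent.text or "").strip() or (child.tail or "").strip():
        return None
    return child


def only_of_type(parent, tag, attr):
    """Return the single `tag` child of parent if it carries `attr`."""
    matches = [c for c in parent if c.tag == tag]
    if len(matches) == 1 and matches[0].get(attr) is not None:
        return matches[0]
    return None


def imply_name(root):
    child = only_node(root)
    grandchild = only_node(child) if child is not None else None
    for candidate in (root, child, grandchild):
        if (candidate is not None and candidate.tag == "img"
                and candidate.get("alt") is not None):
            name = candidate.get("alt")
            break
    else:
        name = "".join(root.itertext())     # innertext of the h-* element
    # str.strip() treats U+00A0 (nbsp) as white-space, so it is dropped too.
    return name.strip()


def imply_photo(root):
    if root.tag == "img" and root.get("src") is not None:
        return root.get("src")
    img = only_of_type(root, "img", "src")
    if img is None and len(root) == 1:      # .h-*>:only-child>img[src]
        img = only_of_type(root[0], "img", "src")
    return img.get("src") if img is not None else None


def imply_url(root):
    if root.tag == "a" and root.get("href") is not None:
        return root.get("href")
    a = only_of_type(root, "a", "href")
    return a.get("href") if a is not None else None


def imply_properties(root, properties):
    """Fill in name, photo and url when no explicit value was parsed."""
    if "name" not in properties:
        properties["name"] = [imply_name(root)]
    for key, imply in (("photo", imply_photo), ("url", imply_url)):
        if key not in properties:
            value = imply(root)
            if value is not None:
                properties[key] = [value]
    return properties


card = ET.fromstring(
    '<a class="h-card" href="http://example.com/ann">'
    '<img src="ann.jpg" alt="&#160;Ann Example "/></a>'
)
print(imply_properties(card, {}))
```

Explicit properties always win in this sketch: `imply_properties` only adds a key that is missing from the dictionary it is given.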
see also
- microformats2
- microformats2-implied-properties
- microformats2-parsing-brainstorming - for background, thinking, exploring possibilities