microformats2-parsing: Difference between revisions
(microformats2-parsing moved to microformats2-parsing-brainstorming: This is more of a background/brainstorming/about page than a concise description of "how to parse microformats2" so let's move it there and create a simple "how to parse" page for)
(draft specific algorithm for implying properties name, photo, url)
<entry-title>microformats2 parsing</entry-title>
One of the goals of [[microformats2]] is to greatly simplify parsing of microformats, in particular by making parsing independent of any one vocabulary.
== parsing algorithm ==
To parse an element for microformats:
* parse element class for root class name(s) "h-*" (and backcompat root class names)
** if found, start parsing a new microformat
*** parse contained elements for properties (depth first, doc order)
**** parse an element for microformats (recurse)
*** imply properties (see below)
* parse element class for properties (p-, dt-, u-, e-)
* add properties found (with any nested microformats) to current microformat
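The recursion above can be sketched roughly as follows. This is a minimal illustration, not a reference parser: it assumes the markup is well-formed XML so that Python's standard <code>xml.etree.ElementTree</code> can read it, it uses plain text content as the value for every property prefix, it skips backcompat class names and implied properties, and it collects a nested microformat that is not also a property in the top-level result list. The helper names (<code>parse</code>, <code>parse_element</code>, <code>root_types</code>, <code>property_names</code>) are hypothetical and do not come from the draft.

```python
import xml.etree.ElementTree as ET

# Property class prefixes listed in the algorithm above.
PROPERTY_PREFIXES = ("p-", "dt-", "u-", "e-")


def class_names(el):
    """Return the space-separated class names of an element."""
    return el.get("class", "").split()


def root_types(el):
    """Root class names ("h-*") on this element; backcompat is not handled."""
    return [c for c in class_names(el) if c.startswith("h-")]


def property_names(el):
    """Property names on this element, with the prefix removed."""
    return [c.split("-", 1)[1] for c in class_names(el)
            if c.startswith(PROPERTY_PREFIXES)]


def parse_children(parent, current, found):
    """Visit child elements depth first, in document order."""
    for child in parent:
        parse_element(child, current, found)


def parse_element(el, current, found):
    """Parse one element; `current` is the enclosing microformat or None."""
    types = root_types(el)
    nested = None
    if types:
        # Root class found: start a new microformat and parse its contents.
        nested = {"type": types, "properties": {}}
        parse_children(el, nested, found)
        # The "imply properties" step would run here; omitted in this sketch.

    # Assumption: the property value is the nested microformat if there is
    # one, otherwise the element's stripped text content.
    value = nested if nested is not None else "".join(el.itertext()).strip()
    names = property_names(el)
    if current is not None:
        for name in names:
            current["properties"].setdefault(name, []).append(value)

    if nested is not None:
        # Assumption: a microformat that is not a property of an enclosing
        # microformat is reported at the top level.
        if current is None or not names:
            found.append(nested)
    else:
        parse_children(el, current, found)


def parse(markup):
    """Parse a well-formed markup string and return the microformats found."""
    found = []
    parse_element(ET.fromstring(markup), None, found)
    return found
```

Calling <code>parse()</code> on an h-card that contains a <code>p-org h-card</code> child yields a single top-level item whose "org" property holds the nested microformat, which matches the last step of the list ("add properties found (with any nested microformats) to current microformat").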
=== parsing for implied properties ===
To imply properties (where h-* is the root microformat element being parsed):
* if no explicit "name" property,
* then imply by:
** if img.h-* then use its alt attribute for name
** else if .h-*>img:only-node then use that img alt for name
** else if .h-*>:only-node>img:only-node then use that img alt for name
** else use the innertext of the .h-* for name
** drop leading & trailing white-space from name, including nbsp
* if no explicit "photo" property,
* then imply by:
** if img.h-*[src] then use src for photo
** else if .h-*>img[src]:only-of-type then use that img src for photo
** else if .h-*>:only-child>img[src]:only-of-type then use that img src for photo
* if no explicit "url" property,
* then imply by:
** if a.h-*[href] then use href for url
** else if .h-*>a[href]:only-of-type then use that a href for url
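The rules above could be implemented along these lines. This is only a sketch under stated assumptions: the input is well-formed XML read with <code>xml.etree.ElementTree</code>; ":only-node" is read as "the single child element, with no non-white-space text beside it", which is one interpretation of the draft's selector and is not standard CSS; and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def only_node(parent):
    """Single child element with no surrounding text, else None.

    Assumed reading of the draft's ":only-node" selector.
    """
    children = list(parent)
    if len(children) != 1:
        return None
    child = children[0]
    if (parent.text or "").strip() or (child.tail or "").strip():
        return None
    return child


def only_child(parent):
    """Single child element (text siblings allowed), else None."""
    children = list(parent)
    return children[0] if len(children) == 1 else None


def only_of_type(parent, tag, attr):
    """Child <tag> that is the only one of its type and carries attr."""
    matches = [c for c in parent if c.tag == tag]
    if len(matches) == 1 and matches[0].get(attr) is not None:
        return matches[0]
    return None


def imply_name(root):
    """Apply the name rules in order, then trim white-space."""
    if root.tag == "img":
        name = root.get("alt", "")
    else:
        child = only_node(root)
        grandchild = only_node(child) if child is not None else None
        if child is not None and child.tag == "img":
            name = child.get("alt", "")
        elif grandchild is not None and grandchild.tag == "img":
            name = grandchild.get("alt", "")
        else:
            name = "".join(root.itertext())
    # str.strip() with no arguments removes Unicode white-space,
    # which includes the no-break space U+00A0 (nbsp).
    return name.strip()


def imply_photo(root):
    """Apply the photo rules in order; None if no rule matches."""
    if root.tag == "img" and root.get("src") is not None:
        return root.get("src")
    img = only_of_type(root, "img", "src")
    if img is None:
        child = only_child(root)
        if child is not None:
            img = only_of_type(child, "img", "src")
    return img.get("src") if img is not None else None


def imply_url(root):
    """Apply the url rules in order; None if no rule matches."""
    if root.tag == "a" and root.get("href") is not None:
        return root.get("href")
    a = only_of_type(root, "a", "href")
    return a.get("href") if a is not None else None


def imply_properties(root, properties):
    """Fill in name, photo and url only where no explicit value exists."""
    if "name" not in properties:
        properties["name"] = [imply_name(root)]
    for key, rule in (("photo", imply_photo), ("url", imply_url)):
        if key not in properties:
            value = rule(root)
            if value is not None:
                properties[key] = [value]
    return properties
```

For example, an <code>a.h-card</code> element whose only content is an <code>img</code> gets its name from the img alt text, its photo from the img src, and its url from its own href, while an explicit property passed in <code>properties</code> is left untouched.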
== see also ==
* [[microformats2]]
* [[microformats2-implied-properties]]
* [[microformats2-parsing-brainstorming]]
Revision as of 01:30, 16 October 2012