microformats2-parsing: Difference between revisions

Revision as of 23:39, 16 October 2012

<entry-title>microformats2 parsing</entry-title>

One of the goals of microformats2 is to greatly simplify the parsing of microformats, in particular by making parsing independent of any one vocabulary.

parsing algorithm

parse a document for microformats

To parse a document for microformats:

start with an empty JSON items array
parse the root element for microformats

parse an element for microformats

To parse an element for microformats:

parse element class for root class name(s) "h-x" (and backcompat)
- if found, start parsing a new microformat
  - parse contained elements for properties (depth first, doc order)
    - parse an element for microformats (recurse)
  - imply properties (see below)
parse element class for properties (p-,dt-,u-,e-)
add properties found (with any nested microformats) to current microformat
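
The recursion above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not a reference parser: it uses the standard library's xml.etree.ElementTree (so the input must be well-formed markup), it ignores backcompat root class names and implied properties, it takes every property value as the element's inner text, and the output shape (a "type" list and a "properties" mapping) is hypothetical, since the text above only calls for a JSON items array. The function names are illustrative as well.

```python
import xml.etree.ElementTree as ET

PROPERTY_PREFIXES = ("p-", "dt-", "u-", "e-")


def class_names(el):
    """Return the list of class names on an element."""
    return el.attrib.get("class", "").split()


def parse_element(el, items, current=None):
    """Parse `el`; `current` is the enclosing microformat, if any."""
    classes = class_names(el)
    root_types = [c for c in classes if c.startswith("h-")]
    prop_names = [c for c in classes if c.startswith(PROPERTY_PREFIXES)]

    nested = None
    if root_types:
        # Root class name found: start a new microformat and parse the
        # contained elements for its properties (depth first, doc order).
        nested = {"type": root_types, "properties": {}}
        for child in el:
            parse_element(child, items, nested)
        # Implied properties would be added here (omitted in this sketch).
    else:
        for child in el:
            parse_element(child, items, current)

    if current is not None and prop_names:
        # Simplification: inner text stands in for the per-prefix rules.
        value = nested if nested is not None else "".join(el.itertext())
        for name in prop_names:
            key = name.split("-", 1)[1]
            current["properties"].setdefault(key, []).append(value)
    elif nested is not None:
        # Assumption: a microformat that is not a property of an enclosing
        # microformat is reported as a top-level item.
        items.append(nested)


def parse_document(root):
    """Start with an empty items list and parse the root element."""
    items = []
    parse_element(root, items)
    return items


doc = ET.fromstring(
    '<body><div class="h-card">'
    '<span class="p-name">Jane Doe</span>'
    '<span class="p-org h-card"><span class="p-name">Example Inc</span></span>'
    '</div></body>'
)
print(parse_document(doc))
```

In this sketch a nested microformat that also carries a property class name becomes the value of that property on the enclosing microformat, which mirrors the "with any nested microformats" step above.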

parsing a p- property

To parse an element for a p-x property value:

parse the element for the value-class-pattern; if a value is found, then return it.
if abbr.p-x[title], then return the title attribute
else if data.p-x[value], then return the value attribute
else if br.p-x or hr.p-x, then return "" (empty string)
else if img.p-x[alt] or area.p-x[alt], then return the alt attribute
else return the innertext of the element.
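
As a rough illustration, the p- rules above could be written in Python like this. The sketch assumes an xml.etree.ElementTree element as input, skips the value-class-pattern step entirely, and approximates "innertext" by concatenating all descendant text; the function name parse_p_property is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_p_property(el):
    """Return the p-x value of `el` (value-class-pattern step omitted)."""
    tag = el.tag.lower()
    if tag == "abbr" and "title" in el.attrib:
        return el.attrib["title"]
    if tag == "data" and "value" in el.attrib:
        return el.attrib["value"]
    if tag in ("br", "hr"):
        return ""
    if tag in ("img", "area") and "alt" in el.attrib:
        return el.attrib["alt"]
    # Fallback: the text of the element and all of its descendants.
    return "".join(el.itertext())


abbr = ET.fromstring('<abbr class="p-name" title="Jane Doe">JD</abbr>')
span = ET.fromstring('<span class="p-name">Jane <b>Doe</b></span>')
print(parse_p_property(abbr))
print(parse_p_property(span))
```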

parsing a u- property

To parse an element for a u-x property value:

parse the element for the value-class-pattern; if a value is found, then return it.
if a.u-x[href] or area.u-x[href], then get the href attribute
else if img.u-x[src], then get the src attribute
else if object.u-x[data], then get the data attribute
if a value was obtained from one of the href, src, or data steps above, then return its normalized absolute URL, resolved following the rules of the containing document's language for relative URLs.
else if abbr.u-x[title], then return the title attribute
else if data.u-x[value], then return the value attribute
else return the innertext of the element.
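
The u- rules above might look like the following Python sketch. It assumes an xml.etree.ElementTree element and a caller-supplied base_url parameter (hypothetical), skips the value-class-pattern step, and uses urllib.parse.urljoin as a stand-in for "normalized absolute URL"; urljoin resolves relative references but is not a full URL normalizer, and the sketch does not look for a base element in the document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Which attribute carries the URL for each element type named in the rules.
URL_ATTRS = {"a": "href", "area": "href", "img": "src", "object": "data"}


def parse_u_property(el, base_url):
    """Return the u-x value of `el` (value-class-pattern step omitted)."""
    tag = el.tag.lower()
    attr = URL_ATTRS.get(tag)
    if attr is not None and attr in el.attrib:
        # A URL attribute was found: resolve it against the base URL.
        return urljoin(base_url, el.attrib[attr])
    if tag == "abbr" and "title" in el.attrib:
        return el.attrib["title"]
    if tag == "data" and "value" in el.attrib:
        return el.attrib["value"]
    return "".join(el.itertext())


base = "http://example.com/people/"
link = ET.fromstring('<a class="u-url" href="../jane">Jane</a>')
photo = ET.fromstring('<img class="u-photo" src="jane.jpg" alt="Jane"/>')
print(parse_u_property(link, base))
print(parse_u_property(photo, base))
```

Note that only values taken from href, src, or data are resolved; title, value, and inner text are returned unchanged, as in the rules above.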

parsing a dt- property

To parse an element for a dt-x property value:

parse the element for the value-class-pattern, including the date and time parsing rules; if a value is found, then return it.
if time.dt-x[datetime] or ins.dt-x[datetime] or del.dt-x[datetime], then return the datetime attribute
else if abbr.dt-x[title], then return the title attribute
else if data.dt-x[value], then return the value attribute
else return the innertext of the element.
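
A comparable Python sketch for the dt- rules follows. It again assumes an xml.etree.ElementTree element, leaves out the value-class-pattern and its date and time parsing rules, and returns the raw string without validating it as a date; parse_dt_property is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_dt_property(el):
    """Return the dt-x value of `el` (value-class-pattern step omitted)."""
    tag = el.tag.lower()
    if tag in ("time", "ins", "del") and "datetime" in el.attrib:
        return el.attrib["datetime"]
    if tag == "abbr" and "title" in el.attrib:
        return el.attrib["title"]
    if tag == "data" and "value" in el.attrib:
        return el.attrib["value"]
    return "".join(el.itertext())


start = ET.fromstring(
    '<time class="dt-start" datetime="2012-10-16T23:39">tonight</time>'
)
plain = ET.fromstring('<span class="dt-start">2012-10-16</span>')
print(parse_dt_property(start))
print(parse_dt_property(plain))
```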

parsing an e- property

To parse an element for an e-x property value:

return the innerHTML of the element by using the HTML spec: Serializing HTML Fragments algorithm.
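
For illustration only, an innerHTML-like value can be approximated in Python with the standard library. This sketch assumes an xml.etree.ElementTree element and relies on ElementTree's "html" output method, which is not a complete implementation of the HTML fragment serialization algorithm; parse_e_property is a hypothetical name.

```python
import html
import xml.etree.ElementTree as ET


def parse_e_property(el):
    """Approximate the innerHTML of `el`: its leading text plus each child."""
    # Text before the first child must be re-escaped for markup output.
    parts = [html.escape(el.text or "", quote=False)]
    for child in el:
        # ET.tostring also emits the child's tail (the text that follows it).
        parts.append(ET.tostring(child, encoding="unicode", method="html"))
    return "".join(parts)


content = ET.fromstring('<div class="e-content">Hello <b>world</b>!</div>')
print(parse_e_property(content))
```

Unlike the other property types, the markup of descendant elements is preserved in the value rather than flattened to text.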

parsing for implied properties

To imply properties: (where h-x is the root microformat element being parsed)

if no explicit "name" property,
then imply by:
- if img.h-x then use its alt attribute for name
- else if .h-x>img:only-node then use that img alt for name
- else if .h-x>:only-node>img:only-node then use that img alt for name
- else use the innertext of the .h-x for name
- drop leading & trailing white-space from name, including nbsp
if no explicit "photo" property,
then imply by:
- if img.h-x[src] then use src for photo
- else if .h-x>img[src]:only-of-type then use that img src for photo
- else if .h-x>:only-child>img[src]:only-of-type then use that img src for photo
if no explicit "url" property,
then imply by:
- if a.h-x[href] then use href for url
- else if .h-x>a[href]:only-of-type then use that a[href] for url
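
The photo and url rules above can be sketched in Python as shown here, assuming an xml.etree.ElementTree element as the root microformat element. The helper names are hypothetical. The name rules are deliberately left out, because :only-node is not a standard CSS selector and its exact meaning is not defined in this section. The sketch also assumes the caller has already checked that no explicit photo or url property exists.

```python
import xml.etree.ElementTree as ET


def only_of_type(parent, tag):
    """Return the child with this tag if it has no same-tag siblings."""
    matches = [child for child in parent if child.tag == tag]
    return matches[0] if len(matches) == 1 else None


def imply_photo(root):
    """Apply the three implied photo rules in order; None if none match."""
    if root.tag == "img" and "src" in root.attrib:
        return root.attrib["src"]
    img = only_of_type(root, "img")
    if img is not None and "src" in img.attrib:
        return img.attrib["src"]
    children = list(root)
    if len(children) == 1:  # .h-x>:only-child
        img = only_of_type(children[0], "img")
        if img is not None and "src" in img.attrib:
            return img.attrib["src"]
    return None


def imply_url(root):
    """Apply the two implied url rules in order; None if neither matches."""
    if root.tag == "a" and "href" in root.attrib:
        return root.attrib["href"]
    a = only_of_type(root, "a")
    if a is not None and "href" in a.attrib:
        return a.attrib["href"]
    return None


card = ET.fromstring(
    '<div class="h-card"><a href="http://example.com/jane">'
    '<img src="jane.jpg" alt="Jane"/></a></div>'
)
print(imply_url(card), imply_photo(card))
```

Here the url comes from the only a child of the root, and the photo from the third rule, since the img sits inside the root's only child.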

what do the CSS selector expressions mean

Use SelectORacle to expand any of the above CSS selector expressions into longform English prose.

@@ Line 12: / Line 12: @@
 === parse an element for microformats ===
 To parse an element for microformats:
-* parse element class for root class name(s) "h-*" (and backcompat)
+* parse element class for root class name(s) "h-x" (and backcompat)
 ** if found, start parsing a new microformat
 *** parse contained elements for properties (depth first, doc order)
@@ Line 19: / Line 19: @@
 * parse element class for properties (p-,dt-,u-,e-)
 * add properties found (with any nested microformats) to current microformat
+==== parsing a p- property ====
+To parse an element for a p-x property value:
+* parse the element for the [[value-class-pattern]], if a value is found then return it.
+* if abbr.p-x[title], then return the title attribute
+* else if data.p-x[value], then return the value attribute
+* else if br.p-x or hr.p-x, then return "" (empty string)
+* else if img.p-x[alt] or area.p-x[alt], then return the alt attribute
+* else return the innertext of the element.
+==== parsing a u- property ====
+To parse an element for a u-x property value:
+* parse the element for the [[value-class-pattern]], if a value is found then return it.
+* if a.u-x[href] or area.u-x[href], then get the href attribute
+* else if img.u-x[src], then get the src attribute
+* else if object.u-x[data], then get the data attribute
+* if there is a gotten value, return the normalized absolute URL of it, following the containing document's language's rules for resolving relative URLs.
+* else if abbr.u-x[title], then return the title attribute
+* else if data.u-x[value], then return the value attribute
+* else return the innertext of the element.
+==== parsing a dt- property ====
+To parse an element for a dt-x property value:
+* parse the element for the [[value-class-pattern]] including the date and time parsing rules, if a value is found then return it.
+* if time.dt-x[datetime] or ins.dt-x[datetime] or del.dt-x[datetime], then return the datetime attribute
+* else if abbr.dt-x[title], then return the title attribute
+* else if data.dt-x[value], then return the value attribute
+* else return the innertext of the element.
+==== parsing an e- property ====
+To parse an element for a e-x property value:
+* return the innerHTML of the element by using the [http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#serializing-html-fragments HTML spec: Serializing HTML Fragments algorithm].
 === parsing for implied properties ===
-To imply properties: (where h-* is the root microformat element being parsed)
+To imply properties: (where h-x is the root microformat element being parsed)
 * if no explicit "name" property,
 * then imply by:
-** if img.h-* then use its alt attribute for name
+** if img.h-x then use its alt attribute for name
-** else if .h-*>img:only-node then use that img alt for name
+** else if .h-x>img:only-node then use that img alt for name
-** else if .h-*>:only-node>img:only-node use that img alt for name
+** else if .h-x>:only-node>img:only-node use that img alt for name
-** else use the innertext of the .h-* for name
+** else use the innertext of the .h-x for name
 ** drop leading & trailing white-space from name, including nbsp
 * if no explicit "photo" property,
 * then imply by:
-** if img.h-*[src] then use src for photo
+** if img.h-x[src] then use src for photo
-** else if .h-*>img[src]:only-of-type then use that img src for photo
+** else if .h-x>img[src]:only-of-type then use that img src for photo
-** else if .h-*>:only-child>img[src]:only-of-type then use that img src for photo
+** else if .h-x>:only-child>img[src]:only-of-type then use that img src for photo
 * if no explicit "url" property,
 * then imply by:
-** if a.h-*[href] then use href for url
+** if a.h-x[href] then use href for url
-** else if .h-*>a[href]:only-of-type then use that a[href] for url
+** else if .h-x>a[href]:only-of-type then use that a[href] for url
+== what do the CSS selector expressions mean ==
+Use [http://gallery.theopalgroup.com/selectoracle/ SelectORacle] to expand any of the above CSS selector expressions into longform English prose.
 == see also ==
