microformats2-parsing
Revision as of 23:38, 19 November 2012 by TomMorris (talk | contribs) (→see also: adding microformats2-parsing-rdf)
microformats2 parsing
One goal of microformats2 is to greatly simplify the parsing of microformats, in particular by making parsing independent of any one vocabulary. This page briefly documents the microformats2 parsing algorithm that achieves this.
parsing algorithm
parse a document for microformats
To parse a document for microformats:
- start with an empty JSON items array
- parse the root element for microformats
parse an element for microformats
To parse an element for microformats:
- parse element class for root class name(s) "h-x" (and backcompat)
  - if found, start parsing a new microformat
    - parse contained elements for properties (depth first, doc order)
      - parse an element for microformats (recurse)
    - imply properties (see below)
- parse element class for properties (p-,u-,dt-,e-)
  - add properties found (with any nested microformats) to current microformat
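The traversal described above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation. It assumes well-formed (XHTML-style) markup so that the standard library's xml.etree.ElementTree can stand in for an HTML parser. The function names and the "type"/"properties" output shape are assumptions made for the example. It takes every property value as plain inner text rather than applying the per-prefix rules, and it skips backcompat class names and implied properties.

```python
import xml.etree.ElementTree as ET

PROPERTY_PREFIXES = ("p-", "u-", "dt-", "e-")

def root_names(el):
    """Root class names ("h-x") on an element."""
    return [c for c in el.get("class", "").split() if c.startswith("h-")]

def property_names(el):
    """Property class names (p-, u-, dt-, e-) on an element."""
    return [c for c in el.get("class", "").split()
            if c.startswith(PROPERTY_PREFIXES)]

def parse_microformat(el):
    """Start a new microformat and collect properties from contained elements."""
    item = {"type": root_names(el), "properties": {}}  # assumed output shape
    for child in el:  # depth first, document order
        parse_properties(child, item)
    return item

def parse_properties(el, item):
    names = property_names(el)
    if root_names(el):
        # Nested microformat: recurse, then attach it under each property name.
        # (A nested microformat with no property class is dropped in this sketch.)
        nested = parse_microformat(el)
        for cls in names:
            item["properties"].setdefault(cls.split("-", 1)[1], []).append(nested)
        return  # descendants belong to the nested microformat
    for cls in names:
        # Simplification: plain inner text instead of the p-/u-/dt-/e- rules.
        text = "".join(el.itertext()).strip()
        item["properties"].setdefault(cls.split("-", 1)[1], []).append(text)
    for child in el:
        parse_properties(child, item)

def parse_element(el, items):
    """Parse an element for microformats, recursing until a root is found."""
    if root_names(el):
        items.append(parse_microformat(el))
    else:
        for child in el:
            parse_element(child, items)

def parse_document(markup):
    items = []  # start with an empty items array
    parse_element(ET.fromstring(markup), items)
    return items
```

Calling parse_document on markup that contains an h-entry with a nested "p-author h-card" yields one top-level item whose "author" property holds the nested h-card item.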
parsing a p- property
To parse an element for a p-x property value:
- parse the element for the value-class-pattern; if a value is found, return it.
- if abbr.p-x[title], then return the title attribute
- else if data.p-x[value], then return the value attribute
- else if br.p-x or hr.p-x, then return "" (empty string)
- else if img.p-x[alt] or area.p-x[alt], then return the alt attribute
- else return the innertext of the element.
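As an illustration, the p-x rules above might be written as the following Python sketch. It assumes the element is an xml.etree.ElementTree element parsed from well-formed markup. The function name parse_p_property is hypothetical, and the value-class-pattern step is left out.

```python
import xml.etree.ElementTree as ET

def parse_p_property(el):
    """Return the p-x value for an element (value-class-pattern not handled)."""
    tag = el.tag.lower()
    if tag == "abbr" and el.get("title") is not None:
        return el.get("title")
    if tag == "data" and el.get("value") is not None:
        return el.get("value")
    if tag in ("br", "hr"):
        return ""
    if tag in ("img", "area") and el.get("alt") is not None:
        return el.get("alt")
    # Fallback: inner text, i.e. all descendant text in document order.
    return "".join(el.itertext())
```

The checks run in the same order as the list, so an abbr with a title attribute yields the title rather than its visible text.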
parsing a u- property
To parse an element for a u-x property value:
- parse the element for the value-class-pattern; if a value is found, return it.
- if a.u-x[href] or area.u-x[href], then get the href attribute
- else if img.u-x[src], then get the src attribute
- else if object.u-x[data], then get the data attribute
- if a value was obtained from href, src, or data above, return its normalized absolute URL, following the containing document's language's rules for resolving relative URLs.
- else if abbr.u-x[title], then return the title attribute
- else if data.u-x[value], then return the value attribute
- else return the innertext of the element.
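A rough Python sketch of the u-x rules follows. It assumes an xml.etree.ElementTree element and a caller-supplied base_url parameter (hypothetical, standing in for the document's base URL). It uses urllib.parse.urljoin to resolve relative URLs, which may not cover every normalization the containing document's language requires. The value-class-pattern step is left out.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

def parse_u_property(el, base_url):
    """Return the u-x value for an element (value-class-pattern not handled)."""
    tag = el.tag.lower()
    url = None
    if tag in ("a", "area") and el.get("href") is not None:
        url = el.get("href")
    elif tag == "img" and el.get("src") is not None:
        url = el.get("src")
    elif tag == "object" and el.get("data") is not None:
        url = el.get("data")
    if url is not None:
        # Resolve against the base URL; urljoin leaves absolute URLs unchanged.
        return urljoin(base_url, url)
    if tag == "abbr" and el.get("title") is not None:
        return el.get("title")
    if tag == "data" and el.get("value") is not None:
        return el.get("value")
    return "".join(el.itertext())
```

Only values taken from href, src, or data are resolved against the base URL; title, value, and inner text are returned as found, matching the order of the list.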
parsing a dt- property
To parse an element for a dt-x property value:
- parse the element for the value-class-pattern, including the date and time parsing rules; if a value is found, return it.
- if time.dt-x[datetime] or ins.dt-x[datetime] or del.dt-x[datetime], then return the datetime attribute
- else if abbr.dt-x[title], then return the title attribute
- else if data.dt-x[value], then return the value attribute
- else return the innertext of the element.
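The dt-x rules can be sketched the same way in Python. The function name parse_dt_property is hypothetical, the element is assumed to be an xml.etree.ElementTree element, and the value-class-pattern step with its date and time parsing rules is left out. The returned string is not validated or normalized as a date.

```python
import xml.etree.ElementTree as ET

def parse_dt_property(el):
    """Return the dt-x value for an element (value-class-pattern not handled)."""
    tag = el.tag.lower()
    if tag in ("time", "ins", "del") and el.get("datetime") is not None:
        return el.get("datetime")
    if tag == "abbr" and el.get("title") is not None:
        return el.get("title")
    if tag == "data" and el.get("value") is not None:
        return el.get("value")
    # Fallback: inner text of the element.
    return "".join(el.itertext())
```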
parsing an e- property
To parse an element for an e-x property value:
- return the innerHTML of the element, serialized using the HTML spec's "Serializing HTML Fragments" algorithm.
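For illustration only, an approximation of this step in Python is shown below. It uses xml.etree.ElementTree's HTML output method in place of the HTML spec's fragment serialization algorithm, so it is not a conformant implementation. The function name parse_e_property is hypothetical, and the input is assumed to be well-formed markup.

```python
import html
import xml.etree.ElementTree as ET

def parse_e_property(el):
    """Approximate innerHTML: the element's content without its own tags."""
    # Leading text node of the element, re-escaped for markup output.
    parts = [html.escape(el.text or "", quote=False)]
    for child in el:
        # ET.tostring serializes the child and the text that follows it (its tail).
        parts.append(ET.tostring(child, encoding="unicode", method="html"))
    return "".join(parts)
```

Unlike the p-, u-, and dt- rules, nested markup is kept rather than flattened to text.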
parsing for implied properties
To imply properties: (where h-x is the root microformat element being parsed)
- if no explicit "name" property, then imply by:
  - if img.h-x then use its alt attribute for name
  - else if abbr.h-x[title] then use its title attribute for name
  - else if .h-x>img:only-node then use that img alt for name
  - else if .h-x>abbr:only-node[title] then use that abbr title for name
  - else if .h-x>:only-node>img:only-node use that img alt for name
  - else if .h-x>:only-node>abbr:only-node[title] use that abbr title for name
  - else use the innertext of the .h-x for name
  - drop leading & trailing white-space from name, including nbsp
- if no explicit "photo" property, then imply by:
  - if img.h-x[src] then use src for photo
  - else if object.h-x[data] then use data for photo
  - else if .h-x>img[src]:only-of-type then use that img src for photo
  - else if .h-x>object[data]:only-of-type then use that object data for photo
  - else if .h-x>:only-child>img[src]:only-of-type then use that img src for photo
  - else if .h-x>:only-child>object[data]:only-of-type then use that object data for photo
- if no explicit "url" property, then imply by:
  - if a.h-x[href] then use href for url
  - else if .h-x>a[href]:only-of-type then use that a[href] for url
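The implied-property rules above could be sketched in Python as follows. All function names are hypothetical, and elements are assumed to be xml.etree.ElementTree elements parsed from well-formed markup. ":only-node" is not a standard CSS selector, so the sketch assumes it means "the single child element, with no other non-whitespace text alongside it"; that reading is a guess and should be checked against the intended definition. Callers are expected to apply each function only when the corresponding explicit property is absent.

```python
import xml.etree.ElementTree as ET

def only_node_child(el):
    """Assumed ':only-node': one child element and no other non-whitespace text."""
    children = list(el)
    if len(children) != 1:
        return None
    child = children[0]
    if (el.text or "").strip() or (child.tail or "").strip():
        return None
    return child

def only_child(el):
    """CSS ':only-child': the sole element child (text nodes are ignored)."""
    children = list(el)
    return children[0] if len(children) == 1 else None

def only_of_type(parent, tag, attr):
    """Child matching 'tag[attr]:only-of-type' under parent, or None."""
    same_tag = [c for c in parent if c.tag.lower() == tag]
    if len(same_tag) == 1 and same_tag[0].get(attr) is not None:
        return same_tag[0]
    return None

def imply_name(root):
    child = only_node_child(root)
    grandchild = only_node_child(child) if child is not None else None
    # Check the root, then its only-node child, then that child's only-node child.
    for el in (root, child, grandchild):
        if el is None:
            continue
        tag = el.tag.lower()
        if tag == "img":
            return el.get("alt", "").strip()
        if tag == "abbr" and el.get("title") is not None:
            return el.get("title").strip()
    # str.strip() also removes no-break spaces (U+00A0).
    return "".join(root.itertext()).strip()

def imply_photo(root):
    tag = root.tag.lower()
    if tag == "img" and root.get("src") is not None:
        return root.get("src")
    if tag == "object" and root.get("data") is not None:
        return root.get("data")
    # Look under the root first, then under its only child.
    for parent in (root, only_child(root)):
        if parent is None:
            continue
        for child_tag, attr in (("img", "src"), ("object", "data")):
            match = only_of_type(parent, child_tag, attr)
            if match is not None:
                return match.get(attr)
    return None  # no photo implied

def imply_url(root):
    if root.tag.lower() == "a" and root.get("href") is not None:
        return root.get("href")
    link = only_of_type(root, "a", "href")
    return link.get("href") if link is not None else None
```

For a linked image such as an a.h-card wrapping a single img, the three functions together give the img alt as name, the img src as photo, and the a href as url. The sketch returns attribute values as written and does not resolve relative URLs.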
what do the CSS selector expressions mean
Use SelectORacle to expand any of the above CSS selector expressions into longform English prose.
questions
See the FAQ: microformats2-parsing-faq
see also
- microformats2
- microformats2-parsing-faq
- microformats2-parsing-brainstorming - for background, thinking, exploring possibilities
- microformats2-parsing-rdf
- microformats2-implied-properties