microformats2 parsing

(Difference between revisions)

(removed "* else if br.p-x or hr.p-x, then return "" (empty string)" parsing step per issue raised by Glenn Jones)
(parse an element for microformats: incorporate when to parse properties and child elements with an explicit order)
(2 intermediate revisions not shown.)
Line 13: Line 13:
To parse an element for microformats:
* parse element class for root class name(s) "h-x" (and backcompat)
- ** if found, start parsing a new microformat
+ ** if not found, parse child elements for microformats (depth first, doc order)
- *** parse contained elements for properties (depth first, doc order)
+ ** else if found, start parsing a new microformat
- **** parse an element for microformats (recurse)
+ *** parse child elements (document order) by:
- *** imply properties (see below)
+ **** parse a child element for properties (p-,u-,dt-,e-)
- * parse element class for properties (p-,u-,dt-,e-)
+ ***** add properties found to current microformat
- * add properties found (with any nested microformats) to current microformat
+ **** parse a child element for microformats (recurse)
+ ***** if that child element itself has a microformat and is a property element, add it into the array of values for that property
+ ***** else add found elements that are microformats to the "children" array
+ *** imply properties for the found microformat (see below)
+ === parse an element for properties ===
==== parsing a p- property ====
To parse an element for a p-x property value:
Line 51: Line 55:
* return the innerHTML of the element by using the [http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#serializing-html-fragments HTML spec: Serializing HTML Fragments algorithm].
- === parsing for implied properties ===
+ ==== parsing for implied properties ====
To imply properties: (where h-x is the root microformat element being parsed)
* if no explicit "name" property,
Line 57: Line 61:
** if img.h-x then use its alt attribute for name
** else if abbr.h-x[title] then use its title attribute for name
- ** else if .h-x>img:only-node then use that img alt for name
+ ** else if .h-x>img:only-child then use that img alt for name
- ** else if .h-x>abbr:only-node[title] then use that abbr title for name
+ ** else if .h-x>abbr:only-child[title] then use that abbr title for name
- ** else if .h-x>:only-node>img:only-node use that img alt for name
+ ** else if .h-x>:only-child>img:only-child use that img alt for name
- ** else if .h-x>:only-node>abbr:only-node[title] use that abbr title for name
+ ** else if .h-x>:only-child>abbr:only-child[title] use that abbr title for name
** else use the innertext of the .h-x for name
** drop leading & trailing white-space from name, including nbsp

Revision as of 04:42, 2 March 2013


One of the goals of microformats2 is to greatly simplify parsing of microformats, in particular, by making parsing independent of any one vocabulary. This page briefly documents the microformats2 parsing algorithm for doing so.


parsing algorithm

parse a document for microformats

To parse a document for microformats:

parse an element for microformats

To parse an element for microformats:
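
The revised steps (the + lines in the diff above) can be sketched roughly as follows. This is a minimal illustration in Python, not a reference implementation. It assumes well-formed markup loaded with `xml.etree.ElementTree`, ignores backcompat root class names, skips implied properties, and uses the element's trimmed text as a placeholder value for every property prefix. The helper names (`root_types`, `property_names`, `parse_child`) and the dictionary layout are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET

PROPERTY_PREFIXES = ("p-", "u-", "dt-", "e-")


def class_names(el):
    """Split the class attribute into individual class names."""
    return el.get("class", "").split()


def root_types(el):
    """Root class names ("h-x") on an element; backcompat roots are ignored here."""
    return [c for c in class_names(el) if c.startswith("h-")]


def property_names(el):
    """Property names with the prefix removed, e.g. "p-name" -> "name"."""
    return [c.split("-", 1)[1] for c in class_names(el)
            if c.startswith(PROPERTY_PREFIXES)]


def parse_element(el):
    """Return the microformats found at or below el, in document order."""
    types = root_types(el)
    if not types:
        # No root class name: look for microformats in the child elements.
        found = []
        for child in el:
            found.extend(parse_element(child))
        return found
    microformat = {"type": types, "properties": {}, "children": []}
    for child in el:
        parse_child(child, microformat)
    # Implied properties would be added here; omitted in this sketch.
    return [microformat]


def parse_child(child, microformat):
    """Add properties and nested microformats found in child to microformat."""
    names = property_names(child)
    if root_types(child):
        nested = parse_element(child)[0]
        if names:
            # A nested microformat that is also a property becomes a value of that property.
            for name in names:
                microformat["properties"].setdefault(name, []).append(nested)
        else:
            microformat["children"].append(nested)
        return
    for name in names:
        # Placeholder value (assumption): trimmed text content for every prefix.
        value = "".join(child.itertext()).strip()
        microformat["properties"].setdefault(name, []).append(value)
    # Assumption: properties may sit deeper than direct children, so keep walking.
    for grandchild in child:
        parse_child(grandchild, microformat)
```

Calling `parse_element` on a parsed document root returns a list of dictionaries, one per top-level microformat. A nested microformat appears either under the property it was marked up as or in the parent's "children" list.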

parse an element for properties

parsing a p- property

To parse an element for a p-x property value:

parsing a u- property

To parse an element for a u-x property value:

parsing a dt- property

To parse an element for a dt-x property value:

parsing an e- property

To parse an element for an e-x property value:

parsing for implied properties

To imply properties: (where h-x is the root microformat element being parsed)
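
The implied "name" rules from the revised list (the :only-child lines in the diff above) can be sketched as below. This is an illustrative Python version under stated assumptions. It works on `xml.etree.ElementTree` elements, where only element nodes count as children, which matches the :only-child reading. It treats a missing alt attribute on an img as an empty string, and it approximates "innertext" with the concatenated text content. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Characters dropped from both ends of the name, including nbsp (U+00A0).
TRIM_CHARS = " \t\r\n\f\u00a0"


def only_child(el):
    """Return the sole child element of el, or None if it has zero or several."""
    children = list(el)
    return children[0] if len(children) == 1 else None


def attribute_name(el):
    """Return the img alt or abbr title that can supply a name, else None."""
    if el.tag == "img":
        return el.get("alt", "")  # assumption: a missing alt counts as empty
    if el.tag == "abbr" and el.get("title") is not None:
        return el.get("title")
    return None


def implied_name(root):
    """Imply a name for the root microformat element (h-x) when none is explicit."""
    # Candidates in rule order: the root itself, its only child,
    # then that child's only child.
    candidates = [root]
    child = only_child(root)
    if child is not None:
        candidates.append(child)
        grandchild = only_child(child)
        if grandchild is not None:
            candidates.append(grandchild)
    for el in candidates:
        value = attribute_name(el)
        if value is not None:
            return value.strip(TRIM_CHARS)
    # Fallback: the text content of the root element.
    return "".join(root.itertext()).strip(TRIM_CHARS)
```

Because text nodes are not counted as children, an img that sits next to plain text inside the root still supplies the name, which is the practical difference between :only-child and the earlier :only-node wording.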

what do the CSS selector expressions mean

Use SelectORacle to expand any of the above CSS selector expressions into longform English prose.

questions

See the FAQ:

issues

  • The parsing rule 'else if br.p-x or hr.p-x, then return "" (empty string)' for p-* can cause any code consuming the API to become quite bloated. It means that you have to test every array value to see if it's an empty string. It is also unclear to me what the purpose of this mark-up pattern is. - Glenn Jones
    • Upon reconsidering this, I agree with you, this is an unlikely use case. If a publisher wants to explicitly set an empty property "p-foo" they can simply write <span class="p-foo"></span> which looks explicit. Whereas BR and HR tags are often just presentational, so we should both not encourage usage of them for semantics, and anyone that did use them would be subject to likely loss of semantics upon a redesign (that got rid of those particular BR and HR tags). I'm going to remove them from the parsing spec. - Tantek 15:29, 10 February 2013 (UTC)

see also

