microformats2 parsing specification

(Difference between revisions)

m (parse an element for class microformats: + </code>)
(resolved issues: uf2 children in backcompat root, any h-* overrides backcompat root, backcompat/*-* props only for backcompat/h-* respectively, only h-* get implied p, parse link elem, drop alternates)
Line 27: Line 27:
=== parse an element for class microformats ===
=== parse an element for class microformats ===
To parse an element for class microformats:
To parse an element for class microformats:
-
* parse element class for root class name(s) "h-*" and backcompat root classes
+
* parse element class for root class name(s) "h-*" and if none, backcompat root classes
-
** if not found, parse child elements for microformats (depth first, doc order)
+
** if none found, parse child elements for microformats (depth first, doc order)
** else if found, start parsing a new microformat
** else if found, start parsing a new microformat
 +
*** keep track of whether the root class name(s) was from backcompat
*** create a new { } structure with:
*** create a new { } structure with:
**** <code>type: <nowiki>[array of microformat "h-*" type(s) on the element]</nowiki>,</code>
**** <code>type: <nowiki>[array of microformat "h-*" type(s) on the element]</nowiki>,</code>
**** <code>properties: { } </code> - to be filled in when that element itself is parsed for microformats properties
**** <code>properties: { } </code> - to be filled in when that element itself is parsed for microformats properties
-
**** if that element is an <code>&lt;area&gt;</code> element, also add:
 
-
***** <code>shape:</code> from <code>area[shape]</code> if any
 
-
***** <code>coords:</code> from <code>area[coords]</code> if any
 
-
**** end if
 
*** parse child elements (document order) by:
*** parse child elements (document order) by:
-
**** parse a child element class for property class name(s) "p-*,u-*,dt-*,e-*" respectively as detailed below
+
**** if parsing a backcompat root, parse child element class name(s) for backcompat properties
 +
**** else parse a child element class for property class name(s) "p-*,u-*,dt-*,e-*"
**** if such class(es) are found, it is a property element
**** if such class(es) are found, it is a property element
***** add properties found to current microformat's <code>properties: { } </code> structure
***** add properties found to current microformat's <code>properties: { } </code> structure
Line 53: Line 51:
=== parse an element for properties ===
=== parse an element for properties ===
==== parsing a p- property ====
==== parsing a p- property ====
-
To parse an element for a p-x property value:
+
To parse an element for a p-x property value whether explicit "p-*" or backcompat equivalent:
* parse the element for the [[value-class-pattern]], if a value is found then return it.
* parse the element for the [[value-class-pattern]], if a value is found then return it.
* if abbr.p-x[title], then return the title attribute
* if abbr.p-x[title], then return the title attribute
Line 61: Line 59:
==== parsing a u- property ====
==== parsing a u- property ====
-
To parse an element for a u-x property value:
+
To parse an element for a u-x property value whether explicit "u-*" or backcompat equivalent:
* if a.u-x[href] or area.u-x[href], then get the href attribute
* if a.u-x[href] or area.u-x[href], then get the href attribute
* else if img.u-x[src] or audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute
* else if img.u-x[src] or audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute
Line 72: Line 70:
==== parsing a dt- property ====
==== parsing a dt- property ====
-
To parse an element for a dt-x property value:
+
To parse an element for a dt-x property value whether explicit "dt-*" or backcompat equivalent:
* parse the element for the [[value-class-pattern]] including the date and time parsing rules, if a value is found then return it.
* parse the element for the [[value-class-pattern]] including the date and time parsing rules, if a value is found then return it.
* if time.dt-x[datetime] or ins.dt-x[datetime] or del.dt-x[datetime], then return the datetime attribute
* if time.dt-x[datetime] or ins.dt-x[datetime] or del.dt-x[datetime], then return the datetime attribute
Line 80: Line 78:
==== parsing an e- property ====
==== parsing an e- property ====
-
To parse an element for a e-x property value:
+
To parse an element for a e-x property value whether explicit "e-*" or backcompat equivalent:
* return a dictionary with two keys:
* return a dictionary with two keys:
** <code>html</code>: the innerHTML of the element by using the [https://html.spec.whatwg.org/multipage/syntax.html#serialising-html-fragments HTML spec: Serializing HTML Fragments algorithm], with leading/trailing whitespace removed.
** <code>html</code>: the innerHTML of the element by using the [https://html.spec.whatwg.org/multipage/syntax.html#serialising-html-fragments HTML spec: Serializing HTML Fragments algorithm], with leading/trailing whitespace removed.
Line 86: Line 84:
==== parsing for implied properties ====
==== parsing for implied properties ====
-
To imply properties: (where h-x is the root microformat element being parsed)
+
Imply properties only on explicit h-x class name root microformat element (no backcompat roots)
* if no explicit "name" property,  
* if no explicit "name" property,  
* then imply by:
* then imply by:
Line 118: Line 116:
=== parse a hyperlink element for rel microformats ===
=== parse a hyperlink element for rel microformats ===
-
To parse a hyperlink element for rel microformats: (where * is the hyperlink element), use the following algorithm or an algorithm that produces equivalent results:
+
To parse a hyperlink element (e.g. a or link) for rel microformats: use the following algorithm or an algorithm that produces equivalent results:
* if the "rel" attribute of the element is empty then exit
* if the "rel" attribute of the element is empty then exit
* set url to the value of the "href" attribute of the element, normalized to be an absolute URL following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <code>&lt;base&gt;</code> element if any).
* set url to the value of the "href" attribute of the element, normalized to be an absolute URL following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <code>&lt;base&gt;</code> element if any).
Line 135: Line 133:
* if there is no "rels" key in that hash, add it with an empty array value
* if there is no "rels" key in that hash, add it with an empty array value
* set the value of that "rels" key to an array of all unique items in the set of rel values unioned with the current array value of the "rels" key
* set the value of that "rels" key to an array of all unique items in the set of rel values unioned with the current array value of the "rels" key
-
* if the set of rel values has "alternate" {{warn|this entire if clause may be dropped due to dropping alternates}}
 
-
** if there's no top-level "alternates" array, then create it as an empty array.
 
-
** add a new hash to the top-level "alternates" array with keys for each of these attributes when present:
 
-
*** "url": url
 
-
*** "rel": the set of rel values appended with spaces, except "alternate"
 
-
*** "hreflang": the value of the "hreflang" attribute
 
-
*** "media": the value of the "media" attribute
 
-
*** "title": the value of the "title" attribute
 
-
*** "type": the value of the "type" attribute
 
-
*** "text": the text content of the element if any
 
-
* end if
 
==== rel parse examples ====
==== rel parse examples ====
Line 196: Line 183:
     }
     }
   }
   }
-
  "alternates": [{
 
-
    "url": "http://example.com/fr",
 
-
    "rel": "home",
 
-
    "media": "handheld",
 
-
    "hreflang": "fr",
 
-
    "text": "French mobile homepage"
 
-
  }]
 
}
}
</source>
</source>
-
{{warning|The "alternates" collection is likely to be dropped. Use "rel-urls" instead.}}
 

Revision as of 23:24, 18 September 2015


Tantek Çelik (Editor)


microformats2 is a simple, open format for marking up data in HTML. The microformats2 parsing specification describes how to implement a microformats2 parser.

One of the goals of microformats2 is to greatly simplify parsing of microformats, in particular, by making parsing independent of any one vocabulary. This specification documents the microformats2 parsing algorithm for doing so.

Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of 2017-11-18, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0.


algorithm

parse a document for microformats

To parse a document for microformats, follow the HTML parsing rules and do the following:

{
 "items": [],
 "rels": {},
 "rel-urls": {}
}

Parsers may simultaneously parse the document for both class and rel microformats (e.g. in a single tree traversal).

parse an element for class microformats

To parse an element for class microformats:
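The root-detection change in this revision (explicit "h-*" class names first, backcompat root classes only if none are present, and remembering which kind was found) can be sketched roughly as follows. This is a minimal illustration and not the reference algorithm: the function name start_microformat, the simplified class-name pattern, and the three-entry backcompat table are assumptions made for the example.

```python
import re

# Illustrative subset of backcompat root class names and their h-* equivalents
# (an assumption of this sketch; a real parser needs the full mapping).
BACKCOMPAT_ROOTS = {"vcard": "h-card", "vevent": "h-event", "hentry": "h-entry"}

# Simplified pattern for explicit root class names (also an assumption).
ROOT_CLASS = re.compile(r"^h-[a-z]+(-[a-z]+)*$")


def start_microformat(class_attr):
    """Return (structure, from_backcompat) for a root element, else None."""
    classes = class_attr.split()

    # Explicit h-* class names take precedence over any backcompat root.
    types = [c for c in classes if ROOT_CLASS.match(c)]
    from_backcompat = False

    # Only if no h-* root is present, look for backcompat root classes.
    if not types:
        types = [BACKCOMPAT_ROOTS[c] for c in classes if c in BACKCOMPAT_ROOTS]
        from_backcompat = bool(types)

    if not types:
        return None  # not a root: the caller goes on to parse child elements

    # New structure; "properties" is filled in while parsing child elements.
    return {"type": types, "properties": {}}, from_backcompat
```

The from_backcompat flag is what lets a caller choose between backcompat property class names and "p-*,u-*,dt-*,e-*" when parsing children, and skip implied properties for backcompat roots.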

parse an element for properties

parsing a p- property

To parse an element for a p-x property value whether explicit "p-*" or backcompat equivalent:

parsing a u- property

To parse an element for a u-x property value whether explicit "u-*" or backcompat equivalent:
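The first two u- steps shown in the diff above (href on a and area; src on img, audio, video and source) amount to a lookup from element name to attribute name. A minimal sketch, assuming the element is given as a tag name plus a dict of attributes; the names URL_ATTRIBUTE_BY_TAG and u_attribute_value are hypothetical, and later steps of the algorithm as well as URL normalization are deliberately left out because they are not shown here.

```python
# Element name -> attribute that carries the URL, per the two steps quoted above.
URL_ATTRIBUTE_BY_TAG = {
    "a": "href",
    "area": "href",
    "img": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
}


def u_attribute_value(tag, attrs):
    """Return the raw URL attribute for a u-x element, or None if no rule applies.

    `attrs` is a plain dict of attribute names to values (an assumption of
    this sketch). None means the caller should fall through to later steps.
    """
    name = URL_ATTRIBUTE_BY_TAG.get(tag.lower())
    if name is None:
        return None
    return attrs.get(name)
```

Returning None rather than an empty string keeps "attribute absent" distinguishable from "attribute present but empty", which matters for selectors such as a.u-x[href].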

parsing a dt- property

To parse an element for a dt-x property value whether explicit "dt-*" or backcompat equivalent:

parsing an e- property

To parse an element for an e-x property value, whether explicit "e-*" or backcompat equivalent:

parsing for implied properties

Imply properties only on a root microformat element with an explicit h-x class name (not on backcompat roots):

Note: The same markup for a property should not cause that property to occur both in a microformat and in one embedded inside it; such a property should appear on only one of them. The parsing algorithm has details to prevent that, such as the :not[.h-*] tests above.

parse a hyperlink element for rel microformats

To parse a hyperlink element (e.g. a or link) for rel microformats: use the following algorithm or an algorithm that produces equivalent results:

rel parse examples

Here are some examples to show how parsed rels may be reflected into the JSON (empty items key).

E.g. parsing this markup:

<a rel="author" href="http://example.com/a">author a</a>
<a rel="author" href="http://example.com/b">author b</a>
<a rel="in-reply-to" href="http://example.com/1">post 1</a>
<a rel="in-reply-to" href="http://example.com/2">post 2</a>
<a rel="alternate home"
   href="http://example.com/fr"
   media="handheld"
   hreflang="fr">French mobile homepage</a>

Would generate this JSON:

{
  "items": [],
  "rels": { 
    "author": [ "http://example.com/a", "http://example.com/b" ],
    "in-reply-to": [ "http://example.com/1", "http://example.com/2" ],
    "alternate": [ "http://example.com/fr" ], 
    "home": [ "http://example.com/fr" ] 
  },
  "rel-urls": {
    "http://example.com/a": {
      "rels": ["author"], 
      "text": "author a"
    },
    "http://example.com/b": {
      "rels": ["author"], 
      "text": "author b"
    },
    "http://example.com/1": {
      "rels": ["in-reply-to"], 
      "text": "post 1"
    },
    "http://example.com/2": {
      "rels": ["in-reply-to"], 
      "text": "post 2"
    },
    "http://example.com/fr": {
      "rels": ["alternate", "home"],
      "media": "handheld", 
      "hreflang": "fr", 
      "text": "French mobile homepage"
    }
  }
}
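The mapping from the markup to the JSON above can be sketched with Python's standard library. This is a hedged illustration rather than a conforming parser: the class RelParser, the function parse_rels, the base_url parameter and the list of copied attributes are assumptions made for the example, and rel values are kept in order of first appearance simply to match the example output.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelParser(HTMLParser):
    """Collects "rels" and "rel-urls" from <a> and <link> elements."""

    # Attributes copied into a rel-urls entry when present (assumed list).
    COPIED_ATTRS = ("media", "hreflang", "title", "type")

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.result = {"items": [], "rels": {}, "rel-urls": {}}
        self._open_link = None  # (entry, text_parts) while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        if not rel_values or attrs.get("href") is None:
            return  # empty rel (or no href in this sketch): nothing to record
        url = urljoin(self.base_url, attrs["href"])  # make the URL absolute

        # "rels": rel value -> unique URLs, in document order.
        for rel in rel_values:
            urls = self.result["rels"].setdefault(rel, [])
            if url not in urls:
                urls.append(url)

        # "rel-urls": URL -> union of rel values plus the first-seen attributes.
        entry = self.result["rel-urls"].setdefault(url, {})
        rels = entry.setdefault("rels", [])
        for rel in rel_values:
            if rel not in rels:
                rels.append(rel)
        for name in self.COPIED_ATTRS:
            if attrs.get(name) is not None and name not in entry:
                entry[name] = attrs[name]
        if tag == "a" and "text" not in entry:
            self._open_link = (entry, [])

    def handle_data(self, data):
        if self._open_link is not None:
            self._open_link[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_link is not None:
            entry, parts = self._open_link
            entry["text"] = "".join(parts)
            self._open_link = None


def parse_rels(html, base_url):
    """Parse `html` and return the {"items", "rels", "rel-urls"} structure."""
    parser = RelParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.result
```

Calling parse_rels on the five hyperlinks above with a base URL of http://example.com/ should yield the same "rels" and "rel-urls" content as the JSON example; there is no "alternates" key, consistent with this revision dropping it.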


what do the CSS selector expressions mean

This section is non-normative.

Use SelectORacle to expand any of the above CSS selector expressions into longform English prose.

Exception:

note HTML parsing rules

This section is non-normative.

microformats2 parsers are expected to follow HTML parsing rules, which include, for example:

questions

See the FAQ:

issues

See the issues page:

implementations

Main article: microformats2#Implementations

There are open source microformats2 parsers available for JavaScript, Node.js, PHP, Ruby and Python.

test suite

See:

Ports to other languages are encouraged.

see also


microformats2 parsing specification was last modified: Wednesday, December 31st, 1969
