microformats2 parsing specification

(Difference between revisions)

(rel parse examples: put alternate keys in rels to match rel-urls)
(parse a hyperlink element for rel microformats: simplify so alternate is consistent)
Line 119: Line 119:
  * set url to the value of the "href" attribute of the element, normalized to be an absolute URL following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <code>&lt;base&gt;</code> element if any).
  * treat the "rel" attribute of the element as a space-separated set of rel values
- * if the set of rel values does NOT have "alternate" then {{warn|this if conditional may drop, due to dropping alternates}}
+ * for each rel value (rel-value)
- ** for each rel value (rel-value)
+ ** if there is no key rel-value in the rels hash then create it with an empty array as its value
- *** if there is no key rel-value in the rels hash then create it with an empty array as its value
+ ** add url to the array of the key rel-value in the rels hash
- *** add url to the array of the key rel-value in the rels hash
+ * end for
- ** end for
+ * add a key with name url in the top-level "rel-urls" hash, with an empty hash value
- * else {{warn|this entire else clause may be dropped due to dropping alternates}}
+ * add keys to that hash for each of these attributes when present:
+ ** "hreflang": the value of the "hreflang" attribute
+ ** "media": the value of the "media" attribute
+ ** "title": the value of the "title" attribute
+ ** "type": the value of the "type" attribute
+ ** "text": the text content of the element if any
+ * add a "rels" key to that hash with value of an array of all items in the set of rel values
+ * if the set of rel values has "alternate" then {{warn|this if conditional may drop, due to dropping alternates}}
  ** if there's no top-level "alternates" array, then create it as an empty array.
  ** add a new hash to the top-level "alternates" array with keys for each of these attributes when present:
Line 135: Line 142:
  *** "text": the text content of the element if any
  * end if
- * add a key with name url in the top-level "rel-urls" hash, with an empty hash value
- * add keys to that hash for each of these attributes when present:
- ** "hreflang": the value of the "hreflang" attribute
- ** "media": the value of the "media" attribute
- ** "title": the value of the "title" attribute
- ** "type": the value of the "type" attribute
- ** "text": the text content of the element if any
- * add a "rels" key to that hash with value of an array of all items in the set of rel values
  ==== rel parse examples ====

Revision as of 22:55, 1 June 2015


Tantek Çelik (Editor)


microformats2 is a simple, open format for marking up data in HTML. The microformats2 parsing specification describes how to implement a microformats2 parser.

One of the goals of microformats2 is to greatly simplify parsing of microformats, in particular, by making parsing independent of any one vocabulary. This specification documents the microformats2 parsing algorithm for doing so.

Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of 2017-12-13, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0.


algorithm

parse a document for microformats

To parse a document for microformats, follow the HTML parsing rules and do the following:

{
 "items": [],
 "rels": {},
 "rel-urls": {}
}

Parsers may simultaneously parse the document for both class and rel microformats (e.g. in a single tree traversal).
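As a rough illustration of the single-traversal idea, the Python sketch below visits each start tag once and checks both its `class` and `rel` attributes. It only records candidates; the actual class and rel parsing steps are not implemented here. The class name `SinglePassScanner` and the `root_candidates` / `rel_candidates` lists are hypothetical names chosen for this example, and treating any class name that starts with `h-` as a root candidate is a simplifying assumption.

```python
from html.parser import HTMLParser


class SinglePassScanner(HTMLParser):
    """Visits each start tag once and notes class and rel candidates."""

    def __init__(self):
        super().__init__()
        # Top-level result shape shown above; left empty by this sketch.
        self.result = {"items": [], "rels": {}, "rel-urls": {}}
        self.root_candidates = []  # (tag, [class names starting with "h-"])
        self.rel_candidates = []   # (tag, [rel values])

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Class side: collect possible root microformat class names.
        roots = [c for c in (attrs.get("class") or "").split()
                 if c.startswith("h-")]
        if roots:
            self.root_candidates.append((tag, roots))
        # Rel side: the same visit also checks for rel values with an href.
        rels = (attrs.get("rel") or "").split()
        if rels and attrs.get("href") is not None:
            self.rel_candidates.append((tag, rels))


scanner = SinglePassScanner()
scanner.feed('<div class="h-card"><a class="u-url" rel="me" href="/">x</a></div>')
```

A real parser would replace the two candidate lists with calls into its class and rel parsing routines, but the traversal itself would still happen only once.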

parse an element for class microformats

To parse an element for class microformats:

parse an element for properties

parsing a p- property

To parse an element for a p-x property value:

parsing a u- property

To parse an element for a u-x property value:

parsing a dt- property

To parse an element for a dt-x property value:

parsing an e- property

To parse an element for an e-x property value:

parsing for implied properties

To imply properties: (where h-x is the root microformat element being parsed)

Note: The same markup for a property should not cause that property to occur both in a microformat and in one embedded inside it; such a property should appear on only one of them. The parsing algorithm has details to prevent that, such as the :not[.h-*] tests above.
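The sections above distinguish class names by prefix: h-x for root microformats and p-x, u-x, dt-x, and e-x for the four property types. A minimal Python sketch of that grouping follows. The function name `classify_class_names` is hypothetical, and the sketch checks only the prefix and that something follows the hyphen; any stricter rules on valid names are not covered here.

```python
PREFIXES = ("h", "p", "u", "dt", "e")


def classify_class_names(class_attr):
    """Group the class names in a class attribute by microformats2 prefix."""
    groups = {prefix: [] for prefix in PREFIXES}
    for name in class_attr.split():
        # Split at the first hyphen only, so "dt-start-date" keeps prefix "dt".
        prefix, sep, rest = name.partition("-")
        if sep and rest and prefix in groups:
            groups[prefix].append(name)
    return groups


groups = classify_class_names("h-entry p-name dt-published e-content nav p-")
```

Class names without a recognized prefix (such as `nav`) and bare prefixes (such as `p-`) are ignored by this sketch.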

parse a hyperlink element for rel microformats

To parse a hyperlink element for rel microformats: (where * is the hyperlink element), use the following algorithm or an algorithm that produces equivalent results:

rel parse examples

The following example shows how parsed rels may be reflected in the JSON output (with an empty "items" key).

E.g. parsing this markup:

<a rel="author" href="http://example.com/a">author a</a>
<a rel="author" href="http://example.com/b">author b</a>
<a rel="in-reply-to" href="http://example.com/1">post 1</a>
<a rel="in-reply-to" href="http://example.com/2">post 2</a>
<a rel="alternate home"
   href="http://example.com/fr"
   media="handheld"
   hreflang="fr">French mobile homepage</a>

Would generate this JSON:

{
  "items": [],
  "rels": { 
    "author": [ "http://example.com/a", "http://example.com/b" ],
    "in-reply-to": [ "http://example.com/1", "http://example.com/2" ], 
    "alternate": [ "http://example.com/fr" ], 
    "home": [ "http://example.com/fr" ] 
  },
  "rel-urls": {
    "http://example.com/a": {
      "rels": ["author"], 
      "text": "author a"
    },
    "http://example.com/b": {
      "rels": ["author"], 
      "text": "author b"
    },
    "http://example.com/1": {
      "rels": ["in-reply-to"], 
      "text": "post 1"
    },
    "http://example.com/2": {
      "rels": ["in-reply-to"], 
      "text": "post 2"
    },
    "http://example.com/fr": {
      "rels": ["alternate", "home"],
      "media": "handheld", 
      "hreflang": "fr", 
      "text": "French mobile homepage"
    }
  },
  "alternates": [{
     "url": "http://example.com/fr", 
     "rel": "home", 
     "media": "handheld", 
     "hreflang": "fr",
     "text": "French mobile homepage"
  }]
}
Warning: The "alternates" collection is likely to be dropped. Use "rel-urls" instead.
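The "rels" and "rel-urls" parts of this output can be reproduced with a short Python sketch that follows the rel steps listed in the revision comparison at the top of this page. It is an illustration, not a reference implementation. Several things are assumptions made for the example: the names `RelParser` and `parse_rels`, the choice of `a`, `area`, and `link` as hyperlink elements, and passing the base URL in as an argument instead of reading a `<base>` element. The "alternates" collection is omitted, in line with the warning above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelParser(HTMLParser):
    # Assumption: these tags are treated as hyperlink elements.
    HYPERLINK_TAGS = {"a", "area", "link"}
    OPTIONAL_ATTRS = ("hreflang", "media", "title", "type")

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # assumption: supplied by the caller
        self.result = {"items": [], "rels": {}, "rel-urls": {}}
        self._entry = None  # "rel-urls" hash of the <a> being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if tag not in self.HYPERLINK_TAGS or not rel_values or href is None:
            return
        # Normalize href to an absolute URL.
        url = urljoin(self.base_url, href)
        # For each rel value, create the key if needed and add the URL.
        for rel_value in rel_values:
            self.result["rels"].setdefault(rel_value, []).append(url)
        # Add the URL to "rel-urls" with an empty hash, then fill it in.
        entry = self.result["rel-urls"][url] = {}
        for name in self.OPTIONAL_ATTRS:
            if attrs.get(name) is not None:
                entry[name] = attrs[name]
        entry["rels"] = rel_values
        if tag == "a":  # only <a> can carry text content here
            self._entry, self._text = entry, []

    def handle_data(self, data):
        if self._entry is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._entry is not None:
            text = "".join(self._text)
            if text:
                self._entry["text"] = text
            self._entry = None


def parse_rels(html, base_url):
    parser = RelParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.result
```

Calling `parse_rels(markup, "http://example.com/")` on the example markup above should give the same "rels" and "rel-urls" values as the JSON shown, without the "alternates" array.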

Another parse output example can be found here:

what do the CSS selector expressions mean

This section is non-normative.

Use SelectORacle to expand any of the above CSS selector expressions into longform English prose.

Exception:

note HTML parsing rules

This section is non-normative.

microformats2 parsers are expected to follow HTML parsing rules, which include, for example:

questions

See the FAQ:

issues

See the issues page:

implementations

Main article: microformats2#Implementations

There are open source microformats2 parsers available for JavaScript, Node.js, PHP, Ruby, and Python.

test suite

See:

Ports to other languages are encouraged.

see also


microformats2 parsing specification was last modified: Wednesday, December 31st, 1969
