microformats2-json: Difference between revisions

(Stubs aren’t so much stubs when they need only a single specific expansion, update rels with a multi-rel URL, start the rel-urls documentation)
m (JSON Schema has moved to its own repository with automated testing)
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
<dfn style="font-style:normal;font-weight:bold">microformats2 JSON</dfn> is the canonical output format of the [[microformats2-parsing|microformats2 parsing]] algorithm. As such it can be used to compare parsers and create [[test-suite|test suites]]. It is also used as the official serialisation format of microformats objects, and relied upon by specifications such as [https://micropub.net/ Micropub].
This page contains an informative description of <dfn style="font-style:normal;font-weight:bold">microformats2 JSON</dfn>, the canonical output format of the [[microformats2-parsing|microformats2 parsing]] algorithm. Note that the [[microformats2-parsing|microformats2 parsing specification]] is the only authoritative source for its own output.
 
'''Goal:''' Document possible values – with examples – of the official serialisation format of microformats generated by [[microformats2#Parsers|microformats2 parsers]]. This format can be used to compare parsers and create [[test-suite|test suites]]. It also clarifies the format for other specifications that rely on the serialisation, such as [https://micropub.net/ Micropub].
 
'''Audience:''' Parser authors, parser users, web developers. This document is written for anyone working with microformats2 in their serialised form, so they can read about the format generated by the parsing algorithm without having to understand the actual parsing itself.
 
'''Author(s):''' [[User:Zegnat|Martijn van der Ven]]


<div style="margin:1em;padding:1em;background:#FFDC00;font-size:smaller">⚠️ The JSON format used is not pinned to a specific [https://indieweb.org/JSON#Specs JSON specification]. See [https://github.com/microformats/microformats2-parsing/issues/23 issue #23] for a discussion on the subject.</div>
Line 7: Line 13:
Parsers collect not only microformats2 objects, but also [[rel|link relationships]]. Parsing an entire document will result in an outer object with 3 members named <code>items</code>, <code>rels</code>, and <code>rel-urls</code>:


<pre>{
<source lang=javascript>{
   "items": [],
   "rels": {},
   "rel-urls": {}
}</pre>
}</source>


# <code>items</code> is an array of [[microformats2-json#microformat2_Objects|microformats2 objects]], ordered according to their order in the source document.
Line 18: Line 24:


== microformat2 Objects ==
<div style="margin:1em;padding:1em;background:#F8F7EC;font-size:smaller">🕰️ '''This section is outdated.''' An extra optional member called <code>id</code> was [http://microformats.org/wiki/index.php?title=microformats2-parsing&diff=66967&oldid=66966 added in December].</div>


The '''microformats2 object''' is an object with 2 required members named <code>type</code> and <code>properties</code>, as well as an optional member named <code>children</code>:


<pre>{  
<source lang=javascript>{
   "type": [],
   "properties": {},
   "children": []
}</pre>
}</source>


# <code>type</code> is an array of the types that identify the microformat, ordered alphabetically.
Line 41: Line 49:
The following example shows an <code>h-entry</code> type microformats2 object, with a single property attached. The <code>h-entry</code> type is [[h-entry|documented on the wiki]], this way types point towards documented conventions that hold true no matter what the source document was.


<pre>{  
<source lang=javascript>{
   "type": ["h-entry"],
   "properties": {
     "summary": ["A short published note."]
   }
}</pre>
}</source>


=== properties ===
<div style="margin:1em;padding:1em;background:#F8F7EC;font-size:smaller">🕰️ '''This section is outdated.''' A new [[microformats2-parsing#parse_an_img_element_for_src_and_alt|valid value for images]] was [http://microformats.org/wiki/index.php?title=microformats2-parsing&diff=66969&oldid=66967 added in January].</div>


The <code>properties</code> member contains an object where every member name is a microformats2 property name, and every member value is an array of the found microformats2 values. Even when only one value is given, it will be inside an array.
Line 66: Line 76:
To see what these properties mean in the context of an <code>h-entry</code> type, see [[h-entry#Core_Properties|the Core Properties section on the type’s wiki page]].


<pre>{
<source lang=javascript>{
   "type": ["h-entry"],
   "properties": {
Line 99: Line 109:
     ]
   }
}</pre>
}</source>


=== children ===
Line 109: Line 119:
The following example shows an <code>h-feed</code> type microformats2 object with a few properties that describe the feed, and an array of <code>h-entry</code> objects as its children.


<pre>{
<source lang=javascript>{
   "type": ["h-feed"],
   "properties": {
Line 153: Line 163:
     }
   ]
}</pre>
}</source>


== rels Object ==
Line 167: Line 177:
* <code><nowiki>https://example.com/a</nowiki></code> and <code><nowiki>https://example.com/b</nowiki></code> both identify an author of the current page using [[rel-author|the <code>author</code> relationship value]].


<pre>{
<source lang=javascript>{
   "home": [
     "https://example.com/",
Line 177: Line 187:
   ],
   "alternate": ["https://example.com/fr/"]
}</pre>
}</source>


== rel-urls Object ==


<div style="margin:1em;padding:1em;background:#7FDBFF;font-size:smaller">ℹ️ '''This section is a stub.''' You can help the microformats.org wiki by expanding it.</div>
The '''rel-urls object''' is an object with any number of members, where every member name is a URL and every member value is an object.


The value object will always contain a member with the name <code>rels</code>. That member’s value will be an array of alphabetically sorted link relationships applicable to the URL.
 
The value object may additionally contain any members with the following names:
 
* <code>hreflang</code>
* <code>media</code>
* <code>title</code>
* <code>type</code>
 
The values of these members are always a single string. The value is taken from the link’s attribute in the source document matching the member name. E.g. in HTML the value of an <code><a></code> element’s <code>hreflang</code> attribute will be the value of the URL’s <code>hreflang</code> member in the rel-urls object.
 
In addition the optional member named <code>text</code> is added if any text is associated with the URL in the source document. E.g. in HTML the text within an <code><a></code> element will be used.
 
The following example shows what the companion rel-urls object to the above rels object might look like. It shows the text that was used to link to the authors, giving us more context of who they are. It also makes clear that the alternative version of the home page is in fact in French and meant for handheld devices.


<pre>{
<source lang=javascript>{
   "https://example.com/": {
     "rels": ["home"],
Line 201: Line 224:
     "rels": ["alternate", "home"],
     "hreflang": "fr",
     "media": "handheld",
     "text": "Example page d’accueil"
   }
}</pre>
}</source>
 
The rel-urls object was a later addition to the specification. It exists because so much information was lost when only the rels object was created. The discussion that shaped it [[microformats2-parsing-brainstorming#more_information_for_rel-based_formats|can be read on the brainstorming page]].


== See Also ==


* [https://tools.ietf.org/html/rfc8259 RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format]
* [https://gist.github.com/Zegnat/65ed9a9fb0546fb8c4aa0c0b790b8a40 JSON Schema for microformats2 objects], by [https://vanderven.se/martijn/ Martijn van der Ven]
* [https://github.com/Zegnat/microformats2-json-schema JSON Schema for microformats2 objects], by [https://vanderven.se/martijn/ Martijn van der Ven]
* [https://github.com/cleverdevil/microformats2 Type- and vocabulary-aware microformats2 JSON validator in Python], by [https://cleverdevil.io/ Jonathan LaCour]
* [[microformats2-parsing-brainstorming#more_information_for_rel-based_formats|Background behind the included rels and rel-urls objects]]

Latest revision as of 22:38, 25 December 2021

This page contains an informative description of microformats2 JSON, the canonical output format of the microformats2 parsing algorithm. Note that the microformats2 parsing specification is the only authoritative source for its own output.

Goal: Document possible values – with examples – of the official serialisation format of microformats generated by microformats2 parsers. This format can be used to compare parsers and create test suites. It also clarifies the format for other specifications that rely on the serialisation, such as Micropub.

Audience: Parser authors, parser users, web developers. This document is written for anyone working with microformats2 in their serialised form, so they can read about the format generated by the parsing algorithm without having to understand the actual parsing itself.

Author(s): Martijn van der Ven

⚠️ The JSON format used is not pinned to a specific JSON specification. See issue #23 for a discussion on the subject.

Parsed Document Format

Parsers collect not only microformats2 objects, but also link relationships. Parsing an entire document will result in an outer object with 3 members named items, rels, and rel-urls:

{
  "items": [],
  "rels": {},
  "rel-urls": {}
}
  1. items is an array of microformats2 objects, ordered according to their order in the source document.
  2. rels is an object where the member names reflect all rel-values found in the source document.
  3. rel-urls is an object where the member names reflect all URLs found in the source document with rel-values attached.
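As a rough illustration, the outer shape described above could be checked as follows. This is a sketch and not taken from any parser: `isPlainObject` and `isParsedDocument` are hypothetical helper names, and only the three members named in this section are inspected.

```javascript
// Hypothetical helper: true for JSON objects, false for arrays, null and primitives.
function isPlainObject(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: checks only the three top-level members described above.
function isParsedDocument(doc) {
  return (
    isPlainObject(doc) &&
    Array.isArray(doc.items) &&
    isPlainObject(doc.rels) &&
    isPlainObject(doc["rel-urls"])
  );
}
```

A consumer might run such a check before reading `items`, so that malformed parser output is rejected early.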

microformats2 Objects

🕰️ This section is outdated. An extra optional member called id was added in December.

The microformats2 object is an object with 2 required members named type and properties, as well as an optional member named children:

{ 
  "type": [],
  "properties": {},
  "children": []
}
  1. type is an array of the types that identify the microformat, ordered alphabetically.
  2. properties is an object where the member names reflect all properties found for the microformat.
  3. The optional member children is an array of other microformats2 objects that were found nested in the current one.
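The three members listed above could be checked with a small recursive function. The following is a minimal sketch under the assumption that only these members matter; `isMicroformatObject` is a hypothetical name, and any additional members (such as the `id` member mentioned in the notice above) are simply ignored.

```javascript
// Hypothetical helper: checks the members described in this section and nothing else.
function isMicroformatObject(candidate) {
  if (typeof candidate !== "object" || candidate === null || Array.isArray(candidate)) {
    return false;
  }
  // Required: an array of types.
  if (!Array.isArray(candidate.type)) return false;
  // Required: an object of properties.
  const props = candidate.properties;
  if (typeof props !== "object" || props === null || Array.isArray(props)) return false;
  // Optional: children, each of which must itself be a microformats2 object.
  if ("children" in candidate) {
    return (
      Array.isArray(candidate.children) &&
      candidate.children.every((child) => isMicroformatObject(child))
    );
  }
  return true;
}
```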

type

ℹ️ This section needs expanding. It needs an example of a microformats2 object that uses multiple types and still makes sense.

The type member contains an alphabetically sorted array of root class names. These names state what kind of thing the microformat describes, and documented conventions often tie each name to the properties a consumer can expect.

The root class names are individual strings that match the pattern h-([0-9a-z]+-)?[a-z]+.
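The pattern above can be applied directly as a regular expression. The sketch below anchors it to the whole string and also checks the alphabetical ordering mentioned earlier; `isRootClassName` and `isSortedTypeArray` are hypothetical names chosen for illustration.

```javascript
// The pattern from the text, anchored so that the whole name must match.
const ROOT_CLASS_NAME = /^h-([0-9a-z]+-)?[a-z]+$/;

// Hypothetical helper: true when `name` is a string matching the pattern.
function isRootClassName(name) {
  return typeof name === "string" && ROOT_CLASS_NAME.test(name);
}

// Hypothetical helper: true when every entry is a root class name and the
// array is in ascending order (plain string comparison).
function isSortedTypeArray(types) {
  return (
    Array.isArray(types) &&
    types.every((name, i) => isRootClassName(name) && (i === 0 || types[i - 1] <= name))
  );
}
```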

The following example shows an h-entry type microformats2 object, with a single property attached. The h-entry type is documented on the wiki; in this way, types point towards documented conventions that hold true no matter what the source document was.

{ 
  "type": ["h-entry"],
  "properties": {
    "summary": ["A short published note."]
  }
}

properties

🕰️ This section is outdated. A new valid value for images was added in January.

The properties member contains an object where every member name is a microformats2 property name, and every member value is an array of the found microformats2 values. Even when only one value is given, it will be inside an array.
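Because every member value is an array, consumers usually need a small accessor to read a property. The following is a minimal sketch; `getValues` and `getFirst` are hypothetical helper names, and `object` is assumed to be a microformats2 object as described above.

```javascript
// Hypothetical helper: all values of a property, or an empty array if it is absent.
function getValues(object, name) {
  const values = object.properties[name];
  return Array.isArray(values) ? values : [];
}

// Hypothetical helper: the first value of a property, or undefined if it is absent.
function getFirst(object, name) {
  return getValues(object, name)[0];
}
```

Returning an empty array for a missing property lets callers loop over the result without a separate existence check.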

Each value in the array is one of the following:

  1. a string value, the most common value,
  2. an embedded markup object, containing both a plain string value and the verbatim mark-up from the source document, or
  3. another microformats2 object.

If a microformats2 object is used as the value of a property, it gains the additional member value, which holds a plain string representation. A consuming application that does not understand the nested microformats2 object can opt to treat it as that string.

If a microformats2 object is used as the value of a property and the parser is also instructed to return it as an embedded markup object, it additionally gains the member html.
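A consumer could tell the kinds of values apart, and fall back to the plain string, roughly as follows. This is a sketch only: `classifyPropertyValue` and `toPlainString` are hypothetical names, and the newer image value mentioned in the notice above is not handled and is reported as "unknown".

```javascript
// Hypothetical helper: names the kind of a single property value.
function classifyPropertyValue(value) {
  if (typeof value === "string") return "string";
  if (typeof value === "object" && value !== null) {
    // A nested microformats2 object is recognised by its type and properties
    // members; it may carry value and html members as well.
    if (Array.isArray(value.type) && typeof value.properties === "object" && value.properties !== null) {
      return "microformat";
    }
    // An embedded markup object has both html and value strings.
    if (typeof value.html === "string" && typeof value.value === "string") {
      return "embedded-markup";
    }
  }
  return "unknown";
}

// Hypothetical helper: the plain string for a value, or undefined if there is none.
function toPlainString(value) {
  if (typeof value === "string") return value;
  if (typeof value === "object" && value !== null && typeof value.value === "string") {
    return value.value;
  }
  return undefined;
}
```

The nested microformat check comes first because such an object may also have an html member.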

The following example shows an h-entry type microformats2 object with 3 properties that demonstrate the 3 kinds of values. The name is a single string, the content contains verbatim HTML from the source document, and the author is a nested microformats2 h-card object. The in-reply-to property has been added to show that one property may hold several values, and that those values may be of different kinds.

To see what these properties mean in the context of an h-entry type, see the Core Properties section on the type’s wiki page.

{
  "type": ["h-entry"],
  "properties": {
    "name": ["An example entry"],
    "content": [
      {
        "html": "<p>Ut non sit saepe porro porro est aut. Dicta ut repellat quisquam repellendus et iste consequatur.</p>\n<p>Consequuntur repellat sed aut in et dolores. Consequatur amet quo enim.</p>",
        "value": "Ut non sit saepe porro porro est aut. Dicta ut repellat quisquam repellendus et iste consequatur.\nConsequuntur repellat sed aut in et dolores. Consequatur amet quo enim."
      }
    ],
    "author": [
      {
        "type": ["h-card"],
        "properties": {
          "name": ["Mx Example"],
          "url": ["https://example.com/"]
        },
        "value": "Mx Example"
      }
    ],
    "in-reply-to": [
      {
        "type": ["h-cite"],
        "properties": {
          "name": ["Example Domain"],
          "author": ["IANA"],
          "url": ["https://example.org/"]
        },
        "value": "https://example.org/"
      },
      "https://example.net/"
    ]
  }
}

children

The optional children member is added when nested microformats are found and contains an array of microformats2 objects.

This happens when other objects are contained within outer ones, e.g. data is marked up with microformats within the content of an h-entry. Another possibility is that the outer object exists to group all its nested objects, such as an h-feed.

The following example shows an h-feed type microformats2 object with a few properties that describe the feed, and an array of h-entry objects as its children.

{
  "type": ["h-feed"],
  "properties": {
    "author": ["https://example.org/"],
    "name": ["Example Feed"]
  },
  "children": [
    {
      "type": ["h-entry"],
      "properties": {
        "name": ["Entry 1"],
        "content": [
          {
            "html": "<p>Ut non sit saepe porro porro est aut.</p>\n<p>Dicta ut repellat quisquam repellendus et iste consequatur.</p>",
            "value": "Ut non sit saepe porro porro est aut.\nDicta ut repellat quisquam repellendus et iste consequatur."
          }
        ]
      }
    },
    {
      "type": ["h-entry"],
      "properties": {
        "name": ["Entry 2"],
        "content": [
          {
            "html": "<p>Ut non sit saepe porro porro est aut.</p>\n<p>Dicta ut repellat quisquam repellendus et iste consequatur.</p>",
            "value": "Ut non sit saepe porro porro est aut.\nDicta ut repellat quisquam repellendus et iste consequatur."
          }
        ]
      }
    },
    {
      "type": ["h-entry"],
      "properties": {
        "name": ["Entry 3"],
        "content": [
          {
            "html": "<p>Ut non sit saepe porro porro est aut.</p>\n<p>Dicta ut repellat quisquam repellendus et iste consequatur.</p>",
            "value": "Ut non sit saepe porro porro est aut.\nDicta ut repellat quisquam repellendus et iste consequatur."
          }
        ]
      }
    }
  ]
}
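Since microformats2 objects can be nested both as children and as property values, finding all objects of a given type takes a recursive walk. The following is a minimal sketch; `findByType` is a hypothetical name, and the input is assumed to be an `items` array of well-formed microformats2 objects.

```javascript
// Hypothetical helper: collects every microformats2 object with the given type,
// looking at top-level items, nested property values and children.
function findByType(items, type) {
  const found = [];
  const visit = (object) => {
    if (object.type.includes(type)) found.push(object);
    // Property values may themselves be microformats2 objects.
    for (const values of Object.values(object.properties)) {
      for (const value of values) {
        if (typeof value === "object" && value !== null && Array.isArray(value.type)) {
          visit(value);
        }
      }
    }
    // The children member is optional.
    for (const child of object.children ?? []) visit(child);
  };
  items.forEach(visit);
  return found;
}
```

Applied to the h-feed example above, `findByType([feed], "h-entry")` would return the three entries held in its children array.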

rels Object

The rels object is an object with any number of members, where every member name is a link relationship (see the documented existing relationships for examples) and every member value is an array of URLs.

Any relationship can have 1 or more URLs in its array, and any URL can appear in several arrays if it has several relationships associated with it.

The following example shows a rels object where 4 URLs were found in the source document:

{
  "home": [
    "https://example.com/",
    "https://example.com/fr/"
  ],
  "author": [
    "https://example.com/a",
    "https://example.com/b"
  ],
  "alternate": ["https://example.com/fr/"]
}
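Because a URL can appear under several relationships, a consumer may want to look up all relationships for one URL. The sketch below does that by scanning the rels object; `relsForUrl` is a hypothetical name and not part of any parser's interface.

```javascript
// Hypothetical helper: every relationship whose array contains `url`,
// returned in alphabetical order.
function relsForUrl(rels, url) {
  return Object.keys(rels)
    .filter((rel) => rels[rel].includes(url))
    .sort();
}
```

For the example above, `relsForUrl(rels, "https://example.com/fr/")` yields both `alternate` and `home`.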

rel-urls Object

The rel-urls object is an object with any number of members, where every member name is a URL and every member value is an object.

The value object will always contain a member with the name rels. That member’s value will be an array of alphabetically sorted link relationships applicable to the URL.

The value object may additionally contain any members with the following names:

  • hreflang
  • media
  • title
  • type

The value of each of these members is always a single string, taken from the attribute of the same name on the link in the source document. For example, in HTML the value of an <a> element’s hreflang attribute becomes the value of the URL’s hreflang member in the rel-urls object.

In addition the optional member named text is added if any text is associated with the URL in the source document. E.g. in HTML the text within an <a> element will be used.
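The rules for a single value object in rel-urls could be checked as follows. This is a sketch based only on the description above; `isRelUrlsEntry` and `OPTIONAL_STRING_MEMBERS` are hypothetical names.

```javascript
// The optional members named in the text; each must be a single string when present.
const OPTIONAL_STRING_MEMBERS = ["hreflang", "media", "title", "type", "text"];

// Hypothetical helper: checks one value object of the rel-urls object.
function isRelUrlsEntry(entry) {
  if (typeof entry !== "object" || entry === null) return false;
  // Required: rels, an array of strings in ascending order.
  if (!Array.isArray(entry.rels)) return false;
  const sorted = entry.rels.every(
    (rel, i) => typeof rel === "string" && (i === 0 || entry.rels[i - 1] <= rel)
  );
  if (!sorted) return false;
  return OPTIONAL_STRING_MEMBERS.every(
    (name) => !(name in entry) || typeof entry[name] === "string"
  );
}
```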

The following example shows what the companion rel-urls object to the above rels object might look like. It includes the text that was used to link to the authors, which gives more context about who they are. It also makes clear that the alternate version of the home page is in French and meant for handheld devices.

{
  "https://example.com/": {
    "rels": ["home"],
    "text": "Example Homepage"
  },
  "https://example.com/a": {
    "rels": ["author"],
    "text": "Mx Adam"
  },
  "https://example.com/b": {
    "rels": ["author"],
    "text": "Mx Baker"
  },
  "https://example.com/fr/": {
    "rels": ["alternate", "home"],
    "hreflang": "fr",
    "media": "handheld",
    "text": "Example page d’accueil"
  }
}
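The two objects carry overlapping information, so a rels object can be rebuilt from a rel-urls object. The following sketch shows one way to do that; `relsFromRelUrls` is a hypothetical name, and the order of URLs within each array simply follows the member order of the input, which is an assumption rather than something this page specifies.

```javascript
// Hypothetical helper: builds a rels object from a rel-urls object.
function relsFromRelUrls(relUrls) {
  const rels = {};
  for (const [url, info] of Object.entries(relUrls)) {
    for (const rel of info.rels) {
      // Create the array the first time a relationship is seen.
      if (!Object.prototype.hasOwnProperty.call(rels, rel)) rels[rel] = [];
      rels[rel].push(url);
    }
  }
  return rels;
}
```

Run on the example above, this produces the members `home`, `author` and `alternate` with the same URLs as the earlier rels example. The reverse is not possible, since the rels object holds none of the text, hreflang or media values.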

The rel-urls object was a later addition to the specification. It exists because so much information was lost when only the rels object was created. The discussion that shaped it can be read on the brainstorming page.

See Also