measure: Difference between revisions

(→‎Parsing Hints: Parsing numbers in Perl)
(make this a top-level microformats research page, move brainstorming to *-brainstorming, see history for contributors, add next steps)
(14 intermediate revisions by 6 users not shown)
Line 1: Line 1:
<h1>Measure microformat</h1>
<entry-title>Measure microformat research</entry-title>


<div style="float:right;margin-left:1em">__TOC__</div>
This page is for researching and developing a [[measure]] microformat. Per the [[process]]:
* [[measure-examples]]
* [[measure-formats]]
* [[measure-brainstorming]]


== The problem ==
Line 11: Line 16:
The Measurement microformat will enable unambiguous description of physical quantities and thus provide a solid ground for data sharing and automation in many areas.


== Next Steps ==
* clean-up [[measure-examples]] to refer to <em>current</em> real world examples
* update [[measure-formats]] with formats from other recent efforts such as schema.org
* massive clean-up of [[measure-brainstorming]]

== Draft Schema ==

Rationale: The name "type" is taken from [[hCard]]; "item" is taken from hReview.

=== Standard Measure Schema ===
 
* '''<code>hmeasure</code>'''
** '''<code>num</code>''' {1} (numeric)
** '''<code>unit</code>''' {1} (unit)
** <code>item</code>?  (text | [[hcard|hCard]] | [[hcalendar|hCalendar]])
** <code>type</code> ? (text, e.g. "height", "width", "weight")
** <code>tolerance</code> ? (percentage | hmeasure)
 
=== Angular Measure Schema ===
 
* '''<code>hangle</code>'''
** '''<code>num</code>''' {1} (degree)
** <code>item</code>?  (text | [[hcard|hCard]] | [[hcalendar|hCalendar]])
** <code>type</code> ? (text, e.g. "angle of elevation")
** <code>tolerance</code> ? (percentage | hangle)
 
=== Money Schema ===
 
* '''<code>hmoney</code>'''
** '''<code>num</code>''' {1} (numeric)
** '''<code>unit</code>''' {1} ([http://en.wikipedia.org/wiki/ISO_4217 ISO 4217 code])
** <code>tolerance</code> ? (percentage | hmoney)
 
== <tt>num</tt>: The Value ==
 
Arbitrary white space {{may}} be included in the value to improve readability. Parsers {{must}} strip out all white space before further processing.
 
In the standard and money schemas, the value {{must}} be a number, formatted according to the following EBNF pattern:
 
<pre>non-zero-digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit          = "0" | non-zero-digit ;
natural        = non-zero-digit , {digit} ;
integer        = "0" | [ "-" ] , natural ;
dot-decimal    = integer , "." , {digit} ;
comma-decimal  = integer , "," , {digit} ;
e-sign        = "e" | "E" ;
mantissa      = dot-decimal | comma-decimal | integer ;
sci-number    = mantissa , e-sign , integer ;
number        = dot-decimal | comma-decimal | integer | sci-number ;</pre>
 
This roughly corresponds to a subset of [http://en.wikipedia.org/wiki/C_syntax#Floating_point_types C syntax] for floating-point numbers and integers, excluding octal and hexadecimal representations. However, note that both commas and full stops may be used as decimal points.
 
The Unicode minus sign (U+2212) and ASCII-compatible hyphen-minus (U+002D) {{must}} both be treated as acceptable indicators of a negative number. In addition, the symbols &frac14; (U+00BC), &frac12; (U+00BD) and &frac34; (U+00BE) {{should}} be supported as aliases for 0.25, 0.5 and 0.75 respectively.
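
The rules above can be sketched in Python as follows. This is only an illustrative sketch, not part of the draft: the function name <tt>parse_num</tt> is hypothetical, and the regular expression is one reading of the EBNF (an integer, an optional "." or "," decimal part, an optional exponent).

```python
import re

# integer = "0" | [ "-" ] , natural   (from the EBNF above)
_INTEGER = r"(?:0|-?[1-9][0-9]*)"
# number = integer, optional "." or "," decimal part, optional exponent
_NUMBER = re.compile(rf"{_INTEGER}(?:[.,][0-9]*)?(?:[eE]{_INTEGER})?")

# Vulgar fractions that parsers should accept as aliases.
_FRACTIONS = {"\u00bc": 0.25, "\u00bd": 0.5, "\u00be": 0.75}


def parse_num(text):
    """Parse the text of a `num` element; raise ValueError if it is invalid."""
    # Strip all white space before further processing.
    compact = "".join(text.split())
    if compact in _FRACTIONS:
        return _FRACTIONS[compact]
    # Treat U+2212 MINUS SIGN like the ASCII hyphen-minus.
    compact = compact.replace("\u2212", "-")
    if not _NUMBER.fullmatch(compact):
        raise ValueError(f"not a valid num: {text!r}")
    # Normalise a comma decimal point so that float() accepts it.
    return float(compact.replace(",", "."))
```

Note that, read literally, the EBNF has no production for values such as "-0.5" (a minus sign followed by a zero), so this sketch rejects them too.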
 
In the angular measure schema, a measure is expressed as a combination of up to three numeric components, called degrees, minutes and seconds. Any combination of these components may be used, except that when both degrees and seconds are given, minutes {{must}} also be present. The components {{must}} appear in the correct order (degrees, minutes, seconds). Each component must match the production rule for "mantissa" above, with the following additional constraints:
 
* Only the first component can bear a minus sign. Subsequent components "inherit" the negativity (or lack thereof) from their predecessors.
* All components except the last must match the production rule for "integer".
 
The numeric components {{must}} be indicated by appending a suffix to each component. Valid suffixes are:
 
* degree: "deg", U+00B0 degree symbol (&deg;)
* minute: "min", straight single quote ('), U+2032 prime (&prime;)
* second: "sec", straight double quote ("), U+2033 double prime (&Prime;)
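
As a rough illustration of the angular rules, the sketch below converts a degrees/minutes/seconds string to decimal degrees. The name <tt>parse_angle</tt> and the choice to return a single float are assumptions made for this example, not something the draft prescribes.

```python
import re

_INT = r"(?:0|-?[1-9][0-9]*)"
_MANTISSA = rf"{_INT}(?:[.,][0-9]*)?"
# Each component is optional, but the order degrees, minutes, seconds is fixed.
_ANGLE = re.compile(
    rf"(?:(?P<deg>{_MANTISSA})(?:deg|\u00b0))?"
    rf"(?:(?P<min>{_MANTISSA})(?:min|'|\u2032))?"
    rf'(?:(?P<sec>{_MANTISSA})(?:sec|"|\u2033))?'
)


def parse_angle(text):
    """Return the angle in decimal degrees; raise ValueError if invalid."""
    compact = "".join(text.split()).replace("\u2212", "-")
    match = _ANGLE.fullmatch(compact)
    if match is None:
        raise ValueError(f"not a valid angle: {text!r}")
    parts = [match.group(name) for name in ("deg", "min", "sec")]
    present = [p for p in parts if p is not None]
    if not present:
        raise ValueError("no components given")
    if parts[0] is not None and parts[2] is not None and parts[1] is None:
        raise ValueError("degrees and seconds given without minutes")
    if any(p.startswith("-") for p in present[1:]):
        raise ValueError("only the first component may be negative")
    if any(not re.fullmatch(_INT, p) for p in present[:-1]):
        raise ValueError("all components except the last must be integers")
    # Later components inherit the sign of the first one.
    sign = -1.0 if present[0].startswith("-") else 1.0
    total = 0.0
    for part, divisor in zip(parts, (1, 60, 3600)):
        if part is not None:
            total += abs(float(part.replace(",", "."))) / divisor
    return sign * total
```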
 
=== Examples ===
 
* 1729 (the smallest number that can be expressed as the sum of two cubes in two different ways)
* 1.61803399 (the golden ratio)
* 2,99792458e8 (the speed of light in a vacuum, measured in metres per second)
* -40 (value at which Celsius and Fahrenheit scales are equal)
* 1,000,000,000 (''Invalid:'' commas may be used as decimal points, but not for grouping thousands.)
* 57.2958 deg (1 radian, in degrees)
* -57&deg; 17&prime; 45.1&Prime; (-1 radian, in degrees, minutes and seconds)
* 4&deg; 30&Prime; (''Invalid'': no minutes)
* 4&deg; -30&prime; (''Invalid'': only first component may be negative)
 
=== Issues ===
 
{{ClosedIssue}} Will the name of this class (<tt>value</tt>) cause problems for parsers due to [[hcard-parsing|value excerpting]]?
* Changed <tt>value</tt> to <tt>num</tt>
 
{{OpenIssue}} What about 5&prime; 10&Prime; used to mean 5 feet 10 inches?
* Possible solution:
<pre><abbr title="70 inch">5&prime; 10&Prime;</abbr></pre>
 
== <tt>unit:</tt> The Unit of Measurement ==
 
In the standard schema, the "unit" class is defined as an arbitrary string.
 
=== SI Units ===
 
Any unit may be used, but authors {{should}} attempt to use official SI units of measurement where appropriate.
 
Parsers that treat the unit as anything other than an opaque string {{should}} recognise the following case-sensitive list of units, derived from the SI list of base units and common derived units, with the addition of bits and bytes, which are commonly used on web pages. (Note that gram appears in this table instead of kilogram. This is deliberate.)
 
{| border="1" style="float:left"
|-
! Unit
! Symbols
! Aliases
|-
| metre
| m
| meter
|-
| gram
| g
| gramme
|-
| second
| s, sec
|-
| ampere
| A
| amp
|-
| candela
| cd
|-
| mole
| mol
|-
| kelvin
| K, &#x212a; (U+212A)
|-
| newton
| N
|-
| pascal
| Pa
|-
| joule
| J
|-
| watt
| W
|-
| coulomb
| C
|-
| volt
| V
|-
| ohm
| &Omega; (U+03A9), Ω (U+2126)
|-
| siemens
| S
|-
| farad
| F
|-
| weber
| Wb
|-
| henry
| H
|-
| tesla
| T
|-
| hertz
| Hz
|-
| byte
| B
|-
| bit
| b
|-
| litre
| L, l, &#x2113; (U+2113)
| liter
|-
| Celsius
| &#x2103; (U+2103), &#xB0;C (U+00B0 followed by capital C)
|-
| radian
| rad
|-
| lumen
| lm
|-
| becquerel
| Bq
|-
| gray
| Gy
|-
| sievert
| Sv
|-
| katal
| kat
|-
| steradian
| sr
|}
 
{| border="1" style="float:left;margin-left:1.5em;"
! 10<sup>n</sup>
! Prefix
! Symbol
|-
| 10<sup>24</sup>
| yotta-
|  Y
|-
| 10<sup>21</sup>
| zetta-
|  Z
|-
| 10<sup>18</sup>
| exa-
|  E
|-
| 10<sup>15</sup>
| peta-
|  P
|-
| 10<sup>12</sup>
| tera-
|  T
|-
| 10<sup>9</sup>
| giga-
|  G
|-
| 10<sup>6</sup>
| mega-
|  M
|-
| 10<sup>3</sup>
| kilo-
|  k<!-- (K)-->
|-
| 10<sup>2</sup>
| hecto-
|  h<!-- (H)-->
|-
| 10<sup>1</sup>
| deca-
|  da<!-- (D)-->
|-
| 10<sup>0</sup>
| (none)
| (none)
|-
| 10<sup>−1</sup>
| deci-
|  d
|-
| 10<sup>−2</sup>
| centi-
|  c
|-
| 10<sup>−3</sup>
| milli-
|  m
|-
| 10<sup>−6</sup>
| micro-
|  µ (U+00B5), μ (U+03BC), u
|-
| 10<sup>−9</sup>
| nano-
|  n
|-
| 10<sup>−12</sup>
| pico-
|  p
|-
| 10<sup>−15</sup>
| femto-
|  f
|-
| 10<sup>−18</sup>
| atto-
|  a
|-
| 10<sup>−21</sup>
| zepto-
|  z
|-
| 10<sup>−24</sup>
| yocto-
|  y
|}
 
<br style="clear:both">
 
The full names for SI prefixes {{should}} only be combined with the full names for the units (or their aliases). Likewise the symbols for SI prefixes {{should}} only be combined with the symbols for the units.
 
* kilometre
* milligramme
* μL
* mΩ
* microV (''not recommended'')
* kgram (''not recommended'')
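
The following sketch shows one way a parser might split a prefixed unit and flag the "not recommended" mixtures of names and symbols. It is an assumption-laden illustration: the function name <tt>split_unit</tt> is hypothetical, and the tables are deliberately abbreviated to a few entries from the tables above.

```python
# Abbreviated lookup tables; a real parser would carry the full lists.
PREFIX_NAMES = {"kilo": 3, "milli": -3, "micro": -6}
PREFIX_SYMBOLS = {"k": 3, "m": -3, "\u00b5": -6, "\u03bc": -6, "u": -6}
UNIT_NAMES = {
    "metre": "metre", "meter": "metre",
    "gram": "gram", "gramme": "gram",
    "litre": "litre", "liter": "litre",
    "volt": "volt", "ohm": "ohm",
}
UNIT_SYMBOLS = {
    "m": "metre", "g": "gram", "L": "litre", "l": "litre",
    "V": "volt", "\u03a9": "ohm", "\u2126": "ohm",
}


def split_unit(token):
    """Return (power_of_ten, unit, recommended), or None if unrecognised."""
    # A bare unit with no prefix is always fine.
    if token in UNIT_NAMES:
        return (0, UNIT_NAMES[token], True)
    if token in UNIT_SYMBOLS:
        return (0, UNIT_SYMBOLS[token], True)
    for prefixes, prefix_is_name in ((PREFIX_NAMES, True), (PREFIX_SYMBOLS, False)):
        for prefix, power in prefixes.items():
            if not token.startswith(prefix):
                continue
            rest = token[len(prefix):]
            # Names should pair with names, symbols with symbols.
            if rest in UNIT_NAMES:
                return (power, UNIT_NAMES[rest], prefix_is_name)
            if rest in UNIT_SYMBOLS:
                return (power, UNIT_SYMBOLS[rest], not prefix_is_name)
    return None
```

With these tables, "kilometre" and "μL" come back as recommended forms, while "microV" and "kgram" are still understood but flagged as not recommended.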
 
==== Combining units ====
 
Units may be multiplied by separating them with white space, or divided using a slash (/) or U+2215 division slash (&#x2215;). Units may be raised to an integer power using a caret character (^). The Unicode superscript numerals 2 to 9 (U+00B2, U+00B3, U+2074 to U+2079) {{must}} be supported as aliases for raising to the corresponding integer powers. Multiplication takes precedence over division, so "J / kg K" is read as J/(kg K).
 
Examples:
 
* &lt;span class="unit">kg m / s&lt;/span>
* &lt;span class="unit">m/s^2&lt;/span>
* &lt;span class="unit">meter&#xB3;&lt;/span>
* &lt;abbr class="unit" title="&mu;m">micron&lt;/abbr>
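
A minimal sketch of the combination rules, assuming that a unit string is reduced to a mapping from unit token to integer exponent. The name <tt>parse_unit</tt> is hypothetical, unit tokens are left as opaque strings, and treating every group after a slash as a divisor is this example's reading of the precedence rule.

```python
# Map the Unicode superscript numerals 2 to 9 onto caret notation.
_SUPERSCRIPTS = {
    ord(char): "^" + digit
    for char, digit in zip(
        "\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079", "23456789"
    )
}


def parse_unit(text):
    """Return a dict mapping each unit token to its integer exponent."""
    text = text.translate(_SUPERSCRIPTS).replace("\u2215", "/")
    exponents = {}
    # Multiplication binds more tightly than division, so split on "/" first:
    # the first group is the numerator, every later group is a divisor.
    for index, group in enumerate(text.split("/")):
        sign = 1 if index == 0 else -1
        for factor in group.split():
            name, _, power = factor.partition("^")
            exponents[name] = exponents.get(name, 0) + sign * int(power or 1)
    # Drop units that cancelled out completely.
    return {name: exp for name, exp in exponents.items() if exp != 0}
```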
 
=== Angular units ===
 
Units {{must not}} be given for measurements expressed in the angular schema (<tt>hangle</tt>): the degree itself is the unit. If the standard schema is used instead, angles may be given in radians (rad).
 
=== Other / Non-SI Units ===
 
Authors {{may}} specify units other than those defined above, but {{should not}} assume that parsers will be able to interpret them. Authors using other units {{may}} provide a [[existing-rel-values|rel=glossary]] link to a page or fragment that defines the units.
 
==== Explicitly Defining a Unit ====
 
hmeasure may be used with the &lt;dfn> element to explicitly define a unit in terms of pre-defined units. The "title" attribute (if any) is taken to be an alias of the unit name.
 
<pre><p class="hmeasure" id="dfn-inch">
  An <dfn class="item" title="in">inch</dfn> is defined as
  <span class="num">0.0254</span> <span class="unit">m</span>.
</p></pre>
 
Other instances of hmeasure may then refer to this definition, implicitly:
 
<pre><p class="hmeasure">
  The <span class="item">action figure</span> has a <span class="type">height</span> of
  <span class="num">5</span> <span class="unit">in</span>.
</p></pre>
 
or explicitly:
 
<pre><p class="hmeasure">
  The <span class="item">action figure</span> has a <span class="type">height</span> of
  <span class="num">5</span>
  <a class="unit" rel="glossary" href="#dfn-inch">in</a>.
</p></pre>
 
{{OpenIssue}} Fahrenheit is reasonably common in some parts of the world. As &deg;C and &deg;F do not share their zero points, it is impossible to use this pattern to define &deg;F. &deg;F thus remains an opaque string with no meaning assigned to it by this spec. Should we add it to the list of pre-defined units?
 
=== Currency Units ===
 
If the money schema is being used, the unit is not an arbitrary string. It {{must}} be a three-letter ISO 4217 code. The following aliases for the four largest reserve currencies (as of 2008) are allowed:
 
{| border="1"
|-
! Unit
! Aliases
|-
| EUR
| €
|-
| GBP
| £
|-
| JPY
| ¥
|-
| USD
| $
|}
 
Other currencies {{may}} be displayed using these symbols only through the [[abbr-design-pattern|ABBR design pattern]]:
 
<pre><span class="hmoney">
  <abbr class="unit" title="AUD">$</abbr><span class="num">5.00</span>
</span></pre>
 
== <tt>item</tt>: The Thing Being Measured ==
 
An hCard, hCalendar event or textual description of the item being measured may be supplied.
 
<pre><p class="hmeasure">
  <span class="item vcard">The <span class="fn">Great Wall</span> of
  <span class="adr"><span class="country-name">China</span></span></span>
  is about <span class="num">6 700</span> <abbr class="unit" title="km">kilometres</abbr>
  <abbr title="length" class="type">long</abbr>.
</p></pre>
 
If the item is not an hCard, hCalendar component or other recognised embedded microformat, then its contents are taken to be a string.
 
The item is optional.
 
=== The Item URI ===
 
If the item is an <code>&lt;a></code> element, then parsers should parse the URI ''and'' the node contents. The item URI is considered a significant way of determining what entity the hmeasure is describing. For example:
 
* If the item URI matches the UID for a known contact (e.g. an hCard somewhere on the page, or another page being parsed) then the hmeasure is taken to describe this contact (i.e. person, organisation, etc).
* A similar meaning can be implied when the item URI matches the UID for a known hCalendar event.
 
For example:
 
<pre><nowiki>
<div class="vcard">
  <a class="fn url uid" href="http://alice.example.net">Alice Jones</a>,
  <span class="adr">
    <span class="locality">Sydney</span>,
    <span class="country-name">Australia</span>.
  </span>
</div>
... further down the page ...
<span class="hmeasure">
  <a class="item" href="http://alice.example.net">Alice's</a>
  <span class="type">height</span> is
  <span class="num">180</span> <span class="unit">cm</span>
</span>
</nowiki></pre>
 
== <tt>type</tt>: The Dimension ==
 
The type specifies the dimension being measured. A measurement in, say, metres may be ambiguous because it could refer to a depth, a height, a length or a width. The optional type parameter allows you to specify a human-readable dimension.
 
== <tt>tolerance</tt>: The Error Tolerance ==
 
An optional tolerance may be specified as a percentage or as a nested hmeasure/hmoney.
 
Examples:
 
<pre><span class="hmeasure">
  <span class="type">Height</span>:
  <span class="num">5</span> <span class="unit">m</span>
  ± <span class="tolerance">2%</span>
</span></pre>
 
<pre><span class="hmoney">
  <span class="unit">$</span><span class="num">5.00</span>
  ± <span class="tolerance hmoney"><span class="unit">$</span><span class="num">1.00</span></span>
</span></pre>
 
When no tolerance is provided, a default tolerance of 0% {{must not}} be assumed — the tolerance is simply unknown.
 
== Minimisation Techniques ==
 
=== hmeasure ===
 
If no <tt>num</tt> is given, then the first number conforming to the EBNF above is taken to be the numerical value of the measurement. If no unit is given, then the entire string within the "hmeasure" (less the numerical value, item, type and tolerance) is taken to be the unit.
 
For example:
<pre><span class="hmeasure">3 pints <span class="item">beer</span></span></pre>
* '''Num:''' 3
* '''Unit:''' "pints"
* '''Item:''' "beer"
 
<pre><span class="hmeasure">4 m</span></pre>
* '''Num:''' 4
* '''Unit:''' metre
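
The minimisation rule can be sketched as below. The function name <tt>minimise_hmeasure</tt> is hypothetical, and the sketch assumes that the caller has already removed any item, type and tolerance text from the string.

```python
import re

_INT = r"(?:0|-?[1-9][0-9]*)"
_NUMBER = re.compile(rf"{_INT}(?:[.,][0-9]*)?(?:[eE]{_INT})?")


def minimise_hmeasure(text):
    """Return (num, unit) for an hmeasure with no explicit num or unit."""
    # The first number conforming to the EBNF is the value.
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    num = float(match.group().replace(",", "."))
    # Everything else, with white space collapsed, is the unit.
    remainder = text[:match.start()] + " " + text[match.end():]
    unit = " ".join(remainder.split())
    return num, unit
```

For the beer example, the text left after removing the item is "3 pints ", which yields the value 3 and the unit "pints". Resolving "m" to metre would be a separate lookup step.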
 
{{OpenIssue}} What about cases where there is no white space? SI says white space should always separate the quantity and unit, but in practice, many people do not include white space in measures.
 
{{ClosedIssue}} When no unit is explicitly given, how do we know which of the following two behaviours to take: assume unit minimisation and follow the procedures here, or assume the angular schema and treat the number as a degree/minute/second value?
* Changed root element class for angular schema to <tt>hangle</tt>
 
=== hmoney ===
 
If no <tt>num</tt> is given, then the first number conforming to the EBNF above is taken to be the numerical value. If no <tt>unit</tt> is given, the first three-letter word (or single character alias) is taken to be the unit. White space between the implied unit and implied number is considered optional. The following are all equivalent:
 
<pre>
<span class="hmoney"><span class="unit">EUR</span> <span class="num">1,00</span></span>
<span class="hmoney">EUR <span class="num">1,00</span></span>
<span class="hmoney">EUR1,00</span>
<span class="hmoney">1,00 EUR</span>
<span class="hmoney">1.00 <abbr class="unit" title="EUR">euro</abbr></span>
<span class="hmoney">€1,00</span>
<abbr class="hmoney" title="EUR 1,00">a euro</abbr>
</pre>
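
A sketch of the hmoney minimisation applied to the plain text of an element. The name <tt>minimise_hmoney</tt> is hypothetical, and the sketch assumes that a "three-letter word" means three upper-case ASCII letters not adjacent to other letters. The forms above that rely on <tt>abbr</tt> titles would need the title attribute to be substituted first.

```python
import re

_INT = r"(?:0|-?[1-9][0-9]*)"
_NUMBER = re.compile(rf"{_INT}(?:[.,][0-9]*)?(?:[eE]{_INT})?")
# Single-character aliases for the four permitted currencies.
_ALIASES = {"\u20ac": "EUR", "\u00a3": "GBP", "\u00a5": "JPY", "$": "USD"}
# A three-letter code standing on its own, or one of the alias symbols.
_UNIT = re.compile(
    r"(?<![A-Za-z])[A-Z]{3}(?![A-Za-z])|" + "|".join(map(re.escape, _ALIASES))
)


def minimise_hmoney(text):
    """Return (currency_code, num) for an hmoney with no explicit parts."""
    number = _NUMBER.search(text)
    unit = _UNIT.search(text)
    if number is None or unit is None:
        raise ValueError(f"cannot minimise {text!r}")
    code = _ALIASES.get(unit.group(), unit.group())
    return code, float(number.group().replace(",", "."))
```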
 
=== Minimising Tolerance ===
 
If the tolerance is not a percentage (i.e. it is a nested hmeasure/hmoney) and it does not contain a unit (either explicit, or by minimisation rules), then the unit is taken to be the unit of the parent hmeasure/hmoney.
 
If no explicit tolerance is given, the hmeasure string should be examined for an occurrence of the substring "±". If this is present, the substring after it, continuing to the end of the hmeasure string, is taken to be a tolerance. If the tolerance contains a "%" character, the tolerance is taken to be a percentage. Otherwise it is taken to be an implicit nested hmeasure/hmoney.
 
=== Implied Item ===
 
If no <tt>item</tt> is present, then the item {{may}} be inferred from nesting. If the <tt>hmeasure</tt> (or <tt>hangle</tt>, <tt>hmoney</tt>) is nested within an hCard or hCalendar event, then the implied item is the person, organisation or place represented by the hCard, or the event represented in hCalendar.
 
Future versions of this specification may add other implied item minimisation techniques.
 
=== Worked example ===
 
The following example shows a series of expansions taken by a parser encountering a minimised hmoney:
 
<pre><span class="hmoney">$1.54 ± 0.01</span></pre>
 
The "±" sign introduces a tolerance, which does not include a "%" symbol, so is treated as a nested hmoney.
 
<pre><span class="hmoney">$1.54 ±<span class="hmoney tolerance">0.01</span></span></pre>
 
No explicit units or values are given in either hmoney, so units and numerical values are extracted as per hmoney minimisation:
 
<pre><span class="hmoney"><span class="unit">$</span><span class="num">1.54</span>
±<span class="hmoney tolerance"><span class="num">0.01</span></span></span></pre>
 
The nested hmoney contains no unit, so it inherits its unit from the parent hmoney:
 
<pre><span class="hmoney"><span class="unit">$</span><span class="num">1.54</span>
±<span class="hmoney tolerance"><span class="unit">$</span> <span class="num">0.01</span></span></span></pre>
 
Parsed values:
 
* '''Unit:''' USD
* '''Num:''' 1.54
* '''Tolerance:'''
** '''Unit:''' USD
** '''Num:''' 0.01
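
The expansion steps above can be sketched end to end as follows. This is a hedged illustration only: <tt>parse_hmoney</tt> and its dictionary output are hypothetical, and a three-letter code is assumed to be three upper-case ASCII letters.

```python
import re

_INT = r"(?:0|-?[1-9][0-9]*)"
_NUMBER = re.compile(rf"{_INT}(?:[.,][0-9]*)?(?:[eE]{_INT})?")
_ALIASES = {"\u20ac": "EUR", "\u00a3": "GBP", "\u00a5": "JPY", "$": "USD"}
_UNIT = re.compile(
    r"(?<![A-Za-z])[A-Z]{3}(?![A-Za-z])|" + "|".join(map(re.escape, _ALIASES))
)


def _first_number(text):
    """Return the first number conforming to the EBNF as a float."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    return float(match.group().replace(",", "."))


def _money(text, inherited_unit=None):
    """Minimised hmoney; fall back to the parent's unit if none is found."""
    unit = _UNIT.search(text)
    code = _ALIASES.get(unit.group(), unit.group()) if unit else inherited_unit
    if code is None:
        raise ValueError(f"no unit in {text!r}")
    return {"unit": code, "num": _first_number(text)}


def parse_hmoney(text):
    """Parse a minimised hmoney string, including an implied tolerance."""
    # Everything after the first "±" is the tolerance.
    main, sign, tail = text.partition("\u00b1")
    result = _money(main)
    if sign:
        if "%" in tail:
            result["tolerance"] = {"percent": _first_number(tail)}
        else:
            # A nested hmoney without a unit inherits the parent's unit.
            result["tolerance"] = _money(tail, inherited_unit=result["unit"])
    return result
```

Calling <tt>parse_hmoney("$1.54 ± 0.01")</tt> produces the same unit, value and nested tolerance as the parsed values listed above.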
 
== Examples ==
 
An example weather forecast using hmeasure, [[adr]], [[geo]] and [[hCalendar]] with the [[include-pattern|include pattern]]:
 
<pre><div>
    Weather for
    <span id="loc-lewes">
        <span class="adr location">
            <span class="locality">Lewes</span>,
            <span class="region">East Sussex</span>
        </span>
        (<span class="geo">50.8730;0.005</span>)
    </span>,
    <span class="vevent item" id="day-20080325">
        <a class="include" href="#loc-lewes"></a>
        <span class="summary">Tuesday</span>
        <abbr class="dtstart" title="2008-03-25">25 March</abbr>
        <abbr class="dtend" title="2008-03-26"></abbr>
    </span>:
    <span class="hmeasure">
        <a class="include" href="#day-20080325"></a>
        <abbr title="Maximum temperature" class="type">High</abbr>
        8 &#x2103;
    </span>,
    <span class="hmeasure">
        <a class="include" href="#day-20080325"></a>
        <abbr title="Minimum temperature" class="type">Low</abbr>
        0 &#x2103;
    </span>
</div></pre>
 
(The above example does not necessarily represent best practice. Authors should make themselves aware of the accessibility issues currently being discussed around the include and abbr design patterns.)
 
== Parsing Hints ==
 
This section is ''informative''. Parsers should note that (with the exception of certain non-ASCII characters, which can be converted manually first) all the pre-defined non-currency units can be understood by the [http://www.gnu.org/software/units/ GNU units] program. A parser could act as a wrapper to a GNU units installation, or make use of a GNU units-based web service to convert between units.
 
=== num parsing ===
 
This Perl code shows how a number can be parsed according to the EBNF production in this spec. Its author (Toby Inkster) releases the following code into the public domain:
 
<pre><nowiki>#!/usr/bin/perl
use strict;
use warnings;

# Regular expression fragments mirroring the EBNF productions above.
my $nonZeroDigit = '[1-9]';
my $digit        = '\d';
my $natural      = "($nonZeroDigit)($digit)*";
my $integer      = "(0|-?($natural))";
my $decimal      = "($integer)[.,]($digit)*";
my $mantissa     = "($decimal|$integer)";
my $sciNumber    = "($mantissa)[Ee]($integer)";
my $number       = "($sciNumber|$decimal|$integer)";

print "/$number/\n";
while (<>)
{
    s/\s+//g;                 # strip all white space first
    if (m/$number/) {         # $1 is only valid after a successful match
        print "Number found: $1\n";
    }
}</nowiki></pre>


== Related microformats ==
* [[hcalendar]] can provide a complete quantitative description of a natural event (for example an earthquake) occurring at a specified time (dtstart/dtend) and location (embedded [[geo]]), by just embedding measured physical quantities in the 'description' span.
* [[job-listing]] can use a time measure to specify the period of time that the salary covers.
* [[hlisting]] product dimensions; weight/mass; time period (as above); price.
* [[directions-examples]] can use length measure for mileage and time to go from one point to the next.
* [[recipe-examples]] can use weight, volume and time measure for ingredients and preparation time.
* [[currency]] can be viewed as a measurement unit, or as a component of a measurement unit, as in $ per hour.
== Contributors ==
* Guillaume Lebleu
* [[User:AndyMabbett|Andy Mabbett]]
* Luca Postpischl
* [[User:ManuSporny|Manu Sporny]]
* [[User:TobyInk|TobyInk]]


==References==

Revision as of 19:26, 17 January 2016

<entry-title>Measure microformat research</entry-title>

This page is for researching and developing a measure microformat. Per the process:

The problem

Measures (e.g. weights, sizes, temperatures) occur frequently on the Web. They consist of a value, a unit of measure and, in scientific and technical contexts, an experimental uncertainty. These three elements should be marked up consistently across websites so that they can be easily identified and acted upon (export, compute, convert) in collaborative distributed applications.

Units of measure differ from locale to locale (e.g. Fahrenheit vs. Celsius, pound vs. kilogram), making comparison and matching of offerings difficult.

The Measurement microformat will enable unambiguous description of physical quantities and thus provide a solid ground for data sharing and automation in many areas.

Next Steps

Related microformats

  • hcalendar can provide a complete quantitative description of a natural event (for example an earthquake) occurring at a specified time (dtstart/dtend) and location (embedded geo), by just embedding measured physical quantities in the 'description' span.
  • job-listing can use a time measure to specify the period of time that the salary covers.
  • hlisting product dimensions; weight/mass; time period (as above); price.
  • directions-examples can use length measure for mileage and time to go from one point to the next.
  • recipe-examples can use weight, volume and time measure for ingredients and preparation time.
  • currency can be viewed as a measurement unit, or as a component of a measurement unit, as in $ per hour.

References

See also