measure

Measure microformat

The problem

Measures (e.g. weights, sizes, temperatures) occur frequently on the Web. Each consists of a value, a unit of measure and, in scientific and technical contexts, an experimental uncertainty. These three elements should be marked up consistently across websites so that they can be easily identified and acted upon (exported, computed, converted) in collaborative distributed applications.

Units of measure differ from locale to locale (e.g. Fahrenheit vs. Celsius, pound vs. kilogram), making comparison and matching of offerings difficult.

The Measure microformat will enable unambiguous description of physical quantities and thus provide solid ground for data sharing and automation in many areas.

Draft Schema

Rationale: The names "type" and "item" are taken from hReview.

open issue! Is tolerance needed? It is useful for some circumstances, but perhaps not common enough to be included in the spec.

open issue! A dtmeasured property may be useful, especially for hmoney, as prices fluctuate.

Standard Measure Schema

Angular Measure Schema

Money Schema

num: The Value

Arbitrary white space MAY be included in the value to improve readability, but only when the num class is explicitly used, not when minimisation is employed. Parsers MUST strip out all white space before further processing.

In the standard and money schemas, the value MUST be a number, formatted according to the following EBNF pattern:

non-zero-digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit          = "0" | non-zero-digit ;
natural        = non-zero-digit , {digit} ;
integer        = "0" | [ "-" ] , natural ;
dot-decimal    = integer , "." , {digit} ;
comma-decimal  = integer , "," , {digit} ;
e-sign         = "e" | "E" ;
mantissa       = dot-decimal | comma-decimal | integer ;
sci-number     = mantissa , e-sign , integer ;
number         = dot-decimal | comma-decimal | integer | sci-number ;

This roughly corresponds to a subset of C syntax for floating-point numbers and integers, excluding octal and hexadecimal representations. However, note that both commas and full stops may be used as decimal points.

The Unicode minus sign (U+2212) and ASCII-compatible hyphen-minus (U+002D) MUST both be treated as acceptable indicators of a negative number. In addition, the symbols ¼ (U+00BC), ½ (U+00BD) and ¾ (U+00BE) SHOULD be supported as aliases for 0.25, 0.5 and 0.75 respectively.
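The rules above can be sketched in Python as follows. This is only an illustration, not part of the spec: the function name parse_num and the choice of Decimal as the result type are assumptions made for the example, and the regular expression is a direct transcription of the EBNF.

```python
import re
from decimal import Decimal

# Regex transcription of the EBNF "number" production.
INTEGER = r"(?:0|-?[1-9][0-9]*)"
MANTISSA = rf"{INTEGER}(?:[.,][0-9]*)?"
NUMBER = re.compile(rf"{MANTISSA}(?:[eE]{INTEGER})?")

# Vulgar fractions that parsers SHOULD accept as aliases.
FRACTIONS = {"\u00bc": "0.25", "\u00bd": "0.5", "\u00be": "0.75"}


def parse_num(text):
    """Return the Decimal value of a num string, or None if it is invalid.

    parse_num is a hypothetical helper name used only in this sketch.
    """
    cleaned = "".join(text.split())           # strip all white space
    cleaned = cleaned.replace("\u2212", "-")  # Unicode minus -> hyphen-minus
    if cleaned in FRACTIONS:
        return Decimal(FRACTIONS[cleaned])
    if not NUMBER.fullmatch(cleaned):
        return None
    # Comma and full stop are both decimal points.
    return Decimal(cleaned.replace(",", "."))
```

With this sketch, parse_num("6 700") and parse_num("1,00") give Decimal values of 6700 and 1.00. Note that the EBNF as written has no production for values such as -0.5 (a minus sign followed by a leading zero), so the sketch rejects them.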

In the angular measure schema, a measure is expressed as a combination of up to three numeric components, called degrees, minutes and seconds. Any combination of these components may be used, except that when degrees and seconds are both given, minutes MUST also be present. The components MUST appear in the correct order (degrees, minutes, seconds). Each component must match the production rule for "mantissa" above, with the following additional constraints:

The numeric components MUST be indicated by appending a suffix to each component. Valid suffixes are:

Examples

Issues

closed issue Will the name of this class (value) cause problems for parsers due to value excerpting?

open issue! What about 5′ 10″ used to mean 5 foot, 10 inches?

<abbr title="70 inch">5′ 10″</abbr>

unit: The Unit of Measurement

In the standard schema, the "unit" class is defined as an arbitrary string.

SI Units

Any unit may be used, but authors SHOULD attempt to use official SI units of measurement where appropriate.

Parsers that treat the unit as anything other than an opaque string SHOULD recognise the following case-sensitive list of units, derived from the SI list of base units and common derived units, with the addition of bits and bytes, which are commonly used on web pages. (Note that gram appears in this table instead of kilogram. This is deliberate.)

Unit        Symbols                                           Aliases
metre       m                                                 meter
gram        g                                                 gramme
second      s, sec
ampere      A                                                 amp
candela     cd
mole        mol
kelvin      K, K (U+212A)
newton      N
pascal      Pa
joule       J
watt        W
coulomb     C
volt        V
ohm         Ω (U+03A9), Ω (U+2126)
siemens     S
farad       F
weber       Wb
henry       H
tesla       T
hertz       Hz
byte        B
bit         b
litre       L, l, ℓ (U+2113)                                  liter
Celsius     ℃ (U+2103), °C (U+00B0 followed by capital C)
radian      rad
lumen       lm
becquerel   Bq
gray        Gy
sievert     Sv
katal       kat
steradian   sr

10^n     Prefix   Symbol
10^24    yotta-   Y
10^21    zetta-   Z
10^18    exa-     E
10^15    peta-    P
10^12    tera-    T
10^9     giga-    G
10^6     mega-    M
10^3     kilo-    k
10^2     hecto-   h
10^1     deca-    da
10^0     (none)   (none)
10^−1    deci-    d
10^−2    centi-   c
10^−3    milli-   m
10^−6    micro-   µ (U+00B5), μ (U+03BC), u
10^−9    nano-    n
10^−12   pico-    p
10^−15   femto-   f
10^−18   atto-    a
10^−21   zepto-   z
10^−24   yocto-   y


The full names of SI prefixes SHOULD only be combined with the full names of the units (or their aliases). Likewise, the symbols for SI prefixes SHOULD only be combined with the symbols for the units.
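As an illustration of the pairing rule, the following Python sketch splits a unit string into a canonical unit name and a power of ten. It is not normative: split_prefix is a hypothetical name, the unit dictionaries hold only a few rows of the unit table to keep the example short, and rejecting mixed forms outright is a stricter choice than the SHOULD in the text requires.

```python
# Powers of ten for SI prefixes, keyed by symbol and by full name.
PREFIX_SYMBOLS = {
    "Y": 24, "Z": 21, "E": 18, "P": 15, "T": 12, "G": 9, "M": 6, "k": 3,
    "h": 2, "da": 1, "d": -1, "c": -2, "m": -3,
    "\u00b5": -6, "\u03bc": -6, "u": -6,
    "n": -9, "p": -12, "f": -15, "a": -18, "z": -21, "y": -24,
}
PREFIX_NAMES = {
    "yotta": 24, "zetta": 21, "exa": 18, "peta": 15, "tera": 12, "giga": 9,
    "mega": 6, "kilo": 3, "hecto": 2, "deca": 1, "deci": -1, "centi": -2,
    "milli": -3, "micro": -6, "nano": -9, "pico": -12, "femto": -15,
    "atto": -18, "zepto": -21, "yocto": -24,
}
# Abbreviated for the sketch: only a few rows of the unit table.
UNIT_SYMBOLS = {"m": "metre", "g": "gram", "s": "second", "W": "watt",
                "Hz": "hertz", "B": "byte", "L": "litre", "l": "litre"}
UNIT_NAMES = {"metre": "metre", "meter": "metre", "gram": "gram",
              "gramme": "gram", "second": "second", "watt": "watt",
              "hertz": "hertz", "byte": "byte", "litre": "litre",
              "liter": "litre"}


def split_prefix(unit):
    """Return (canonical unit name, power of ten), or None if unrecognised."""
    # Symbols pair with symbols, full names pair with full names.
    for units, prefixes in ((UNIT_SYMBOLS, PREFIX_SYMBOLS),
                            (UNIT_NAMES, PREFIX_NAMES)):
        if unit in units:                  # bare unit, no prefix
            return units[unit], 0
        for prefix, power in prefixes.items():
            rest = unit[len(prefix):]
            if unit.startswith(prefix) and rest in units:
                return units[rest], power
    return None
```

Here split_prefix("km") and split_prefix("kilometre") both return ("metre", 3), while the mixed form "kmetre" is not recognised. Matching is case-sensitive, so "MB" (megabyte) and "mB" (millibyte) stay distinct.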

Combining units

Units may be multiplied by separating them with white space, or divided using a slash (/) or U+2215 division slash (∕). Units may be raised to an integer power using a caret character (^). The Unicode superscript numerals 2 to 9 (U+00B2, U+00B3, U+2074 to U+2079) MUST be supported as aliases for raising to the corresponding integer powers. Multiplication binds more tightly than division, so "J / kg K" means joules per (kilogram kelvin).
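A possible way to parse combined units is sketched below in Python. It is an illustration only: parse_compound_unit is a hypothetical name, and the sketch assumes that multiplication binds more tightly than division and that repeated slashes apply left to right, so every factor after the first slash lands in the denominator.

```python
import re
from collections import Counter

SUPERSCRIPTS = {"\u00b2": 2, "\u00b3": 3, "\u2074": 4, "\u2075": 5,
                "\u2076": 6, "\u2077": 7, "\u2078": 8, "\u2079": 9}
# A factor is a unit, optionally followed by ^integer or one superscript.
FACTOR = re.compile(
    r"(?P<unit>[^\s^\u00b2\u00b3\u2074-\u2079]+)"
    r"(?:\^(?P<power>-?[0-9]+)|(?P<sup>[\u00b2\u00b3\u2074-\u2079]))?"
)


def parse_compound_unit(text):
    """Return {unit: exponent} for a compound unit string."""
    exponents = Counter()
    # Both the ASCII slash and U+2215 division slash divide.
    groups = re.split(r"[/\u2215]", text)
    for index, group in enumerate(groups):
        sign = 1 if index == 0 else -1     # factors after a slash divide
        for token in group.split():        # white space multiplies
            match = FACTOR.fullmatch(token)
            if match is None:
                raise ValueError(f"cannot parse unit factor: {token!r}")
            if match["power"] is not None:
                power = int(match["power"])
            elif match["sup"] is not None:
                power = SUPERSCRIPTS[match["sup"]]
            else:
                power = 1
            exponents[match["unit"]] += sign * power
    return dict(exponents)
```

For instance, "kg m² / s²" is read as kg to the power 1, m to the power 2 and s to the power −2. The units themselves are kept as opaque strings; resolving prefixes or aliases would be a separate step.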

Examples:

Angular units

Units MUST NOT be given for measurements expressed in the angular measure schema: the degree itself is the unit. If the standard schema is used, units may be given in radians (rad).

Other / Non-SI Units

Authors MAY specify units other than those defined above, but SHOULD NOT assume that parsers will be able to interpret them. Authors using other units MAY provide a rel=glossary link to a page or fragment that defines the units.

Explicitly Defining a Unit

hmeasure may be used with the <dfn> element to explicitly define a unit in terms of pre-defined units. The "title" attribute (if any) is taken to be an alias of the unit name.

<p class="hmeasure" id="dfn-inch">
  An <dfn class="item" title="in">inch</dfn> is defined as
  <span class="num">0.0254</span> <span class="unit">m</span>.
</p>

Other instances of hmeasure may then refer to this definition, implicitly:

<p class="hmeasure">
  The <span class="item">action figure</span> has a <span class="type">height</span> of
  <span class="num">5</span> <span class="unit">in</span>.
</p>

or explicitly:

<p class="hmeasure">
  The <span class="item">action figure</span> has a <span class="type">height</span> of
  <span class="num">5</span>
  <a class="unit" rel="glossary" href="#dfn-inch">in</a>.
</p>
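A parser that has collected such definitions could resolve author-defined units roughly as in the Python sketch below. The DEFINITIONS table and the to_predefined helper are assumptions made for illustration; the single definition mirrors the "inch" example above.

```python
from decimal import Decimal

# Definitions harvested from <dfn> hmeasures: name or alias -> (factor, unit).
# The shape of this table is an assumption of the sketch, not part of the spec.
DEFINITIONS = {
    "inch": (Decimal("0.0254"), "m"),
    "in": (Decimal("0.0254"), "m"),
}


def to_predefined(value, unit):
    """Rewrite (value, unit) in terms of a pre-defined unit where possible."""
    seen = set()
    while unit in DEFINITIONS and unit not in seen:
        seen.add(unit)                 # guard against circular definitions
        factor, unit = DEFINITIONS[unit]
        value = value * factor
    return value, unit
```

Applied to the action figure, to_predefined(Decimal("5"), "in") yields a value of 0.127 with the unit "m". Units without a definition are returned unchanged, as opaque strings.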

open issue! Fahrenheit is reasonably common in some parts of the world. As °C and °F do not share their zero points, it is impossible to use this pattern to define °F. °F thus remains an opaque string with no meaning assigned to it by this spec. Should we add it to the list of pre-defined units?

Currency Units

If the money schema is being used, the unit is not an arbitrary string. It MUST be a three-letter ISO 4217 code. The following aliases for the four largest reserve currencies (as of 2008) are allowed:

Unit   Aliases
EUR    €
GBP    £
JPY    ¥
USD    $

Other currencies MAY be displayed using these symbols only through the ABBR design pattern:

<span class="hmoney">
  <abbr class="unit" title="AUD">$</abbr><span class="num">5.00</span>
</span>

item: The Thing Being Measured

An hCard, hCalendar event or textual description of the item being measured may be supplied.

<p class="hmeasure">
  <span class="item vcard">The <span class="fn">Great Wall</span> of
  <span class="adr"><span class="country-name">China</span></span></span>
  is about <span class="num">6 700</span> <abbr title="km" class="unit">kilometres</abbr>
  <abbr title="length" class="type">long</abbr>.
</p>

If the item is not an hCard, hCalendar component or other recognised embedded microformat, then its contents are taken to be a string.

The item is optional.

The Item URI

If the item is not an embedded hCard or hCalendar event, and is an <a> element or other linking element, then parsers should parse the URI and the node contents. The item URI is considered a significant way of determining what entity the hmeasure is describing.

For example:

<div class="vcard">
  <a class="fn url uid" href="http://alice.example.net">Alice Jones</a>,
  <span class="adr">
    <span class="locality">Sydney</span>,
    <span class="country-name">Australia</span>.
  </span>
</div>
... further down the page ...
<span class="hmeasure">
  <a class="item" href="http://alice.example.net">Alice's</a>
  <span class="type">height</span> is
  <span class="num">180</span> <span class="unit">cm</span>
</span>

type: The Dimension

The type specifies the dimension being measured. A measurement in, say, metres may be ambiguous because it could refer to a depth, a height, a length or a width. The optional type parameter allows you to specify a human-readable dimension.

tolerance: The Error Tolerance

An optional tolerance may be specified as a percentage or as a nested hmeasure/hmoney.

Examples:

<span class="hmeasure">
  <span class="type">Height</span>:
  <span class="num">5</span> <span class="unit">m</span>
  ± <span class="tolerance">2%</span>
</span>
<span class="hmoney">
  <span class="unit">$</span><span class="num">5.00</span>
  ± <span class="tolerance hmoney"><span class="unit">$</span><span class="num">1.00</span></span>
</span>

When no tolerance is provided, a default tolerance of 0% MUST NOT be assumed — the tolerance is simply unknown.

Minimisation Techniques

hmeasure

If no num is given, then the first number conforming to the EBNF above is taken to be the numerical value of the measurement. If no unit is given, then the entire string within the "hmeasure" (less the numerical value, item, type and tolerance) is taken to be the unit.

For example:

<span class="hmeasure">3 pints <span class="item">beer</span></span>
<span class="hmeasure">4 m</span>
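The hmeasure minimisation rule might be implemented along the following lines. This Python sketch is illustrative only: minimised_hmeasure is a hypothetical name, and the input is assumed to be the text of the hmeasure with any item, type and tolerance already removed.

```python
import re

# First substring matching the EBNF "number" production.
NUMBER = re.compile(
    r"(?:0|-?[1-9][0-9]*)(?:[.,][0-9]*)?(?:[eE](?:0|-?[1-9][0-9]*))?"
)


def minimised_hmeasure(text):
    """Split the remaining hmeasure text into (num, unit), or return None."""
    match = NUMBER.search(text)
    if match is None:
        return None
    # Everything that is not the number is taken to be the unit.
    remainder = text[:match.start()] + " " + text[match.end():]
    return match.group(), " ".join(remainder.split())
```

For the two examples above, the text "3 pints " (with the item "beer" removed) gives ("3", "pints") and "4 m" gives ("4", "m"). White space inside the number is not handled here, since it is only permitted when the num class is used explicitly.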

open issue! What about cases where there is no white space? SI says white space should always separate the quantity and unit, but in practice, many people do not include white space in measures.

closed issue When no unit is explicitly given, how do we know which of the following two behaviors to take? (a) Assume unit minimisation and follow the procedures here; or (b) assume the angular schema and treat the number as a degree/minute/second.

hmoney

If no num is given, then the first number conforming to the EBNF above is taken to be the numerical value. If no unit is given, the first three-letter word (or single character alias) is taken to be the unit. White space between the implied unit and implied number is considered optional. The following are all equivalent:

<span class="hmoney"><span class="unit">EUR</span> <span class="num">1,00</span></span>
<span class="hmoney">EUR <span class="num">1,00</span></span>
<span class="hmoney">EUR1,00</span>
<span class="hmoney">1,00 EUR</span>
<span class="hmoney">1.00 <abbr class="unit" title="EUR">euro</abbr></span>
<span class="hmoney">€1,00</span>
<abbr class="hmoney" title="EUR 1,00">a euro</abbr>
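The plain-text forms above could be handled by a sketch like the following Python function. It is not normative: minimised_hmoney is a hypothetical name, the input is assumed to be the text content (or abbr title) of the hmoney, and restricting the three-letter word to upper-case letters is an assumption made so that ordinary words such as "the" are not mistaken for currency codes.

```python
import re

# First substring matching the EBNF "number" production.
NUMBER = re.compile(
    r"(?:0|-?[1-9][0-9]*)(?:[.,][0-9]*)?(?:[eE](?:0|-?[1-9][0-9]*))?"
)
# Single-character aliases; the euro sign follows the "€1,00" example.
ALIASES = {"\u20ac": "EUR", "\u00a3": "GBP", "\u00a5": "JPY", "$": "USD"}
# A stand-alone three-letter upper-case word, or one of the alias symbols.
UNIT = re.compile(r"(?<![A-Za-z])[A-Z]{3}(?![A-Za-z])|[\u20ac\u00a3\u00a5$]")


def minimised_hmoney(text):
    """Return (ISO 4217 code, num) implied by the text, or None."""
    number = NUMBER.search(text)
    unit = UNIT.search(text)
    if number is None or unit is None:
        return None
    code = ALIASES.get(unit.group(), unit.group())
    return code, number.group()
```

With this sketch, "EUR 1,00", "EUR1,00", "1,00 EUR" and "€1,00" all give ("EUR", "1,00"). The sketch does not check that a three-letter word is actually a registered ISO 4217 code.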

Minimising Tolerance

If the tolerance is not a percentage (i.e. it is a nested hmeasure/hmoney) and it does not contain a unit (either explicit, or by minimisation rules), then the unit is taken to be the unit of the parent hmeasure/hmoney.

If no explicit tolerance is given, the hmeasure string should be examined for an occurrence of the substring "±". If this is present, the substring after it, continuing to the end of the hmeasure string, is taken to be a tolerance. If the tolerance contains a "%" character, the tolerance is taken to be a percentage. Otherwise it is taken to be an implicit nested hmeasure/hmoney.
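The two tolerance rules can be combined as in the Python sketch below. It is an illustration under stated assumptions: the function names and the dictionary layout of the result are invented for the example, and the simple "first number, remainder is the unit" split is used for both hmeasure and hmoney text.

```python
import re

# First substring matching the EBNF "number" production.
NUMBER = re.compile(
    r"(?:0|-?[1-9][0-9]*)(?:[.,][0-9]*)?(?:[eE](?:0|-?[1-9][0-9]*))?"
)


def split_num_unit(text):
    """Simplified minimisation: first number, remainder taken as the unit."""
    match = NUMBER.search(text)
    if match is None:
        return None, text.strip() or None
    remainder = text[:match.start()] + " " + text[match.end():]
    return match.group(), " ".join(remainder.split()) or None


def parse_with_tolerance(text):
    """Parse minimised text into num, unit and an optional tolerance."""
    head, sep, tail = text.partition("\u00b1")   # split at the first "±"
    num, unit = split_num_unit(head)
    result = {"num": num, "unit": unit, "tolerance": None}
    if not sep:
        return result          # no "±": the tolerance is unknown, not 0%
    if "%" in tail:
        result["tolerance"] = {"percent": tail.replace("%", "").strip()}
    else:
        t_num, t_unit = split_num_unit(tail)
        # A nested tolerance without a unit inherits the parent's unit.
        result["tolerance"] = {"num": t_num, "unit": t_unit or unit}
    return result
```

For the text "$1.54 ± 0.01" this yields a value of 1.54 with unit "$" and a tolerance of 0.01 that inherits the unit "$". For "5 m ± 2%" the tolerance is the percentage 2.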

Implied Item

If no item is present, then the item MAY be inferred from nesting. If the hmeasure (or hangle, hmoney) is nested within an hCard or hCalendar event, then the implied item is the person, organisation or place represented by the hCard, or the event represented in hCalendar.

Future versions of this specification may add other implied item minimisation techniques.

Worked example

The following example shows a series of expansions taken by a parser encountering a minimised hmoney:

<span class="hmoney">$1.54 ± 0.01</span>

The "±" sign introduces a tolerance, which does not include a "%" symbol, so is treated as a nested hmoney.

<span class="hmoney">$1.54 ±<span class="hmoney tolerance">0.01</span></span>

No explicit units or values are given in either hmoney, so units and numerical values are extracted as per hmoney minimisation:

<span class="hmoney"><span class="unit">$</span><span class="num">1.54</span>
±<span class="hmoney tolerance"><span class="num">0.01</span></span></span>

The nested hmoney contains no unit, so it inherits its unit from the parent hmoney:

<span class="hmoney"><span class="unit">$</span><span class="num">1.54</span>
±<span class="hmoney tolerance"><span class="unit">$</span> <span class="num">0.01</span></span></span>

Parsed values:

Examples

An example weather forecast using hmeasure, adr, geo and hCalendar with the include pattern:

<div>
    Weather for
    <span id="loc-lewes">
        <span class="adr location">
            <span class="locality">Lewes</span>,
            <span class="region">East Sussex</span>
        </span>
        (<span class="geo">50.8730;0.005</span>)
    </span>,
    <span class="vevent item" id="day-20080325">
        <a class="include" href="#loc-lewes"></a>
        <span class="summary">Tuesday</span>
        <abbr class="dtstart" title="2008-03-25">25 March</abbr>
        <abbr class="dtend" title="2008-03-26"></abbr>
    </span>:
    <span class="hmeasure">
        <a class="include" href="#day-20080325"></a>
        <abbr title="Maximum temperature" class="type">High</abbr>
        8 ℃
    </span>,
    <span class="hmeasure">
        <a class="include" href="#day-20080325"></a>
        <abbr title="Minimum temperature" class="type">Low</abbr>
        0 ℃
    </span>
</div>

(The above example does not necessarily represent best practice. Authors should make themselves aware of the accessibility issues currently being discussed around the include and abbr design patterns.)

Parsing Hints

This section is informative.

num parsing

This Perl code shows how a number can be parsed according to the EBNF production in this spec. Its author (Toby Inkster) releases the following code into the public domain:

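#!/usr/bin/perl
use strict;
use warnings;

# Each variable mirrors one production of the EBNF above.
my $nonZeroDigit = '[1-9]';
my $digit        = '[0-9]';
my $natural      = "($nonZeroDigit)($digit)*";
my $integer      = "(0|-?($natural))";
my $decimal      = "($integer)[.,]($digit)*";
my $mantissa     = "($decimal|$integer)";
my $sciNumber    = "($mantissa)[Ee]($integer)";
# Longest alternatives first, so "1.5e3" is not cut short at "1.5".
my $number       = "($sciNumber|$decimal|$integer)";

print "/$number/\n";
while (my $line = <>)
{
	# All white space is stripped before parsing, as the spec requires.
	$line =~ s/\s+//g;
	# Only report a number when the line actually contains one.
	if ($line =~ m/$number/)
	{
		print "Number found: $1\n";
	}
}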

unit parsing

Parsers should note that (with the exception of certain non-ASCII characters, which can be converted manually first) all the pre-defined non-currency units can be understood by the GNU units program. A parser could act as a wrapper to a GNU units installation, or make use of a GNU units-based web service to convert between units.

Related microformats

Contributors

References

See also

measure was last modified: Saturday, December 20th, 2008
