Difference between revisions of "measure-brainstorming"
Revision as of 13:05, 7 April 2007
Measure Microformat Brainstorming
This page collects ideas on how to use semantic XHTML to represent measures unambiguously.
Basic example with elementary unit using the abbr pattern and the UNECE code (see measure-formats)
<span class="length">5 <abbr class="unit" title="FOT">Feet</abbr></span>
Optional "value" could be useful in some cases, for instance when the value is provided in plain text:
<span class="length"><abbr class="value" title="5">Five</abbr> <abbr class="unit" title="FOT">Feet</abbr></span>
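To see how a consumer might read the two examples above, here is a minimal sketch using Python's standard `html.parser`. It assumes the class names used on this page (`value`, `unit`, and an outer span whose class names the kind of measure); `MeasureParser` and `parse_measure` are hypothetical names, not part of any microformat specification.

```python
from html.parser import HTMLParser


class MeasureParser(HTMLParser):
    """Collects kind, value and unit from one brainstormed measure span."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self._stack = []  # currently open tags, as (tag, class) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class", "")
        if not self._stack and tag == "span":
            # The outermost span names the kind of measure, e.g. "length".
            self.result["kind"] = cls
        elif cls in ("value", "unit") and "title" in attrs:
            # abbr pattern: the machine-readable data lives in the title.
            self.result[cls] = attrs["title"]
        self._stack.append((tag, cls))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Bare text directly inside the outer span is taken as the value.
        if text and len(self._stack) == 1 and "value" not in self.result:
            self.result["value"] = text


def parse_measure(html):
    parser = MeasureParser()
    parser.feed(html)
    parser.close()
    return parser.result
```

Both forms of the example yield the same dictionary, which is the point of the optional "value" element: the plain-text "Five" and the digit "5" become equivalent for a machine.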
- This is the author of that extension. I don't want to go much into this, but I just want to clarify this briefly. The part with the nag screen is wrong on two counts: (1) that dialog isn't there anymore, and (2) even if it was there, you only needed to read a paragraph and click a button to make it go away forever -- but you don't have to take my word for it, install it for yourselves and see. Andy's report is accurate however -- the extension was criticized for that dialog (that's what you get from your free extension's users when you ask for 15 seconds of their time in return for hundreds of hours of your time). --BogdanStancescu 09:35, 9 Oct 2006 (PDT)
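- For squared and cubic values, the HTML entities &sup2; (²) and &sup3; (³) should be borne in mind.
- For temperatures and angles, the HTML entity &deg; (°) exists.
- The following currency entities exist:
  - ¤ - &curren; - currency
  - ¢ - &cent; - cent
  - £ - &pound; - pound
  - ¥ - &yen; - yen
  - € - &euro; - Euro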
Here are my findings related to automatic parsing of measurements on web pages while developing the Converter extension. Please ask away if you want me to go into more detail on any of the topics -- I'm not sure which of my experiences are relevant to microformats, so I'm going to give you an overview of my conclusions.
By way of introduction, the Converter is a Firefox extension which tries to convert all measurements it finds in any web page to their Imperial or metric counterpart (e.g. Fahrenheit to Celsius, and Celsius to Fahrenheit; meters to feet and feet to meters). There are two steps to the conversion process: (1) identifying the measurements in the page, and (2) converting them. As expected, the conversion part is trivial, at least conceptually. The parsing is the tricky bit, and that's where the Converter's challenges become relevant for microformats.
Here are the main challenges I have encountered while writing the Converter:
- Presentation standardization
- The first, biggest and most obvious challenge is lack of almost any de facto standardization in respect to data presentation. What I mean is that although the units themselves are more or less standardized (more on that later), they are presented in various ways within web pages. Take these examples: "50 foot monster", "50 ft monster", "50 feet monster", "50-foot monster", "50-feet monster" -- and my personal favorite, "fifty-foot monster" (more on this later);
- Note that a microformat, in particular one using the abbr design pattern, would make each of these examples less ambiguous, if not unambiguous. See below --Guillaume_Lebleu:
<span class="height"><span class="value">50</span> <abbr class="unit" title="FOT">foot</abbr></span> monster
<span class="height"><span class="value">50</span> <abbr class="unit" title="FOT">ft</abbr></span> monster
<span class="height"><span class="value">50</span>-<abbr class="unit" title="FOT">foot</abbr></span> monster
<span class="height"><span class="value">50</span> <abbr class="unit" title="FOT">feet</abbr></span> monster
<span class="height"><abbr class="value" title="50">fifty</abbr>-<abbr class="unit" title="FOT">foot</abbr></span> monster
- Of course; as far as I could gather, that's actually the purpose of microformats -- bridging the gap between what humans and machines can understand, no? --BogdanStancescu 00:30, 11 Oct 2006 (PDT)
- Unit standardization
- I live in Europe, where I've always used the metric system. As such, this probably was a much bigger nasty surprise for me than it is for a user of the Imperial/U.S. Customary system: in the Imperial system, the units themselves vary depending on where you are -- miles, pints, and a whole lot of other units come in many different flavors, but they're all written the same in regular usage;
- "1 meter" vs. "1 metre" is a reasonable difference -- but non-SI units are usually translated. Even some SI units have different plurals, depending on the language, although in theory SI units are denoted by symbols, not "words", so as to make them non-translatable and truly international (hence the name of the SI). I haven't really given much thought to a solution for parsing these, because I find it overwhelming for the time being.
- The sheer number of units
- Surprisingly, most people don't realize just how many units we humans have invented. Just take a look here: asknumbers.com -- see how many categories there are? Now click on Flow Rate -- a non-ubiquitous type of measurement. Three sub-categories only for flow rates! Now click on Volume Flow Rate and take a look at the number of units in those lists. Remember, those are just in one of the three categories for flow rate! The UNECE standard mentioned in the measure formats page is useful to define just that -- a standard set of units. But in practice there are a lot more being used out there.
- Do you have examples from the Web (a URL) of non-UNECE units? One possibility would be to provide the ability for a unit to be defined as a division of products of other units. This is consistent with the measure-formats#Systeme_International, which defines 7 base units and all other units as derived units (of course some units, even though they are derived, are much more easily represented as simple ones). This is what XBRL has done for financial/accounting/reporting. See currency-formats#XBRL and the theoretical example (ampere acre per second) below --Guillaume_Lebleu:
- Unfortunately I have almost no URLs with measurements, although I've been in the "business" for a while. The reason for this is that I collect URLs of pages I encounter which are not properly parsed by the Converter, and when I release a version which understands those, I delete the URLs. Also, I never intended to cover all units in the Converter myself, for a multitude of reasons -- therefore I was never interested in the more exotic ones.
Guillaume Lebleu's example
<span class="unit"> <abbr class="unit" title="AMP">Ampere</abbr> <abbr class="unit" title="ACR">acre</abbr> <span class="divide">per</span> <abbr class="unit" title="SEC">second</abbr> </span>
- Regarding your idea of breaking down the units in base units, that's something I've also been toying with in my head for the Converter. For my particular application, it's technically more difficult to implement this breakdown. For microformats, it would be easier, but there still remains at least one potential problem: you end up with a huge mess in the page. If a standard is too complicated to follow, one tends to give up altogether.
- Consider a document which actually discusses some sort of current variation per farm, and therefore needs to repeatedly refer to ampere acres per second. For human use, they'd simply define the AAS somewhere at the top of the document, and then refer to AAS, KAAS or MAAS as needed. Maybe a similar approach should be considered for microformats as well:
We define the <span class="unit_definition"> <abbr class="unit_name">AAS</abbr> as <abbr class="unit" title="AMP">Ampere</abbr> <abbr class="unit" title="ACR">acre</abbr> <span class="divide">per</span> <abbr class="unit" title="SEC">second</abbr> </span>.
- And then use the "AAS" throughout the document as any other pre-defined unit. How would you define (and use) the KAAS (1000 AAS) or MAAS (1,000,000 AAS) though? Is there any standard way already to use data multipliers in microformats? Or should we discuss that? Or is it out of scope? --BogdanStancescu 00:30, 11 Oct 2006 (PDT)
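One way to picture the "define once, then reuse with multipliers" idea is a small registry, sketched below. Everything in it is an assumption made for illustration: the names `define_unit`, `resolve`, `DEFINED_UNITS` and `MULTIPLIERS` are hypothetical, and the multiplier letters simply follow the KAAS/MAAS spelling used in the comment above rather than any standard.

```python
# Multiplier letters as spelled in the comment above (KAAS, MAAS); hypothetical.
MULTIPLIERS = {"K": 1000, "M": 1000000}

# Hypothetical registry: unit name -> {unit code: exponent}.
DEFINED_UNITS = {}


def define_unit(name, numerator, denominator=()):
    """Register a derived unit as a product of codes over a product of codes."""
    exponents = {}
    for code in numerator:
        exponents[code] = exponents.get(code, 0) + 1
    for code in denominator:
        exponents[code] = exponents.get(code, 0) - 1
    DEFINED_UNITS[name] = exponents
    return exponents


def resolve(symbol):
    """Return (factor, exponents) for a defined unit, with or without a multiplier."""
    if symbol in DEFINED_UNITS:
        return 1, DEFINED_UNITS[symbol]
    prefix, rest = symbol[:1], symbol[1:]
    if prefix in MULTIPLIERS and rest in DEFINED_UNITS:
        return MULTIPLIERS[prefix], DEFINED_UNITS[rest]
    raise KeyError(symbol)


# The definition from the markup above: ampere acre per second.
define_unit("AAS", numerator=["AMP", "ACR"], denominator=["SEC"])
```

With this, `resolve("KAAS")` gives a factor of 1000 and the same exponents as AAS. Whether a microformat should carry the multiplier as a prefix letter or as a separate element is exactly the open question raised above; the sketch only shows that either choice reduces to a factor plus a reference to the definition.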
That's all I can think of as major hurdles right now. If I remember anything else, I'll post here. Please do give me feedback here if you want to ask more about any of the topics I touched above, or if you have other questions I might be able to reply to. --BogdanStancescu 12:08, 9 Oct 2006 (PDT)
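The spelling variants listed under "Presentation standardization" above can be illustrated with a small pattern match. This is only a sketch of the parsing problem, not how the Converter works: the lookup tables are deliberately tiny, and `UNIT_SPELLINGS`, `NUMBER_WORDS` and `find_measures` are hypothetical names.

```python
import re

# Hypothetical lookup tables, deliberately tiny; not taken from the Converter.
UNIT_SPELLINGS = {"foot": "FOT", "feet": "FOT", "ft": "FOT"}
NUMBER_WORDS = {"five": 5, "fifty": 50}

_number = r"\d+(?:\.\d+)?|" + "|".join(NUMBER_WORDS)
_unit = "|".join(sorted(UNIT_SPELLINGS, key=len, reverse=True))

# A number (digits or a known number word), then spaces or hyphens, then a unit.
PATTERN = re.compile(
    rf"\b(?P<value>{_number})[\s-]+(?P<unit>{_unit})\b", re.IGNORECASE
)


def find_measures(text):
    """Return (value, unit code) pairs for every recognised measurement."""
    found = []
    for match in PATTERN.finditer(text):
        raw = match.group("value").lower()
        value = NUMBER_WORDS[raw] if raw in NUMBER_WORDS else float(raw)
        found.append((value, UNIT_SPELLINGS[match.group("unit").lower()]))
    return found
```

Every variant from the "50 foot monster" list resolves to the same value and unit code, but only because each spelling was entered by hand. That hand-maintained table is the cost a microformat would remove.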
Because it is easier to provide examples, I will first list examples.
Categorical vs Ordinal Data
Various measurements may produce NON-Numerical values:
- a pain scale: most severe, very severe, severe, ...
- or the TNM tumour classification system: T0, Tx, T1, T2, T3, T4, N0, ...
There is even a more fundamental issue related to numbers themselves, e.g.:
- Lists or years are sometimes written using Roman numerals
- however, the strings corresponding to Roman numerals, when sorted alphabetically, do NOT retain the correct order
- i.e. C (100) precedes L (50), which precedes X (10)
- there are other numbering schemes
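The sorting problem with Roman numerals goes away once each string is mapped to its numeric value. The sketch below uses the usual subtractive-notation rule; `roman_to_int` is a hypothetical helper name and it does not validate malformed numerals.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral to an integer (no validation of malformed input)."""
    total = 0
    previous = 0
    # Read right to left: a symbol smaller than its right neighbour is subtracted.
    for symbol in reversed(numeral.upper()):
        value = ROMAN_VALUES[symbol]
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total


alphabetical = sorted(["X", "C", "L"])                 # C, L, X
numerical = sorted(["X", "C", "L"], key=roman_to_int)  # X, L, C
```

This is the same idea as the abbr pattern with a "value" class: the displayed string stays as written, and the machine-readable number is carried alongside it.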
A Single Value / Data Point
This is the simplest data format and pretty straightforward to implement.
- the distance between 2 cities is 40 km
- the velocity is 62 mph
- most other simple entries (...)
An Interval Measurement
- time: the shop is open between 6:00 and 18:00 every day of the week, except Saturdays from 9:00 to 16:00 and Sundays from 9:00 to 13:00
This is more about an interval measurement. Every variable can have 2 (or more) values, e.g.:
- the levels of rain fall were between 25mm - 35mm
- the maximum velocity of various cars was 220 - 250 km/h
Should these values be stored as separate values? [e.g. low / high] Or should the microformats be able to store an interval?
See also the examples for statistical summaries below.
- Mark up each as a separate measurement, and wrap them in a "range" microformat? Andy Mabbett 11:36, 22 Nov 2006 (PST)
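The "two measurements wrapped in a range" suggestion can be sketched as a data model. The class names `Measure` and `MeasureRange` are hypothetical, and the unit is a plain symbol here rather than a UNECE code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    value: float
    unit: str


@dataclass(frozen=True)
class MeasureRange:
    """An interval made of two ordinary measurements (low and high)."""
    low: Measure
    high: Measure

    def __post_init__(self):
        if self.low.unit != self.high.unit:
            raise ValueError("both ends of a range must share a unit")
        if self.low.value > self.high.value:
            raise ValueError("low end must not exceed high end")

    def contains(self, measure):
        return (measure.unit == self.low.unit
                and self.low.value <= measure.value <= self.high.value)


# "the levels of rain fall were between 25mm - 35mm"
rainfall = MeasureRange(Measure(25, "mm"), Measure(35, "mm"))
```

Keeping each end as a full measurement means a consumer that does not understand ranges can still read two valid single values.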
- the GPS coordinates are 12°14' N and 25°55' E
- the dimension of the box is 3m x 2m x 0.55m
- this is three separate, single measurements, surely? Andy Mabbett 09:21, 22 Nov 2006 (PST)
- 3 x 2 x 0.55 cubic meter, still 3 measurements, BUT given as cubic meter => ONE measurement?
- Who writes 3x2x0.55 cubic meter? You'd write "3.3m3" Andy Mabbett 11:36, 22 Nov 2006 (PST)
- the surface was 2 x 3 square feet ???
- Who writes 2x3 sq ft? You'd write "2ftx3ft" or "6ft2" Andy Mabbett 11:36, 22 Nov 2006 (PST)
- IF we write "3.3m3" or "6ft2", we lose information
- IF I want a surface, I would prefer the square feet unit, and NOT ...feet x ...feet
- writing for every measure a markup, will bloat the code extensively
- data matrices would be very effective here
- how would you make such a matrix? There are different ways in which such information can be "compounded" (length per time = speed, length * length = area). Maybe we can group those measurements by surrounding information that states what the context is. --Emil 02:50, 25 Dec 2006 (PST)
Often, a group of data is summarized using a statistic:
- the mean length was 1.3m (SD 0.12m, group size 22)
- the median age was 42 years (interquartile range 95% 18 - 97)
Accuracy vs. Precision
- In how much detail should a measurement be stored?
- Microformats aren't for storing measurements; they're for "labelling" the measurements that are already present. Andy Mabbett 09:23, 22 Nov 2006 (PST)
- If Accuracy and precision are relevant to the measurement, how do we store these?
Standardization of Measurement
- sometimes we may need to store the calibration information / calibration curves
- we may need to store the reference point the measurement is based on
- we may need to store the normal values
- biomedical measurements are often laboratory dependent, so it does NOT make sense to have the measurement without the corresponding normal values
- e.g. anti-Hepatitis B surface antigen antibody (anti-HBs) titer: 32 mIU/ml
- normal: 0 (non-infected, non-past infection, non-immunity)
- protective immunity: >10 mIU/ml
- interpretation is however more complex, depending on other tests as well
From my understanding, this microformat should concentrate on the notation of a measurement. Some aspects will therefore have to be covered elsewhere to improve automatic use, or else this microformat uses only some base information (units / dimensions) and derives everything else from those base / built-in ones.
Dimension vs. Unit vs. Scale vs. Measurement
A measurement is the combination of a number (value) and a unit (kind).
- 3km (3 kilometres = 3,000 metres)
A unit is a view for a measure of a dimension. There are two ways in which units can differ from each other:
- Units differ by scale (prefix)
- 3km is the same as 3,000 metres or 300,000 cm. (It's the same unit with a different prefix, which works like a factor for the value, to reduce the number of digits needed. The scale should be an element of its own, and we can make use of the standard prefixes as defined in The Unified Code for Units of Measure or MathML.)
- Different units of the same dimension can be converted into each other.
- Metre is a unit of the dimension length.
- Foot is a unit of the dimension length.
A Dimension is a base-dimension (see SI-System) or a compound dimension.
- length is a base dimension
- time is a base dimension
- speed is a compound dimension (length per time). A measurement of speed therefore has one number and two units joined by a mathematical expression, which together form their own unit, e.g. 10 m/s (10 metres per second).
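The relationship between base and compound dimensions can be sketched by writing each dimension as a mapping from base dimension to exponent. The names `combine`, `UNIT_DIMENSION` and `same_dimension` are hypothetical, and the unit table covers only codes already used on this page.

```python
def combine(a, b, sign=1):
    """Multiply (sign=1) or divide (sign=-1) two dimensions given as exponent maps."""
    result = dict(a)
    for base, exponent in b.items():
        result[base] = result.get(base, 0) + sign * exponent
        if result[base] == 0:
            del result[base]  # a cancelled base dimension disappears
    return result


LENGTH = {"length": 1}
TIME = {"time": 1}
SPEED = combine(LENGTH, TIME, sign=-1)  # length per time
AREA = combine(LENGTH, LENGTH)          # length * length

# Hypothetical table: which dimension each unit code measures.
UNIT_DIMENSION = {"MTR": LENGTH, "FOT": LENGTH, "SEC": TIME}


def same_dimension(unit_a, unit_b):
    """Units can only be converted into each other if their dimensions match."""
    return UNIT_DIMENSION[unit_a] == UNIT_DIMENSION[unit_b]
```

Here `same_dimension("MTR", "FOT")` holds while `same_dimension("MTR", "SEC")` does not, which is the check a sub-format restricted to one dimension would need.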
If we express a measurement in a microformat by its unit, the dimension is provided indirectly by that unit. But a microformat which uses a measurement as a part needs to define its dimension, so that the choice of unit is left to the user. So we could have a general measurement element which allows all kinds of units. As derived formats, we can have sub-formats which limit the list of units (or define an alternative list) by allowing only specific dimension(s).
- currency-proposal, with the money element, which uses the same elements: value (which should then replace amount), scale (which should be introduced) and unit (which should replace currency), limited to the ISO 4217 list.
- length, which only allows units which measure the dimension length, like FOT, MTR ...
Identification of Units
There are so many units around, and not only those currently in use: there are also deprecated ones, such as those of the Roman Empire. For example, "Foot" is not a unique identification of a unit. Besides the British and U.S. foot there are, for example, some old German ones, used before those areas joined the international Metre Convention in 1875:
- 25 cm in Hessen
- 28.935 cm in Bremen
- 29.641 cm in Oldenburg
- 29.1859 cm in Bayern
- 30.385 cm in Meiningen-Hildburghausen
- 31.385 cm in Preußen
- 31.608 cm in Wien/Österreich
- 32.61 cm in Bad Homburg vor der Höhe
- 33 1/3 cm in the Pfalz
So there is a need for a unique identification of those units. I have found two approaches so far:
MathML defines the construction of a URI like:
But as you can see, there is currently no way to distinguish the different German feet by area inside Germany. Furthermore, the context is so variable that the same unit can be described by different URLs.
OpenMath defines the units inside of content dictionaries:
So there is a unique URL for each unit, but not every unit is covered.
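To illustrate why a bare "foot" is not enough, the sketch below keys a few of the historical feet listed above by a (name, region) pair and converts them to metres. The key scheme is purely hypothetical and is not part of MathML or OpenMath; `FOOT_IN_CM` and `to_metres` are illustrative names.

```python
from decimal import Decimal

# Lengths in centimetres, copied from the list above. The (name, region)
# key is a hypothetical identifier scheme, not part of MathML or OpenMath.
FOOT_IN_CM = {
    ("foot", "Hessen"): Decimal("25"),
    ("foot", "Bremen"): Decimal("28.935"),
    ("foot", "Oldenburg"): Decimal("29.641"),
    ("foot", "Bayern"): Decimal("29.1859"),
}


def to_metres(value, unit_id):
    """Convert a value in a region-qualified foot to metres."""
    return Decimal(value) * FOOT_IN_CM[unit_id] / Decimal(100)
```

A lookup with the unqualified name "foot" fails with a `KeyError`, which mirrors the point made above: without a unique identifier there is nothing to convert from.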
Transformation of Units
A real benefit is the automatic transformation of a unit, so that the writer can write the measurement in his context (e.g. in U.S. feet, or in Roman feet when quoting an ancient text) and the reader can get a transformation into his context (e.g. the value in metres). Therefore there is a need for additional transformation information. There are several different kinds of transformation:
units of same dimension
e.g. foot to metre
units of compound but same dimension
e.g. metre/s and mach-number
compound measurement context
This switch works up to 5 Ampere by 220 Volt
The reader might want to know the wattage of the device he can attach (1100 Watt would be the answer).
The dimension of the box is 3m x 2m x 0.55m
There might be questions such as:
- volume (3.3 m³)
- surface (17.5 m²)
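The figures in the two compound examples above can be checked with a few lines of arithmetic. The function names are hypothetical; `Decimal` is used only so that 0.55 is represented exactly.

```python
from decimal import Decimal


def power_watts(current_amperes, voltage_volts):
    """Electrical power is current times voltage (P = I * U)."""
    return current_amperes * voltage_volts


def box_volume(width, length, height):
    return width * length * height


def box_surface(width, length, height):
    """Total surface of a closed box: two of each of the three face types."""
    return 2 * (width * length + width * height + length * height)


# "The dimension of the box is 3m x 2m x 0.55m"
sides = (Decimal("3"), Decimal("2"), Decimal("0.55"))

switch_limit = power_watts(5, 220)  # 1100 watts
volume = box_volume(*sides)         # 3.3 cubic metres
surface = box_surface(*sides)       # 17.5 square metres
```

Note that none of these derived values can be computed unless the three sides are marked up as belonging to one box, which is the "compound measurement context" this section describes.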
A general measurement should make use of the following information:
value: a number which represents the amount of the measurement. The number should follow one of the following representations:
- integer (positive and negative): e.g. -1, 0, 1
- decimal fraction (positive and negative): e.g. -2.5, 0.123
- common fraction (positive and negative): e.g. -2/3, 3/7
scale: a factor used to reduce the number of digits needed for the value. The scale should be either
- a letter referring to a built-in factor, as defined in The Unified Code for Units of Measure or MathML, or
- a number as defined for value
unit: the unit used for the measurement. The unit should follow one of the following representations:
- a built-in short form as defined in Standards for Trade and Electronic Business (or any other list which will be defined as the standard list for this format)
- a reference to a unit definition. (I think there is a need for a markup/language to define new units and/or the transformation between units.)
<span class="measurement"><abbr class="value" title="5">Five</abbr> <abbr class="scale" title="k">kilo</abbr> <abbr class="unit" title="MTR">metre</abbr></span>
When we have a defined sub-measurement format for length, it could also be written:
<span class="length"><abbr class="value" title="5">Five</abbr> <abbr class="scale" title="k">kilo</abbr> <abbr class="unit" title="MTR">metre</abbr></span>
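The value / scale / unit triple described above can be reduced to a single number in the base unit. The sketch below relies on Python's `fractions.Fraction`, which accepts all three proposed value notations as strings; `SCALE_LETTERS` is a small assumed subset of prefix letters and `base_value` is a hypothetical helper.

```python
from fractions import Fraction

# A small assumed subset of prefix letters and their factors.
SCALE_LETTERS = {
    "k": Fraction(1000),
    "M": Fraction(1000000),
    "c": Fraction(1, 100),
    "m": Fraction(1, 1000),
}


def base_value(value, scale, unit):
    """Apply the scale to the value; the scale is a prefix letter or a number."""
    if scale in SCALE_LETTERS:
        factor = SCALE_LETTERS[scale]
    else:
        factor = Fraction(scale)  # a scale written as a plain number
    return Fraction(value) * factor, unit


# The "Five kilo metre" example: value "5", scale "k", unit "MTR".
five_km = base_value("5", "k", "MTR")
```

Using exact fractions avoids rounding when a value such as 2/3 is combined with a scale, at the cost of leaving the final decimal formatting to the consumer.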
List of possible Sub-Formats
Here is a (first) list of possible keywords for sub-formats and their unit list or compound kind:
- money - unit limit to the ISO 4217 List
- length - unit limited to e.g. MTR (Metre), FOT (Foot) ...
- Either a measurement with units like MTK (Square Metre), FTK (Square Foot)
- or a compound format with elements (width:length, height:length)
- Either a measurement with units like MTQ (Cubic Metre), FTQ (Cubic Foot), LTR (Litre) ...
- or a compound format with elements (width:length, height:length, depth:length)
- time or duration or period - unit limited to e.g. sec (second), min (minute) ...
- mass or weight - unit limited to GRM (Gram), ...
- power or electricity - unit limited to AMP (Ampere), OHM (Ohm), ...