Revision as of 19:08, 9 October 2006
Measure Microformat Brainstorming
This page collects ideas on how to use semantic XHTML to represent measures unambiguously.
Guillaume Lebleu
Basic example with elementary unit using the abbr pattern and the UNECE code (see measure-formats)
<span class="length">5 <abbr class="unit" title="FOT">Feet</abbr></span>
Optional "value" could be useful in some cases, for instance when the value is provided in plain text:
<span class="length"><abbr class="value" title="5">Five</abbr> <abbr class="unit" title="FOT">Feet</abbr></span>
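To illustrate how a consumer might read the two examples above, here is a minimal sketch in Python using only the standard library. The names `MeasureParser` and `parse_measure` are hypothetical, and the fallback rule (use the `title` of the "value" abbr if present, otherwise the plain text outside any abbr) is an assumption for illustration, not something this proposal defines.

```python
from html.parser import HTMLParser


class MeasureParser(HTMLParser):
    """Collects a value and a unit code from markup shaped like the examples above."""

    def __init__(self):
        super().__init__()
        self.value = None       # title of <abbr class="value">, if present
        self.unit = None        # title of <abbr class="unit">, e.g. the UNECE code
        self.outside_text = []  # text that is not inside any <abbr>
        self._in_abbr = False

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        self._in_abbr = True
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "unit" in classes:
            self.unit = attrs.get("title")
        elif "value" in classes:
            self.value = attrs.get("title")

    def handle_endtag(self, tag):
        if tag == "abbr":
            self._in_abbr = False

    def handle_data(self, data):
        if not self._in_abbr:
            self.outside_text.append(data)


def parse_measure(html):
    """Return (value, unit_code); assumes exactly one measure in the fragment."""
    parser = MeasureParser()
    parser.feed(html)
    parser.close()
    # Assumed rule: prefer the explicit "value" title, else the surrounding text.
    raw = parser.value if parser.value is not None else "".join(parser.outside_text).strip()
    return float(raw), parser.unit
```

Both examples would yield the same pair under these assumptions, which is the point of the optional "value": the human-readable text ("Five") no longer has to be parsed.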
Andy Mabbett
This Firefox extension may be of interest: Converter. Note, though, that it's been criticised for having a "nag" screen. --AndyMabbett 15:32, 3 Oct 2006 (PDT)
- This is the author of that extension. I don't want to go much into this, but I just want to clarify this briefly. The part with the nag screen is wrong on two counts: (1) that dialog isn't there anymore, and (2) even if it was there, you only needed to read a paragraph and click a button to make it go away forever -- but you don't have to take my word for it, install it for yourselves and see. Andy's report is accurate however -- the extension was criticized for that dialog (that's what you get from your free extension's users when you ask for 15 seconds of their time in return for hundreds of hours of your time). --BogdanStancescu 09:35, 9 Oct 2006 (PDT)
Bogdan Stăncescu
Here are my findings on automatic parsing of measurements on web pages, gathered while developing the Converter extension. Please ask away if you want me to go into more detail on any of the topics -- I'm not sure which of my experiences are relevant to microformats, so I'm going to give you an overview of my conclusions.
By way of introduction, the Converter is a Firefox extension which tries to convert all measurements it finds in any web page to their Imperial or metric counterpart (e.g. Fahrenheit to Celsius, and Celsius to Fahrenheit; meters to feet and feet to meters). There are two steps to the conversion process: (1) identifying the measurements in the page, and (2) converting them. As expected, the conversion part is trivial, at least conceptually. The parsing is the tricky bit, and that's where the Converter's challenges become relevant for microformats.
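To show why step (2) is the easy half, here is a minimal Python sketch of a table-driven conversion. It is not the Converter's actual code; the unit labels ("ft", "m", "degF", "degC") and the names `CONVERSIONS` and `convert` are made up for illustration.

```python
# Each entry maps (source, target) to (scale, offset), so that
# target_value = source_value * scale + offset.
# The unit labels are illustrative only, not UNECE codes.
CONVERSIONS = {
    ("ft", "m"): (0.3048, 0.0),
    ("m", "ft"): (1 / 0.3048, 0.0),
    ("degF", "degC"): (5 / 9, -32 * 5 / 9),
    ("degC", "degF"): (9 / 5, 32.0),
}


def convert(value, source, target):
    """Convert value between two units listed in CONVERSIONS."""
    scale, offset = CONVERSIONS[(source, target)]
    return value * scale + offset
```

Once the number and the unit are known, a lookup and one multiplication (plus an offset for temperatures) is all that is needed; nothing in this step depends on how the measurement was written in the page.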
Here are the main challenges I have encountered while writing the Converter:
- Presentation standardization
- The first, biggest and most obvious challenge is the almost complete lack of de facto standardization with respect to data presentation. What I mean is that although the units themselves are more or less standardized (more on that later), they are presented in various ways within web pages. Take these examples: "50 foot monster", "50 ft monster", "50 feet monster", "50-foot monster", "50-feet monster" -- and my personal favorite, "fifty-foot monster" (more on this later);
- Unit standardization
- I live in Europe, where I've always used the metric system. As such, this was probably a much nastier surprise for me than it would be for a user of the Imperial/U.S. Customary system: in the Imperial system, the units themselves vary depending on where you are -- miles, pints, and a whole lot of other units come in many different flavors, but they're all written the same in regular usage;
- Language
- "1 meter" vs. "1 metre" is a reasonable difference -- but non-SI units are usually translated. Even some SI units have different plurals, depending on the language, although in theory SI units are actually denoted by symbols, not "words", so as to make them non-translatable, and truly international (hence the name of the SI). I haven't really given much thought to a solution for parsing these, because I find it overwhelming for the time being.
- The sheer number of units
- Surprisingly, most people don't realize just how many units we humans have invented. Just take a look here: asknumbers.com -- see how many categories there are? Now click on Flow Rate -- a non-ubiquitous type of measurement. Three sub-categories for flow rates alone! Now click on Volume Flow Rate and take a look at the number of units in those lists. Remember, those are just in one of the three categories for flow rate! The UNECE standard mentioned in the measure formats page is useful to define just that -- a standard set of units. But in practice there are a lot more being used out there.
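To make the presentation problem from the first item concrete, here is a rough Python sketch that recognizes the six "monster" spellings listed there. It is an illustration only, not how the Converter does it: the pattern `FOOT_PATTERN`, the tiny `NUMBER_WORDS` table and the function `find_feet` are all hypothetical, and the sketch handles a single unit in a single language.

```python
import re

# Deliberately tiny, illustrative table; a real parser would need far more.
NUMBER_WORDS = {"five": 5, "ten": 10, "fifty": 50}

# An amount (digits or a single word), then spaces or hyphens, then a spelling of "foot".
FOOT_PATTERN = re.compile(
    r"\b(?P<amount>\d+(?:\.\d+)?|[a-z]+)[\s-]+(?P<unit>foot|feet|ft)\b",
    re.IGNORECASE,
)


def find_feet(text):
    """Return every length in feet found in text, as floats."""
    results = []
    for match in FOOT_PATTERN.finditer(text):
        amount = match.group("amount").lower()
        if amount in NUMBER_WORDS:
            results.append(float(NUMBER_WORDS[amount]))
        elif amount[0].isdigit():
            results.append(float(amount))
        # Any other word before the unit ("the foot") is not a measurement; skip it.
    return results
```

Even this toy needs a word list, a separator rule and three spellings for one unit. Multiply that by the regional variants, languages and number of units described above, and the appeal of having the value and unit stated explicitly in the markup becomes clear.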
That's all I can think of as major hurdles right now. If I remember anything else, I'll post here. Please do give me feedback here if you want to ask more about any of the topics I touched above, or if you have other questions I might be able to reply to. --BogdanStancescu 12:08, 9 Oct 2006 (PDT)