rest/datatypes: Difference between revisions
Revision as of 04:25, 14 February 2006
Datatypes in HTML
One of the challenges of using HTML as a data transport is that everything, by default, is a string. This page explores ways to use microformats -- specifically, class names -- to encode data type information, e.g., for use with xoxo and rest/ahah, in order to allow lossless import/export from various languages. These could also be used with forms to provide rest/descriptions of the type of data expected.
Contributors
- Dr. Ernie Prabhakar
- Kevin Marks
- Tantek Çelik
Examples
These are the primary datatypes in a range of different languages and formats. Note that we are only concerned with "primitive" datatypes (loosely defined), as structured datatypes (list/array, hash/dictionary) are handled by xoxo.
Language/format | string | float | integer | boolean | data | date/time | null |
---|---|---|---|---|---|---|---|
XML Schema | string | float, double | decimal, integer, etc. | boolean | hexBinary, base64Binary | duration, dateTime, date, time | nil |
XML-RPC | string | double | i4, int | boolean | base64 | dateTime.iso8601 | nil |
Mac OS X plists | string | real | integer | true, false | data | date | nil |
JSON (JavaScript) | string | number | number | true, false | N/A | Date | null |
YAML tags | str | float | int | bool | binary (base 64) | timestamp | null |
SQL (JDBC) | char,varchar | float, double, real | decimal, numeric | bit | binary | date, time, timestamp | ? |
C | char[] | float, double | int, long, short | bool, int | char[] | N/A | (void*)0 |
Java | char, String | float, double | int, long, short, byte | boolean | N/A | util.Date | null |
PHP | string | float (double) | integer | boolean | array | N/A | NULL |
Perl | array | scalar | scalar | scalar | array | N/A | undef |
Python | str | float, complex | int, long | bool | binascii, base64 | time, datetime | None |
Ruby + lib | String | Float | Fixnum, Bignum | TrueClass,FalseClass | Hash | Date | NilClass |
REBOL | string! | decimal! | integer! | logic! | binary! | date!, time! | none! |
Analysis
The most common set of datatypes appears to be the one used by XML-RPC, which (perhaps fortunately) also has historical precedent on the web:
- string
- double
- int [i4] - 4-byte integer (32-bit)
- boolean (0,1)
- base64
  - Let's call this 'binary', since the encoding is already stated in the data: URL, and DRY applies
  - RFC 2426 uses "B", which, when lowercased per microformats naming-principles, is 'b'. -Tantek
- dateTime[.iso8601]
While not perfect, these certainly cover the 80% case, and are reasonably well-defined. That said, there are a number of open questions about how to use them:
- should 'string' also be explicitly specified, or can it be assumed?
  - Assumed, and also defined as utf-8. Kevin Marks 16:39, 13 Feb 2006 (PST)
  - Agreed with Kevin. 'string' should be the default if no type is specified. Publishers MAY explicitly specify 'string'.
- does 'int' always mean 32 bits?
  - If so, what should be used for 64-bit integers or cryptographic (256-bit+) numbers?
    - Python's 'long' is simple, but ambiguous.
    - Ruby's Bignum is clear but much less common.
    - XML-Schema has so many types it is hard to say.
    - SQL's "decimal", perhaps?
  - If not, how should conforming implementations react to longer integers than they can handle?
    - I think integer is fine - we don't have an explicit constraint here. Do you want to define +Inf, -Inf and NaN behaviour? Certainly include these when building testcases and examples.
- Is it worth deviating from the standard to allow "dateTime" as an alias? (the one case where XML Schema is actually simpler)
  - See comments below regarding date-time.
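The integer-width and +Inf/-Inf/NaN questions above lend themselves to small test cases. The following is only a sketch of one possible policy, not a decision recorded on this page: it assumes 'int' is limited to 32 bits and that anything wider is labelled 'long' (the name floated in the Proposal section). The function names `integer_class` and `parse_double` are hypothetical.

```python
import math

# Assumed bounds for a 32-bit signed 'int' (the XML-RPC i4 range).
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def integer_class(value: int) -> str:
    """Pick the class name for an integer: 'int' if it fits in 32 bits, else 'long'."""
    return "int" if INT32_MIN <= value <= INT32_MAX else "long"


def parse_double(text: str) -> float:
    """Parse a 'double' value; Python's float() also accepts +Inf, -Inf and NaN."""
    return float(text.strip())


if __name__ == "__main__":
    for sample in (137, 2**31 - 1, 2**31, 2**256):
        print(sample, "->", integer_class(sample))
    for sample in ("3.14159265", "+Inf", "-Inf", "NaN"):
        print(sample, "->", parse_double(sample))
```

An implementation that cannot hold arbitrarily large integers would need a different fallback (for example rejecting the value), which is exactly the open question above.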
Proposal
The proposal is to adopt XML-RPC scalar values as the class names for typed microformats, with the following caveats:
- the alias 'i4' for integer SHOULD NOT be used
- the name 'long' MAY be used for 64-bit or longer integers
- for 'dateTime'
  - can we make this 'datetime'? Kevin Marks 16:39, 13 Feb 2006 (PST)
  - microformats don't use camel case; please see naming-principles. Alternatives (Tantek):
    - date-time (if you consider it to be two words)
    - datetime (as proposed by Kevin, if you think it is one word)
    - dt (reusing the common prefix shared by existing microformat class names: dtstart, dtend, dtreviewed from hCalendar and hReview). We could also make that a general rule for microformat class names for properties which take ISO8601 datetimes. (Tantek)
  - the trailing '.iso8601' MUST be omitted, as '.' is not (always?) valid in HTML class names
    - '.' is neither a valid HTML class name, nor a valid character (unescaped at least) in a CSS class selector. - Tantek
  - date/time formats SHOULD follow the W3C profile of ISO 8601
  - a more human-readable rendering may be used, with the ISO8601 value in an "abbr"
- binary data SHOULD be encoded in a data: URI, with an explicit ContentType and a human-readable description as the body of the anchor.
- if no datatype is specified, an implementation MAY either attempt to infer a datatype from the syntax of the value, or simply assert that the value is a string. Thus, conforming implementations SHOULD always explicitly label strings.
  - Disagree - either we are labelling datatypes and thus labelling string is redundant, or we are trying to guess from syntax. If the latter, this whole spec is unnecessary. Kevin Marks 16:39, 13 Feb 2006 (PST)
  - Agreed with Kevin. Let's keep 'string' as the default, and allow explicit usage of it. -Tantek
To indicate that a particular microformat uses typed values, precede that microformat with the class name 'typed', as in:
<ol class="typed xoxo">
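To make the import side of the proposal concrete, here is a minimal Python sketch of a consumer that reads `dt`/`dd` pairs and converts each value according to its class name. It assumes the class names 'datetime' and 'binary' suggested in the comments above (neither is settled on this page), treats an unlabelled value as a string, and prefers an `abbr` title or a data: URI `href` over the visible text. `TypedDlParser`, `CONVERTERS` and the helper functions are hypothetical names, not part of any microformats library.

```python
import base64
from datetime import datetime
from html.parser import HTMLParser


def parse_boolean(text):
    """The proposal encodes booleans as 0 or 1."""
    if text not in ("0", "1"):
        raise ValueError(f"not a boolean: {text!r}")
    return text == "1"


def parse_datetime(text):
    """Parse an ISO 8601 value; a trailing 'Z' is rewritten as an explicit UTC offset."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def parse_binary(uri):
    """Decode a base64 data: URI such as 'data:;base64,aGk='."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError(f"not a base64 data: URI: {uri!r}")
    return base64.b64decode(payload)


# Assumed mapping from class names to converters; 'string' is the default.
CONVERTERS = {
    "string": str, "int": int, "long": int, "double": float,
    "boolean": parse_boolean, "datetime": parse_datetime, "binary": parse_binary,
}


class TypedDlParser(HTMLParser):
    """Collects dt/dd pairs into self.result, converting dd values by class."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self._tag = None        # "dt" or "dd" while inside one of them
        self._key = None
        self._type = "string"
        self._text = []
        self._override = None   # abbr title or data: href, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("dt", "dd"):
            self._tag, self._text, self._override = tag, [], None
            if tag == "dd":
                classes = (attrs.get("class") or "").split()
                self._type = next((c for c in classes if c in CONVERTERS), "string")
        elif self._tag == "dd" and tag == "abbr" and attrs.get("title"):
            self._override = attrs["title"]
        elif self._tag == "dd" and tag == "a" and self._type == "binary":
            self._override = attrs.get("href")

    def handle_data(self, data):
        if self._tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "dt" and self._tag == "dt":
            self._key = "".join(self._text).strip()
            self._tag = None
        elif tag == "dd" and self._tag == "dd":
            raw = self._override
            if raw is None:
                raw = "".join(self._text).strip()
            self.result[self._key] = CONVERTERS[self._type](raw)
            self._tag = None
```

Feeding the parser a typed `dl` and reading `parser.result` yields a plain dictionary of native values. The sketch does not check for the 'typed' class on the container; a real consumer would presumably do so before applying any conversion.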
Example
<dl class="typed xoxo">
  <dt>key</dt><dd class="string">value</dd>
  <dt>integer</dt><dd class="int">137</dd>
  <dt>real</dt><dd class="double">3.14159265</dd>
  <dt>date</dt><dd class="dateTime">1994-11-05T13:15:30Z</dd>
  <dt>date(abbr)</dt><dd class="dateTime"><abbr title="1994-11-05">November 5, 1994</abbr></dd>
  <dt>true</dt><dd class="boolean">1</dd>
  <dt>false</dt><dd class="boolean">0</dd>
  <dt>data</dt><dd class="base64"><a href="data:;base64,sdcfo2JTiXE=" type="image/jpg">my image</a></dd>
</dl>
Example revised with above suggestions:
<dl class="typed xoxo">
  <dt>key</dt><dd>value</dd>
  <dt>integer</dt><dd class="int">137</dd>
  <dt>real</dt><dd class="double">3.14159265</dd>
  <dt>date</dt><dd class="datetime">1994-11-05T13:15:30Z</dd>
  <dt>date(abbr)</dt><dd class="datetime"><abbr title="1994-11-05">November 5, 1994</abbr></dd>
  <dt>true</dt><dd class="boolean">1</dd>
  <dt>false</dt><dd class="boolean">0</dd>
  <dt>data</dt><dd class="binary"><a href="data:;base64,sdcfo2JTiXE=" type="image/jpg">my image</a></dd>
</dl>
Note: XOXO always starts with either ol or ul. The dl in XOXO is always used to declare the properties of a specific li. These examples should be updated accordingly. -Tantek
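For the export side, the sketch below shows one way a Python dictionary could be written out as typed markup, placing the `dl` inside an `li` of an `ol` as the note above asks. It is an illustration under assumptions rather than an agreed format: it uses the revised class names 'datetime' and 'binary', labels integers outside the 32-bit range as 'long', leaves strings unlabelled, and picks `application/octet-stream` as a placeholder content type. `typed_dd` and `to_typed_xoxo` are hypothetical names.

```python
import base64
from datetime import datetime
from html import escape


def typed_dd(value):
    """Render one value as a dd element carrying the assumed datatype class."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return f'<dd class="boolean">{int(value)}</dd>'
    if isinstance(value, int):
        cls = "int" if -2**31 <= value < 2**31 else "long"
        return f'<dd class="{cls}">{value}</dd>'
    if isinstance(value, float):
        return f'<dd class="double">{value!r}</dd>'
    if isinstance(value, datetime):
        return f'<dd class="datetime">{value.isoformat()}</dd>'
    if isinstance(value, bytes):
        payload = base64.b64encode(value).decode("ascii")
        # Placeholder content type and link text; a publisher would supply real ones.
        href = f"data:application/octet-stream;base64,{payload}"
        return f'<dd class="binary"><a href="{href}">binary data</a></dd>'
    # Anything else is treated as a string, which is the unlabelled default.
    return f"<dd>{escape(str(value))}</dd>"


def to_typed_xoxo(record):
    """Wrap the properties of a single item in ol > li > dl."""
    rows = "".join(f"<dt>{escape(key)}</dt>{typed_dd(value)}"
                   for key, value in record.items())
    return f'<ol class="typed xoxo"><li><dl>{rows}</dl></li></ol>'


if __name__ == "__main__":
    print(to_typed_xoxo({
        "key": "value",
        "integer": 137,
        "date": datetime(1994, 11, 5, 13, 15, 30),
        "true": True,
    }))
```

Because the type is carried by the class name rather than guessed from the text, a consumer can turn the markup back into the same native values, which is the lossless round trip this page is after.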
References
- Datatypes in Wikipedia
- Original datatype discussion
- Original plist datatype mapping proposal
- Revised xoxo datatype proposal
See Also
- xoxo
- naming-principles