rest/datatypes: Difference between revisions
Latest revision as of 22:50, 31 August 2007
Datatypes in HTML
One of the challenges of using HTML as a data transport is that everything, by default, is a string. This page explores ways to use microformats -- specifically, class names -- to encode data type information, e.g., for use with xoxo and rest/ahah, in order to allow lossless import/export from various languages. These could also be used with forms to provide rest/descriptions of the type of data expected.
Contributors
- Dr. Ernie Prabhakar
- Chris RG
- Mark Rickerby
- Robert Bachmann
- Kevin Marks
- Tantek Çelik
 
Examples
These are the primary datatypes in a range of different languages and formats. Note that we are only concerned with "primitive" datatypes (loosely defined), as structured datatypes (list/array, hash/dictionary) are handled by xoxo.
| Language/format | string | float | integer | boolean | data | date/time | null | 
|---|---|---|---|---|---|---|---|
| XML Schema | string | float, double | decimal, integer, etc. | boolean | hexBinary, base64Binary | duration, dateTime, date, time | nil | 
| XML-RPC | string | double | i4, int | boolean | base64 | dateTime.iso8601 | nil | 
| Mac OS X plists | string | real | integer | true, false | data | date | nil | 
| JSON (JavaScript) | string | number | number | true, false | N/A | Date | null | 
| YAML tags | str | float | int | bool | binary (base64) | N/A | null | 
| SQL (JDBC) | char,varchar | float, double, real | decimal, numeric | bit | binary | date, time, timestamp | ? | 
| C | char[] | float, double | int, long, short | bool, int | char[] | N/A | (void*)0 | 
| Java | char, String | float, double | int, long, short, byte | boolean | N/A | util.Date | null | 
| PHP | string | float (double) | integer | boolean | array | N/A | NULL | 
| Perl | array | scalar | scalar | scalar | array | N/A | |
| Python | str | float, complex | int, long | bool | binascii, base64 | time,datetime | |
| Ruby + lib | String | Float | Fixnum, Bignum | TrueClass,FalseClass | Hash | Date | NilClass | 
| REBOL | string! | decimal! | integer! | logic! | binary! | date!, time! | none! | 
Analysis
The most common set of datatypes appears to be the one represented by XML-RPC, which (perhaps fortunately) also has historical precedent on the web:
- string
- double
- int [i4]: 4-byte integer (32-bit)
- boolean (0,1)
- base64
  - Let's call this 'binary', as the encoding is in the data: URL, and DRY applies
  - RFC 2426 uses "B", which, when lowercased per microformats naming-principles, is 'b'. -Tantek
- dateTime[.iso8601]
 
While not perfect, these certainly cover the 80% case, and are reasonably well-defined. That said, there are a number of open questions about how to use them:
- Should 'string' also be explicitly specified, or can it be assumed?
  - Assumed, and also defined as utf-8. Kevin Marks 16:39, 13 Feb 2006 (PST)
  - Agreed with Kevin. 'string' should be the default if no type is specified. Publishers MAY explicitly specify 'string'. - Tantek
  - Shouldn't the encoding be that of the page the markup is found on (as specified in the HTTP and HTML specs), rather than defined as utf-8? Jim Ancona
  - Jim, that's a good point; the encoding should be determined by the rules of the containing document ((X)HTML) and protocol (HTTP). - Tantek
- Does 'int' always mean 32 bits?
  - If so, what should be used for 64-bit integers or cryptographic (256-bit+) numbers?
    - Python's 'long' is simple, but ambiguous.
    - Ruby's Bignum is clear but much less common.
    - XML-Schema has so many types it is hard to say.
      - In this case, XML-Schema makes the distinction that 'int' represents a standard 32-bit integer, while 'integer' represents a signed integer of arbitrary length.
    - SQL's "decimal", perhaps?
  - If not, how should conforming implementations react to longer integers than they can handle?
    - I think integer is fine - we don't have an explicit constraint here. Do you want to define +Inf, -Inf and NaN behavior? Certainly include these when building test cases and examples.
- Is it worth deviating from the standard to allow "dateTime" as an alias? (the one case where XML Schema is actually simpler)
  - See comments below regarding date-time.
 
 
Proposal
The proposal is to adopt the scalar values of XML-RPC (http://www.xmlrpc.com/spec/) as the class names for typed microformats, with the following caveats:
Integers
- the alias 'i4' for integer SHOULD NOT be used
- the name 'long' MAY be used for 64-bit or longer integers
- thus, the name 'int' MAY be used for more than 32-bit signed integers
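A consumer therefore cannot assume that an 'int' fits in 32 bits. The following is a minimal Python sketch of one way to read such a value; the function name parse_int and the allow_wide switch are hypothetical and not part of the proposal.

```python
# Hypothetical helper: convert the text of an element marked
# class="int" (or class="long") into a number.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def parse_int(text, allow_wide=True):
    # Python integers are arbitrary precision, so wide values parse as-is.
    value = int(text.strip())
    # A consumer limited to 32 bits can reject wider values explicitly
    # instead of silently truncating them.
    if not allow_wide and not (INT32_MIN <= value <= INT32_MAX):
        raise OverflowError(f"{value} does not fit in a signed 32-bit integer")
    return value
```

Raising an error for out-of-range values is only one possible reaction; the page leaves open how conforming implementations should behave here.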
 
Date and Time
- use 'datetime' for 'dateTime.iso8601'
  - camelCase is not appropriate according to the microformats naming-principles.
  - '.' is neither valid in an HTML class name nor (unescaped, at least) a valid character in a CSS class selector
  - Alternative: 'dt' (reusing the common prefix shared by existing microformat class names: dtstart, dtend, dtreviewed from hCalendar and hReview). We could also make that a general rule for microformat class names for properties which take ISO 8601 datetimes (http://microformats.org/wiki/naming-principles#dt_properties). (Tantek)
- date/time formats SHOULD follow the W3C profile (http://www.w3.org/TR/NOTE-datetime)
  - at any rate, they MUST follow ISO 8601
  - a more human-readable rendering may be used, with the ISO 8601 value in the title attribute of an abbr
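One way a consumer might read these values is sketched below in Python. The names parse_datetime and datetime_value are hypothetical, and the sketch only handles complete dates and date-times; the year-only and year-month forms of the W3C profile are not covered.

```python
from datetime import datetime

def parse_datetime(value):
    # Older Python versions do not accept a trailing "Z" in
    # datetime.fromisoformat, so rewrite it as an explicit UTC offset.
    value = value.strip()
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)

def datetime_value(text, abbr_title=None):
    # When the human-readable text is wrapped in an abbr, the
    # machine-readable value is taken from its title attribute.
    return parse_datetime(abbr_title if abbr_title is not None else text)
```

With this sketch, datetime_value("November 5, 1994", "1994-11-05") ignores the displayed text and parses the title value instead.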
 
 
Binary Data
- binary data SHOULD be encoded in a data: URI on an anchor, with an explicit ContentType (the anchor's type attribute) and a human-readable description as the body of the anchor.
- therefore, use 'binary' for 'base64', as there may be alternate, non-base64 encodings in the future
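Decoding such an anchor's href could look roughly like the Python sketch below. The function name decode_data_uri is hypothetical; the sketch handles both the base64 form and the percent-encoded form of a data: URI, and returns the media type found inside the URI itself (which may be empty when the type is given on the anchor instead).

```python
import base64
from urllib.parse import unquote_to_bytes

def decode_data_uri(uri):
    # Split "data:<media type>[;base64],<payload>" into its parts.
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no comma separator")
    is_base64 = header.endswith(";base64")
    media_type = header[:-len(";base64")] if is_base64 else header
    if is_base64:
        raw = base64.b64decode(payload)
    else:
        raw = unquote_to_bytes(payload)  # percent-encoded payload
    return media_type, raw
```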
 
String
- 'string' MAY be omitted
- thus, any unlabeled entries MUST be interpreted as strings.
 
Usage
To indicate that a particular microformat uses typed values, precede or follow that microformat's class name with the class name 'typed', as in:
<ol class="typed xoxo">
or
<ol class="xoxo typed">
In other words, this defines what might be called the "typed" microformat.
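Because the class attribute is a space-separated list, a consumer would test for 'typed' as a whole token rather than as a substring. A minimal Python sketch, with the hypothetical function name is_typed:

```python
def is_typed(class_attr):
    # Split on whitespace so that order does not matter and names that
    # merely contain "typed" (such as "untyped") do not match.
    return "typed" in (class_attr or "").split()
```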
Summary
- string (optional)
- boolean (0,1)
- int (WAS i4; MAY use long)
- double
- datetime (WAS dateTime.iso8601)
- binary (WAS base64)
- nil
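The list above maps naturally onto a lookup table from class name to converter. The Python sketch below is one hypothetical arrangement (the names CONVERTERS and convert are not part of the proposal). It leaves out 'binary', whose value lives in an anchor's href rather than in the element text.

```python
from datetime import datetime

def _boolean(text):
    text = text.strip()
    if text not in ("0", "1"):
        raise ValueError(f"boolean must be 0 or 1, got {text!r}")
    return text == "1"

def _datetime(text):
    text = text.strip()
    if text.endswith("Z"):  # rewrite "Z" for older fromisoformat versions
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

# Class name -> function turning element text into a native value.
CONVERTERS = {
    "string": str,
    "boolean": _boolean,
    "int": lambda t: int(t.strip()),
    "long": lambda t: int(t.strip()),
    "double": lambda t: float(t.strip()),
    "datetime": _datetime,
    "nil": lambda t: None,
}

def convert(class_attr, text):
    # Use the first recognised type name among the class tokens;
    # unlabeled entries are interpreted as strings.
    for token in (class_attr or "").split():
        if token in CONVERTERS:
            return CONVERTERS[token](text)
    return text
```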
 
Example
<ol class="typed xoxo"> <!-- every XOXO must begin with ol or ul -->
 <li>
  <dl>
   <dt>key</dt><dd>value</dd>
   <dt>integer</dt><dd class="int">137</dd>
   <dt>real</dt><dd class="double">3.14159265</dd>
   <dt>date</dt><dd class="datetime">1994-11-05T13:15:30Z</dd>
   <dt>date(abbr)</dt><dd class="datetime"><abbr title="1994-11-05">November 5, 1994</abbr></dd>
   <dt>true</dt><dd class="boolean">1</dd>
   <dt>false</dt><dd class="boolean">0</dd>
   <dt>data</dt><dd class="binary"><a href="data:;base64,sdcfo2JTiXE=" type="image/jpeg">my image</a></dd>
  </dl>
 </li>
</ol>
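To make the round trip concrete, here is a rough Python sketch of a consumer for markup like the example, built on the standard library's html.parser. The names TypedDlParser and parse_typed are hypothetical, and the sketch makes simplifying assumptions: it handles a single flat dl, only base64 data: URIs, and treats the whole document as typed once any ol or ul carries the 'typed' class.

```python
import base64
from datetime import datetime
from html.parser import HTMLParser

def _datetime(text):
    text = text.strip()
    if text.endswith("Z"):  # rewrite "Z" for older fromisoformat versions
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

def _binary(uri):
    # Assumes a base64 data: URI; the payload follows the first comma.
    return base64.b64decode(uri.partition(",")[2])

CONVERTERS = {
    "int": int, "long": int, "double": float,
    "boolean": lambda t: t.strip() == "1",
    "datetime": _datetime, "binary": _binary, "nil": lambda t: None,
}

class TypedDlParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.result = {}
        self.typed = False     # set once an ol/ul with class "typed" is seen
        self._in = None        # "dt" or "dd" while inside one
        self._key = None
        self._type = None
        self._text = []
        self._override = None  # abbr title or anchor href, when relevant

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        tokens = (attrs.get("class") or "").split()
        if tag in ("ol", "ul") and "typed" in tokens:
            self.typed = True
        elif tag == "dt":
            self._in, self._text = "dt", []
        elif tag == "dd":
            self._in, self._text, self._override = "dd", [], None
            self._type = next((t for t in tokens if t in CONVERTERS), None)
        elif self._in == "dd" and tag == "abbr" and self._type == "datetime":
            self._override = attrs.get("title")
        elif self._in == "dd" and tag == "a" and self._type == "binary":
            self._override = attrs.get("href")

    def handle_data(self, data):
        if self._in:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "dt" and self._in == "dt":
            self._key = "".join(self._text).strip()
            self._in = None
        elif tag == "dd" and self._in == "dd":
            text = "".join(self._text).strip()
            if self.typed and self._type is not None:
                raw = self._override if self._override is not None else text
                value = CONVERTERS[self._type](raw)
            else:
                value = text   # unlabeled entries stay strings
            self.result[self._key] = value
            self._in = None

def parse_typed(markup):
    parser = TypedDlParser()
    parser.feed(markup)
    parser.close()
    return parser.result
```

Calling parse_typed on the example markup yields a dictionary keyed by the dt text, with native values where a type class was given and plain strings elsewhere.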
References
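- Datatypes in Wikipedia
- Original datatype discussion
- Original plist datatype mapping proposal (http://homepage.mac.com/drernie/plist.html)
- Revised xoxo datatype proposal (http://opendarwin.org/~drernie/xoxo-datatypes.html)
- HTML 5 datatypes (http://hsivonen.iki.fi/html5-datatypes/)
See Also
- xoxo
- naming-principles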