rest/datatypes: Difference between revisions

From Microformats Wiki
<h1> Datatypes in HTML </h1>
One of the challenges of using HTML as a data transport is that everything, by default, is a string.  This page explores ways to use microformats -- specifically, class names -- to encode data type information, e.g., for use with [[xoxo]] and [[rest/ahah]], in order to allow lossless import/export from various languages. These could also be used with forms to provide [[rest/description]]s of the type of data expected.
 
__TOC__
 
== Contributors ==
* Dr. Ernie Prabhakar
* Chris RG
* Mark Rickerby
* Robert Bachmann
* Kevin Marks
* Tantek Çelik


== Examples ==
These are the primary datatypes in a range of different languages and formats.  Note that we are only concerned with "primitive" datatypes (loosely defined), as structured datatypes (list/array, hash/dictionary) are handled by [[xoxo]].


{| border="1" cellpadding="2"
|+Datatype comparison table
|-
! Language/format !! string !! float !! integer !! boolean !! data !! date/time !! null
|-
! [http://www.w3.org/TR/xmlschema-2/#built-in-datatypes XML Schema]
| string || float, double || decimal, integer, etc. || boolean || hexBinary, base64Binary || duration, dateTime, date, time || nil
|-
! [http://ws.apache.org/xmlrpc/types.html XML-RPC]
| string || double || i4, int || boolean || base64 || dateTime.iso8601 || nil
|-
! [http://developer.apple.com/documentation/Cocoa/Conceptual/PropertyLists/Concepts/XMLPListsConcept.html Mac OS X plists]
| string || real || integer || true, false || data || date || nil
|-
! [http://www.crockford.com/JSON JSON] (JavaScript)
| string || number || number || true, false || N/A || Date || null
|-
! [http://yaml.org/spec/current.html#id2503753 YAML] tags
| str || float || int || bool || binary (base64) || timestamp || null
|-
! [http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html#table1 SQL (JDBC)]
| char, varchar || float, double, real || decimal, numeric || bit || binary || date, time, timestamp || NULL
|-
! [http://www.sysprog.net/ctype.html C]
| char[] || float, double || int, long, short || bool, int || char[] || N/A || (void*)0
|-
! [http://java.sun.com/docs/books/tutorial/java/nutsandbolts/datatypes.html Java]
| char, String || float, double || int, long, short, byte || boolean || N/A || util.Date || null
|-
! [http://www.zend.com/manual/language.types.php PHP]
| string || float (double) || integer || boolean || array || N/A || NULL
|-
! [http://search.cpan.org/dist/perl/pod/perldata.pod Perl]
| scalar || scalar || scalar || scalar || array || N/A || undef
|-
! [http://en.wikibooks.org/wiki/Programming:Python_Numbers Python]
| str || float, complex || int, long || bool || binascii, base64 || time, datetime || None
|-
! [http://www.rubycentral.com/book/ext_ruby.html Ruby] + [http://www.rubycentral.com/book/lib_standard.html lib]
| String || Float || Fixnum, Bignum || TrueClass, FalseClass || Hash || Date || NilClass
|-
! [http://www.rebol.com/docs/core23/rebolcore-16.html REBOL]
| string! || decimal! || integer! || logic! || binary! || date!, time! || none!
|}


== Analysis ==
The most common set of datatypes appears to be the one defined by XML-RPC, which (perhaps fortunately) also has historical precedent on the web:
* string
* double
* int [i4] - 4-byte integer (32-bit)
* boolean (0,1)
* base64
** Let's call this 'binary', since the encoding is already stated in the data: URL, and DRY applies
** RFC 2426 uses "B", which, when lowercased per microformats [[naming-principles]] is 'b'. -Tantek
* dateTime[.iso8601]
 
While not perfect, these certainly cover the 80% case, and are reasonably well-defined.  That said, there are a number of open questions about how to use them:
# Should 'string' also be explicitly specified, or can it be assumed?
#*Assumed, and also defined as utf-8. [[User:Kevin Marks|Kevin Marks]] 16:39, 13 Feb 2006 (PST)
#*Agreed with Kevin. 'string' should be the default if no type is specified.  Publishers MAY explicitly specify 'string'. - Tantek
#*Shouldn't the encoding be that of the page the markup is found on (as specified in the HTTP and HTML specs), rather than defined as utf-8? [[User:Jim Ancona|Jim Ancona]]
#*Jim, that's a good point, the encoding should be determined by the rules of the containing document ((X)HTML) and protocol (HTTP). - Tantek
# Does 'int' always mean 32 bits?
##  If so, what should be used for 64-bit integers or cryptographic (256-bit+) numbers?
###Python's 'long' is simple, but ambiguous.
###Ruby's Bignum is clear but much less common.
###XML-Schema has so many types it is hard to say.
###* In this case, XML-Schema makes the distinction that 'int' represents a standard 32-bit integer, while 'integer' represents a signed integer of arbitrary length.
###SQL's "decimal", perhaps?
##  If not, how should conforming implementations react to longer integers than they can handle?
##*I think integer is fine - we don't have an explicit constraint here. Do you want to define +Inf, -Inf and NaN behavior? Certainly include these when building test cases and examples.
# Is it worth deviating from the standard to allow "dateTime" as an alias? (the one case where XML Schema is actually simpler)
#* See comments below regarding date-time.
 
== Proposal ==
The proposal is to adopt [http://www.xmlrpc.com/spec/ XML-RPC] scalar values as the class names for typed microformats, with the following caveats:
=== Integers ===
* the alias ''i4'' for integer SHOULD NOT be used
* the name ''long'' MAY be used for 64-bit or longer integers
* thus, the name ''int'' MAY also be used for signed integers wider than 32 bits
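
The integer rules above could be read by a consumer roughly as follows. This is a minimal Python sketch, not part of the proposal: the function name <code>parse_typed_int</code> and the <code>max_bits</code> parameter are hypothetical, and raising an error for out-of-range values is only one possible answer to the open question about integers longer than an implementation can handle.

```python
def parse_typed_int(text, max_bits=None):
    """Parse the text content of an element classed 'int' or 'long'.

    max_bits is a hypothetical limit: None accepts integers of any size,
    otherwise values outside the signed max_bits range are rejected.
    """
    value = int(text.strip())  # Python integers have arbitrary precision
    if max_bits is not None:
        low = -(1 << (max_bits - 1))
        high = (1 << (max_bits - 1)) - 1
        if not low <= value <= high:
            raise OverflowError(f"{value} does not fit in {max_bits} bits")
    return value
```

A consumer limited to 32-bit integers would call <code>parse_typed_int(text, max_bits=32)</code>; one with big-integer support can omit the limit.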
 
=== Date and Time ===
* use ''datetime'' for ''dateTime.iso8601''
** camelCase is not appropriate according to microformat [[naming-principles]].
** '.' is neither a valid HTML class name, nor a valid character (unescaped at least) in a CSS class selector
** Alternative: dt (reusing the common prefix shared by existing microformat class names: dtstart, dtend, dtreviewed from [[hcalendar|hCalendar]] and [[hreview|hReview]]).  We could also make that a [http://microformats.org/wiki/naming-principles#dt_properties general rule for microformat class names for properties which take ISO8601 datetimes]. (Tantek)
* date/time formats SHOULD follow the [http://www.w3.org/TR/NOTE-datetime W3C profile]
** at any rate, they MUST follow [http://en.wikipedia.org/wiki/ISO_8601 ISO 8601]
** a more human-readable rendering may be used, with the ISO8601 value in an ''abbr''
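
As an illustration of the ''abbr'' rule, the following Python sketch prefers the ''title'' of an enclosed ''abbr'' over the visible text. The function name <code>parse_typed_datetime</code> is hypothetical, and the sketch assumes a complete date or date-time; the shorter W3C profile forms (year only, or year and month) are not handled.

```python
from datetime import datetime


def parse_typed_datetime(text, abbr_title=None):
    """Return a datetime for an element classed 'datetime'.

    If the value is wrapped in an abbr, pass its title attribute as
    abbr_title; the human-readable text is then ignored.
    """
    raw = (abbr_title if abbr_title is not None else text).strip()
    # Older versions of datetime.fromisoformat reject the "Z" suffix,
    # so rewrite it as an explicit UTC offset first.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    return datetime.fromisoformat(raw)
```

A date-only value such as <code>1994-11-05</code> comes back as midnight on that day with no time zone attached.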
=== Binary Data ===
* binary data SHOULD be encoded in a [http://en.wikipedia.org/wiki/Data:_URI_scheme data: URI], with an explicit [http://www.htmlhelp.com/reference/html40/special/a.html ContentType] and a human-readable description as the body of the anchor.
* therefore, use ''binary'' for ''base64'', as there may be alternate, non-base64 encodings in the future
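
A consumer would then need to unpack the data: URI found in the anchor's ''href''. The Python sketch below shows one way to do that under RFC 2397; the function name <code>decode_data_uri</code> is hypothetical, and the sketch does not look at the anchor's ''type'' attribute, which a real consumer would probably prefer when it is present.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data: URI into (media_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no comma separator")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # RFC 2397 default when no media type is given
    media_type = header or "text/plain;charset=US-ASCII"
    if is_base64:
        data = base64.b64decode(payload)
    else:
        data = unquote_to_bytes(payload)  # percent-decoded octets
    return media_type, data
```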
 
=== String ===
* ''string'' MAY be omitted
* thus, any unlabeled entries MUST be interpreted as strings.
 
== Usage ==
 
To indicate that a particular microformat uses typed values, precede or follow that microformat with the class name ''typed'', as in:
 
<pre><nowiki>
<ol class="typed xoxo">
</nowiki></pre>
 
or
 
<pre><nowiki>
<ol class="xoxo typed">
</nowiki></pre>
 
In other words, this defines what might be called the ''typed'' microformat.
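
Because the class name may come before or after the microformat name, a consumer should test for it as a whole token rather than by position or substring. A minimal Python sketch, with the hypothetical helper name <code>is_typed</code>:

```python
def is_typed(class_attr):
    """True when the space-separated class attribute contains 'typed'."""
    return "typed" in (class_attr or "").split()
```

Splitting on whitespace means that a class such as <code>untyped</code> is not mistaken for ''typed''.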
 
== Summary ==
* string (optional)
* boolean (0,1)
* int (WAS i4; MAY use long)
* double
* datetime (WAS dateTime.iso8601)
* binary (WAS base64)
* nil
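
The list above can be turned into a small conversion routine. The Python sketch below is illustrative only: the names <code>KNOWN_TYPES</code>, <code>type_of</code> and <code>convert_scalar</code> are hypothetical, taking the first recognised class name is an assumption, and so is mapping ''nil'' to <code>None</code> regardless of content, since the proposal does not say how ''nil'' is written. Values typed ''datetime'' and ''binary'' are returned as text here.

```python
KNOWN_TYPES = ("string", "boolean", "int", "long", "double",
               "datetime", "binary", "nil")


def type_of(class_attr):
    """Return the first recognised type name in a class attribute.

    Unlabeled entries are strings, as the proposal requires.
    """
    for name in (class_attr or "").split():
        if name in KNOWN_TYPES:
            return name
    return "string"


def convert_scalar(class_attr, text):
    """Convert element text according to its type class."""
    kind = type_of(class_attr)
    if kind == "boolean":
        value = text.strip()
        if value not in ("0", "1"):
            raise ValueError(f"boolean must be 0 or 1, got {value!r}")
        return value == "1"
    if kind in ("int", "long"):
        return int(text.strip())
    if kind == "double":
        return float(text.strip())
    if kind == "nil":
        return None
    # string, plus datetime and binary, which are left as text here
    return text
```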
 
== Example ==
 &lt;ol class="typed xoxo"> &lt;!-- every XOXO must begin with ol or ul -->
  &lt;li> 
  &lt;dl> 
  &lt;dt>key&lt;/dt>&lt;dd>value&lt;/dd>
  &lt;dt>integer&lt;/dt>&lt;dd class="int">137&lt;/dd>
  &lt;dt>real&lt;/dt>&lt;dd class="double">3.14159265&lt;/dd>
  &lt;dt>date&lt;/dt>&lt;dd class="datetime">1994-11-05T13:15:30Z&lt;/dd>
  &lt;dt>date(abbr)&lt;/dt>&lt;dd class="datetime">&lt;abbr title="1994-11-05">November 5, 1994&lt;/abbr>&lt;/dd>
  &lt;dt>true&lt;/dt>&lt;dd class="boolean">1&lt;/dd>
  &lt;dt>false&lt;/dt>&lt;dd class="boolean">0&lt;/dd>
  &lt;dt>data&lt;/dt>&lt;dd class="binary">&lt;a href="data:;base64,sdcfo2JTiXE=" type="image/jpeg">my image&lt;/a>&lt;/dd>
  &lt;/dl>
  &lt;/li>
&lt;/ol>
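
To show how markup like the example might be consumed, here is a Python sketch built on the standard library's <code>html.parser</code>. The class name <code>TypedListReader</code> and the helper <code>read_typed</code> are hypothetical. The sketch only collects (key, type, raw value) triples from ''dt''/''dd'' pairs; it does not check for the ''typed'' class, does not handle nested lists, and leaves conversion of the raw values to the caller.

```python
from html.parser import HTMLParser


class TypedListReader(HTMLParser):
    """Collect (key, type_name, raw_value) triples from dt/dd pairs."""

    TYPE_NAMES = {"string", "boolean", "int", "long", "double",
                  "datetime", "binary", "nil"}

    def __init__(self):
        super().__init__()
        self.entries = []
        self._tag = None        # "dt" or "dd" while inside one
        self._key = None
        self._type = "string"
        self._raw = None        # machine value from abbr title or a href
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("dt", "dd"):
            self._tag, self._buf, self._raw = tag, [], None
            if tag == "dd":
                classes = (attrs.get("class") or "").split()
                self._type = next(
                    (c for c in classes if c in self.TYPE_NAMES), "string")
        elif self._tag == "dd" and tag == "abbr" and self._type == "datetime":
            self._raw = attrs.get("title")   # ISO 8601 value
        elif self._tag == "dd" and tag == "a" and self._type == "binary":
            self._raw = attrs.get("href")    # data: URI

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buf).strip()
        if tag == "dt":
            self._key = text
        else:
            value = self._raw if self._raw is not None else text
            self.entries.append((self._key, self._type, value))
        self._tag = None


def read_typed(html):
    """Parse an HTML fragment and return the collected triples."""
    reader = TypedListReader()
    reader.feed(html)
    reader.close()
    return reader.entries
```

Keeping the reader separate from value conversion lets the same walker serve consumers with different integer, date and binary handling.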


== References ==
* [http://www.yaml.org/ YAML] format
* [http://www.crockford.com/JSON/ JSON] (JavaScript)
* [http://en.wikipedia.org/wiki/Datatype Datatypes] in Wikipedia
* Original [http://microformats.org/discuss/mail/microformats-discuss/2005-September/001020.html datatype] discussion
* Original [http://homepage.mac.com/drernie/plist.html plist] datatype mapping proposal
* Revised [http://opendarwin.org/~drernie/xoxo-datatypes.html xoxo datatype] proposal
* HTML 5 [http://hsivonen.iki.fi/html5-datatypes/ datatypes]
 
== See Also ==
* [[xoxo]]
* [[naming-principles]]

Latest revision as of 22:50, 31 August 2007
