From Microformats Wiki

Datatypes in HTML

One challenge of using HTML as a data transport is that everything is, by default, a string. This page explores ways to use microformats (specifically, class names) to encode datatype information, e.g. for use with XOXO 1.0: Extensible Open XHTML Outlines and rest/ahah, so that data can be imported from and exported to various languages without loss. The same class names could also be used with forms to provide rest/descriptions of the type of data expected.
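As a rough illustration of the idea, the sketch below wraps Python values in `span` elements whose class names carry the datatype. The class names (`string`, `int`, `double`, `boolean`) and the function name `to_typed_html` are assumptions made for this example, not an agreed vocabulary.

```python
from html import escape

# Hypothetical class names; this page does not settle on a vocabulary.
CLASS_FOR_TYPE = {bool: "boolean", int: "int", float: "double", str: "string"}


def to_typed_html(value):
    """Wrap a primitive value in a <span> whose class names its datatype."""
    # type() is used rather than isinstance() so that True is not treated as an int.
    class_name = CLASS_FOR_TYPE.get(type(value))
    if class_name is None:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    # Booleans are written in lowercase; everything else uses str().
    text = str(value).lower() if isinstance(value, bool) else str(value)
    return f'<span class="{class_name}">{escape(text)}</span>'
```

A consumer that ignores the class attribute still sees plain text, while a type-aware consumer can recover the original value.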


The table below lists the primary datatypes in a range of different languages and formats. Note that we are only concerned with "primitive" datatypes (loosely defined), as structured datatypes (list/array, hash/dictionary) are handled by XOXO 1.0: Extensible Open XHTML Outlines.

Datatype comparison table
| Language/format | string | float | integer | boolean | data | date/time | null |
|---|---|---|---|---|---|---|---|
| XML Schema | string | float, double | decimal, integer, etc. | boolean | hexBinary, base64Binary | duration, dateTime, date, time | nil |
| XML-RPC | string | double | i4, int | boolean | base64 | dateTime.iso8601 | nil |
| Mac OS X plists | string | real | integer | true, false | data | date | nil |
| JSON (JavaScript) | string | number | number | true, false | N/A | Date | null |
| YAML tags | str | float | int | bool | null (base 64) | N/A | null |
| SQL (JDBC) | char, varchar | float, double, real | decimal, numeric | bit | binary | date, time, timestamp | null |
| C | char[] | float, double | int, long, short | bool, int | char[] | N/A | (void*)0 |
| Java | char, String | float, double | int, long, short, byte | boolean | N/A | util.Date | ? |
| PHP | string | float (double) | integer | boolean | array | N/A | NULL |
| Perl | array | scalar | scalar | scalar | array | N/A | |
| Python | str | float, complex | int, long | bool | binascii, base64 | time, datetime | |
| Ruby + lib | String | Float | Fixnum, Bignum | TrueClass, FalseClass | Hash | Date | NilClass |


The most common set of datatypes appears to be the one used by XML-RPC, which (perhaps fortunately) also has historical precedent on the web:

  • string
  • double
  • int [i4]
  • boolean
  • base64
  • dateTime[.iso8601]
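To make the import direction concrete, here is a minimal sketch of a reader that converts element text back into typed Python values. It assumes the XML-RPC type names above are used directly as HTML class names, that typed elements contain only text, and that dates are written in extended ISO 8601 form; `CONVERTERS`, `TypedValueParser`, and `parse_typed_values` are hypothetical names.

```python
import base64
from datetime import datetime
from html.parser import HTMLParser

# Converters keyed by class name. The names follow the XML-RPC list above;
# using them as HTML class names is an assumption of this sketch.
CONVERTERS = {
    "string": str,
    "double": float,
    "int": int,
    "i4": int,
    "boolean": lambda s: s.strip().lower() in ("1", "true"),
    "base64": base64.b64decode,
    # Expects the extended form, e.g. 2005-07-14T12:00:00.
    "dateTime.iso8601": datetime.fromisoformat,
}


class TypedValueParser(HTMLParser):
    """Collect typed values from elements whose class names a known datatype."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._converter = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in CONVERTERS:
                self._converter = CONVERTERS[name]
                self._text = []
                break

    def handle_data(self, data):
        # Text may arrive in several chunks, so it is buffered until the end tag.
        if self._converter is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes typed elements hold only text (no nested tags).
        if self._converter is not None:
            self.values.append(self._converter("".join(self._text)))
            self._converter = None


def parse_typed_values(html_text):
    """Return the typed values found in html_text, in document order."""
    parser = TypedValueParser()
    parser.feed(html_text)
    parser.close()
    return parser.values
```

Elements without a recognized class name are skipped here; whether they should instead default to `string` is left open.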

While not perfect, these certainly cover the 80% case, and are reasonably well-defined. That said, there are a number of open questions about how to use them:

  1. Should 'string' also be explicitly specified, or can it be assumed?
  2. Does 'int' always mean 32 bits?
    1. If so, what should be used for 64-bit integers or cryptographic (256-bit+) numbers?
      1. Python's 'long' is simple, but ambiguous.
      2. Ruby's Bignum is clear but much less common.
      3. XML Schema has so many types that it is hard to pick one.
      4. SQL's "decimal", perhaps?
    2. If not, how should conforming implementations react to integers longer than they can handle?
  3. Is it worth deviating from the standard to allow "dateTime" as an alias? (This is the one case where XML Schema is actually simpler.)
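The integer-width question can be made concrete with a small sketch. It shows one possible policy, not a decision of this page: a writer treats 'int' as 32 bits and falls back to a wider class name (the default `long` is borrowed from the Python option above purely as a placeholder), and a reader rejects values wider than it supports. Both function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def class_for_integer(value, wide_class="long"):
    """Pick a class name for an integer, assuming 'int' means 32 bits."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an integer")
    if INT32_MIN <= value <= INT32_MAX:
        return "int"
    # Placeholder name for anything wider; the real name is an open question.
    return wide_class


def read_integer(text, max_bits=32):
    """Parse an integer, rejecting values wider than the consumer supports."""
    value = int(text)
    limit = 2 ** (max_bits - 1)
    if not -limit <= value < limit:
        raise OverflowError(f"{text!r} does not fit in {max_bits} bits")
    return value
```

Raising an error is only one way to react; an implementation could also fall back to keeping the value as a string so that nothing is lost on re-export.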


  • TBD