datetime-design-pattern

From Microformats Wiki
Revision as of 19:17, 30 July 2008 by Tantek (talk | contribs) (followed up and answered several issues on date and time separation using value excerption, grouped related issues, separated out different issues.)

Datetime Design Pattern

This is a page for exploring a datetime design pattern.

Purpose

Practical Need

  • This design pattern arose as a result of solving the practical need for human readable dates for hCalendar.

How to use it

  • enclose the human-friendly datetime that you want to make machine readable with <abbr>
  • as per the class-design-pattern, add the appropriate class attribute to the abbr element
  • add a title attribute to the abbr element with the machine readable ISO8601 datetime or date as the value

Current uses

The pattern, which is now available as part of hAtom, hCalendar, hCard and hReview, is:

<abbr class="foo" title="YYYY-MM-DDTHH:MM:SS+ZZ:ZZ">Date Time</abbr>

where foo is the semantic classname which is being applied to this date/time, the title of the <abbr> is an ISO 8601 date/time, with an appropriate level of specificity, and "Date Time" is a human-friendly representation of the same date/time.

An alternative, if you are using UTC-based timestamps, would be:

<abbr class="foo" title="YYYY-MM-DDTHH:MM:SSZ">Date Time</abbr>

with a single "Z" as per ISO 8601

Ruby: an easy way to get this format from a DateTime (after require 'date') is:

DateTime.now.to_s
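
Python: a minimal sketch of the same idea using only the standard library. The helper name abbr_datetime and the sample class names and dates are illustrative assumptions, not part of any microformat specification.

```python
from datetime import datetime, timezone, timedelta
from html import escape

def abbr_datetime(class_name, dt, human_text):
    """Wrap human_text in an <abbr> whose title is dt in ISO 8601 form."""
    # isoformat() on a timezone-aware datetime yields e.g. 2008-07-30T19:17:00-07:00
    machine = dt.isoformat(timespec="seconds")
    # Use the single "Z" form for UTC timestamps.
    if dt.utcoffset() == timedelta(0):
        machine = dt.strftime("%Y-%m-%dT%H:%M:%SZ")
    return '<abbr class="%s" title="%s">%s</abbr>' % (
        escape(class_name, quote=True), machine, escape(human_text))

pacific = timezone(timedelta(hours=-7))  # fixed offset, only approximates a local timezone
print(abbr_datetime("dtstart", datetime(2008, 7, 30, 19, 17, tzinfo=pacific), "30 July, 7:17pm"))
print(abbr_datetime("updated", datetime(2008, 7, 30, 19, 17, tzinfo=timezone.utc), "30 July"))
```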

Profile of ISO8601

Any microformat using the datetime-design-pattern SHOULD use a profile of ISO8601. There are currently two widely used profiles which SHOULD be reused.

Accessibility issues

Note: Some accessibility issues have been raised ([1]) with the Datetime Design Pattern, along with concerns that its use could breach WCAG accessibility guidelines. These are being addressed as part of the abbr-design-pattern-issues discussion. Change recommendations may follow once the accessibility testing is complete. The accessibility concerns are considerably lessened, or even eliminated, when using the date-design-pattern, a subset of the datetime-design-pattern.

Discussion

This pattern is likely to be highly reusable.

--User:RyanKing

Can this not be viewed as a microformat in itself?

--User:DimitriGlazkov

It could, but inventing a microformat for the sake of inventing a microformat is against the microformat principles. If there is a specific real world problem (and use cases) that such an elemental microformat would solve, then it would be worth considering.

Until then it is best to keep the <abbr> datetime concept merely as a microformat design pattern, to be used in _actual_ microformats that have a demonstrated practical need.

-- Tantek

Excerpt from #microformats Aug 18th. Please edit!

Aug 18 15:16:14 <Tantek>	DanC, what do you think of RFC3339?
Aug 18 15:17:14 <Tantek>	ISO8601 subset
Aug 18 15:17:19 <DanC>	        Date and Time on the Internet: Timestamps http://www.ietf.org/rfc/rfc3339.txt
Aug 18 15:17:30 <DanC>	        Klyne is a good guy. I wonder if I talked with him about this.
Aug 18 15:17:32 <Tantek>	compat with W3C-NOTE-DATETIME
Aug 18 15:17:50 <Tantek>	compat with xsd:dateTime
Aug 18 15:17:57 <Tantek>	it's a strict intersection subset
Aug 18 15:17:59 <DanC>	        I consider W3C-NOTE-DATETIME obsoleted by XML Schema datatype-- yeah.. xsd:dateTime
Aug 18 15:18:32 <Tantek>	compare/contrast normatively using xsd:dateTime vs. RFC3339
Aug 18 15:18:41 <Tantek>	note: Atom 1.0 chose RFC3339
Aug 18 15:18:50 <Tantek>	i would like input from the microformats community on this
Aug 18 15:19:27 <DanC>	        in what context are you evaluating RFC 3339?
Aug 18 15:19:28 <jcgregorio>	http://bitworking.org/news/Date_Constructs_in_the_Atom_Syndication_Format
Aug 18 15:21:24 <DanC>	        which microformat is the question coming from, Tantek ?
Aug 18 15:23:31 <DanC>	        "   The grammar element time-second may have the value "60" at the end of
Aug 18 15:23:31 <DanC>	        months in which a leap second occurs" The XML Schema WG is in the 27th level of
                                leap-second-hell for the past few months, I gather.
Aug 18 15:24:21 <DanC>	        yeah... here's the scary bit: "   Leap seconds cannot be predicted far into the
                                future.  The
Aug 18 15:24:21 <DanC>	        International Earth Rotation Service publishes bulletins [IERS] that
Aug 18 15:24:21 <DanC>	        announce leap seconds with a few weeks' warning."
Aug 18 15:26:03 <Tantek>	DanC, which microformats? any/all that use datetime fields.
Aug 18 15:26:36 <DanC>	        hard to give useful advice, then.
Aug 18 15:26:58 <DanC>	        I expect they'll use datetime fields for different things that have different
                                cost/benefit trade-offs
Aug 18 15:27:26 <DanC>	        do you know of any particular differences that matter to anybody?
Aug 18 15:56:43 <KragenSitaker>	RFC3339 suggests -07:00, which seems like an improvement over -0700 anyway
Aug 18 15:56:49 <Tantek>	Kragen, agreed
Aug 18 15:57:01 <Tantek>	RFC3339 is certainly preferable to the ISO8601 subset in iCalendar
Aug 18 16:05:57 <DanC>	        Tantek's right, Kragen; iCalendar looks like it solves the local timezone
                                problem but doesn't.
Aug 18 16:06:14 <DanC>	        and it's true that there's no standard solution to the local timezone problem
Aug 18 16:06:39 <Tantek>	so instead of appearing to solve the problem but not solving it, we chose to
                                provide the ability to *approximate* the local timezone using e.g. "-07:00"
Aug 18 16:06:49 <DanC>	        the simplest thing is to have people use Z time in hCalendar. But I gather
                                that's unacceptably unusable?
Aug 18 16:07:35 <Tantek>	DanC, yes, the simplest thing is to have everyone use UTC Z
Aug 18 16:07:38 <Tantek>	However
Aug 18 16:07:50 <Tantek>	it is not *nearly* as usuable/verifiable
Aug 18 16:07:55 <Tantek>	as -07:00 etc.
Aug 18 16:08:02 <Tantek>	hence the decision to go with the latter
Aug 18 16:08:12 <Tantek>	some degree of human verifiability is important here
Aug 18 16:14:21 <Tantek>	DanC, my perception is that RFC3339 is a subset
Aug 18 16:17:00 <DanC>	        time-numoffset  = ("+" / "-") time-hour ":" time-minute
Aug 18 16:17:34 <DanC>	        ok, then I can't see any differences. (modulo recent leap seconds issues that
                                may affect xsd:dateTime )
Aug 18 16:18:07 <Tantek>	would be interesting to know why Atom 1.0 chose RFC3339 over xsd:dateTime
Aug 18 16:18:21 <Tantek>	if there was a "real" reason or if it was arbitrary / coin-flip.

Here's an exhaustive comparison from ndw. I think xsd:dateTime also allows unqualified local times, while RFC3339 allows only UTC with no known timezone (-00:00). In the end, Atompub followed the advice of Sam Ruby and Scott Hollenbeck, our area director. Atom dates make some additional restrictions on RFC3339, such as uppercase T and Z characters for compatibility with xsd:dateTime, RFC3339, W3C-DTF, and ISO8601. --Robert Sayre

Aug 18 16:18:43 <KragenSitaker>	rfc3339 is pretty short.
Aug 18 16:19:36 <Tantek>	DanC, BTW, which came first? REC for xsd:dateTime or RFC3339?
Aug 18 16:19:50 <DanC>	        RFC3339 is dated July 2002 ...
Aug 18 16:19:54 <KragenSitaker>	Right --- and you might be able to understand xsd:dateTime without
                                reading all of xml schema, you wouldn't be confident of it
Aug 18 16:20:25 <DanC>	        W3C Recommendation 28 October 2004 ... but that's 2nd ed...
Aug 18 16:20:47 <DanC>	        W3C Recommendation 02 May 2001
Aug 18 16:22:10 <DanC>	        I don't see a BNF in http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#dateTime ...
Aug 18 16:22:43 <KragenSitaker>	yeah, appendix D of the current xml schema datatypes document seems
                                a little scanty, actually
Aug 18 16:23:28 <DanC>	        ah... 2nd ed of http://www.w3.org/TR/xmlschema-2/#date is much more
                                explicit about syntax.
Aug 18 16:23:30 <KragenSitaker>	it's 1100 words but still doesn't give any examples
Aug 18 16:23:35 <DanC>	        still, it's given in prose and not BNF
Aug 18 16:24:17 <KragenSitaker>	sections 3.2.9 through 3.2.14 seem to be the relevant ones around #date
Aug 18 16:24:29 <KragenSitaker>	which is another 2200 words
Aug 18 16:24:42 <DanC>	        wow... they changed the canonical form of date from always-Z to
                                timezone-allowed between 1st edition and 2nd edition
Aug 18 16:25:01 <Tantek>	Kragen, DanC, these are very good analyses
Aug 18 16:25:21 <Tantek>	could I ask you to summarize the pros/cons for each in a new section at
                                end of http://microformats.org/wiki/datetime-design-pattern
Aug 18 16:25:22 <Tantek>	?
Aug 18 16:25:58 <KragenSitaker>	rfc 3339 is 4000 words, excluding the last two pages of boilerplate.
Aug 18 16:26:31 <KragenSitaker>	so it's actually longer than the datetime-relevant parts of XSD but it
                                seems much more rigorous and clear
Aug 18 16:28:37 <DanC>	        my advice is: normatively cite both, and claim they specify the same
                                syntax, and let anybody who discovers otherwise send you a bug report
                                with a test case
Aug 18 16:29:12 <KragenSitaker>	danc: nice hack


RFC3339 has a mandatory TIME portion in the DATE-TIME. Some vCard/iCalendar DATE-TIME stamps can omit the TIME. For instance, for DTSTART, if it is a full day event, you can omit the time. BDAY in vCard can be represented by only a DATE. I like the idea of restricting the possible date formats, but I think that TIME should be optional, which it isn't in RFC3339. - brian suda

RFC 3339 allows lowercase 't' and 'z' while XSD doesn't. Specifying RFC 3339 plus 'T' and 'Z' MUST be caps will make them the same. - Joe Gregorio
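
The restriction described in the two notes above can be sketched as a syntax check in Python. This is only an illustration: the names RFC3339_STRICT and is_strict_rfc3339 are made up here, and the pattern checks shape only, not ranges (it does not reject month 13, for example). Because the time portion is mandatory, a bare date fails the check.

```python
import re

# RFC 3339 date-time shape, with the extra restriction that "T" and "Z" are uppercase.
RFC3339_STRICT = re.compile(
    r"\d{4}-\d{2}-\d{2}"       # full-date
    r"T\d{2}:\d{2}:\d{2}"      # uppercase "T", then time with mandatory seconds
    r"(\.\d+)?"                # optional fractional seconds
    r"(Z|[+-]\d{2}:\d{2})",    # uppercase "Z" or a numeric offset with a colon
    re.ASCII,
)

def is_strict_rfc3339(value):
    """True if value has the uppercase-only RFC 3339 shape (syntax only)."""
    return RFC3339_STRICT.fullmatch(value) is not None

for sample in ["2008-07-30T19:17:00Z", "2008-07-30t19:17:00z",
               "2008-07-30T19:17:00-07:00", "2008-07-30"]:
    print(sample, is_strict_rfc3339(sample))
```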

---

A few questions: asked by CharlesBelov 16:57, 24 Apr 2007 (PDT), answered by JamesCraig on 15:58, 5 Jul 2007 (PDT).

  1. Would it make more sense for documenting the alternative codings pitting the abbr tag vs. other tags to be on this page? Answer: That documentation should go on the assistive-technology-abbr-results page.
  2. Would using the title attribute of the abbr tag to encode the machine-readable date in fact cause a failure of WCAG 2.0 Accessibility? What about USA Section 508? It does appear to violate Technique for WCAG 2.0 H28: Providing definitions for abbreviations by using the abbr and acronym elements, although that is a supporting document and does not have the force of a guideline. Answer: Yes, it appears that is in violation of WCAG, 508, et al, so alternatives are being discussed on the assistive-technology-abbr-results page.
  3. In order to maintain accessibility, would it make sense to enclose the machine-readable date in a span with a style of "display:none" instead of using the abbr tag? Answer: please refer to and add any suggestions to assistive-technology-abbr-results.
  4. For that matter, wouldn't you want to style such an abbr tag with text-decoration:none to hide that an abbr tag was used? Otherwise, visitors might cursor over the time, see the machine time, and be annoyed that their time was wasted or else be confused. And I don't think you can suppress the title from coming up if the human-readable time was inadvertently hovered. Answer: Microformats should not rely on CSS in order to work properly, but again, that discussion can be found here: assistive-technology-abbr-results.

Code

The following regular expression (written for verbose mode, e.g. Python's re.VERBOSE flag) should break apart a datetime and cover many lightly broken cases seen in the wild. It has been tested under Python.

 ^
 (?P<year>\d\d\d\d)
 ([-])?(?P<month>\d\d)
 ([-])?(?P<day>\d\d)
 (
  (T|\s+)
  (?P<hour>\d\d)
  (
   ([:])?(?P<minute>\d\d)
   (
    ([:])?(?P<second>\d\d)
    (
     ([.])?(?P<fraction>\d+)
    )?
   )?
  )?
 )?
 (
  (?P<tzzulu>Z)
  |
  (?P<tzoffset>[-+])
  (?P<tzhour>\d\d)
  ([:])?(?P<tzminute>\d\d)
 )?
 $
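
A minimal sketch of how the expression above might be used from Python. The pattern is copied unchanged; the names DATETIME_RE and split_datetime are illustrative assumptions.

```python
import re

# The pattern from this page, compiled in verbose mode so the layout whitespace is ignored.
DATETIME_RE = re.compile(r"""
 ^
 (?P<year>\d\d\d\d)
 ([-])?(?P<month>\d\d)
 ([-])?(?P<day>\d\d)
 (
  (T|\s+)
  (?P<hour>\d\d)
  (
   ([:])?(?P<minute>\d\d)
   (
    ([:])?(?P<second>\d\d)
    (
     ([.])?(?P<fraction>\d+)
    )?
   )?
  )?
 )?
 (
  (?P<tzzulu>Z)
  |
  (?P<tzoffset>[-+])
  (?P<tzhour>\d\d)
  ([:])?(?P<tzminute>\d\d)
 )?
 $
""", re.VERBOSE)

def split_datetime(text):
    """Return the named parts that matched, or None if text is not recognised."""
    m = DATETIME_RE.match(text.strip())
    if m is None:
        return None
    # Drop the optional groups that did not take part in the match.
    return {name: value for name, value in m.groupdict().items() if value is not None}

print(split_datetime("2008-06-24T18:30-07:00"))
print(split_datetime("20051010T10:10:10-0100"))
```

Note that tzoffset captures only the sign; the offset itself is in tzhour and tzminute.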

Other Proposals

strtime instructions as class names

Proposal by DavidLaban (alsuren on freenode) on 8 Jun 2008. It might be possible to have a slightly more readable/extensible/elegant format:

<span class="strtime format:_%d_%B_%Y_" > 16 March 1987 </span>

Notes:

  1. Underscores are used to replace whitespace, because otherwise the formatting string will be split into an unordered set of class attributes by many parsers (thanks go to bogdanlazarsb and gsnedders on irc for explaining this to me).
  2. Some subset of the placeholders should be chosen from those which are supported by both python http://docs.python.org/lib/module-time.html and php http://uk3.php.net/manual/en/function.strftime.php
  3. A name for the class should be decided upon. strtime might not be the best name.
  4. Measures should be taken to avoid the format string accidentally conflicting with other valid classes (In the above example, I have prefixed it with the string "format:")
  5. It might be sensible (when parsing) to strip excess whitespace from the format string and contents. This is not done in this example.
  6. Example Python code follows.

import time

date = (1987, 3, 16, 0, 0, 0, 0, 0, 0)
format = " %d %B %Y "
# To encode:
classes = ["strtime"]
encoded_format = "format:" + format.replace(' ', '_')
classes.append(encoded_format)
content = time.strftime(format, date)
# ... dump classes and content into your document however you want

# To decode (assuming that you have managed to extract classes and content from the document already):
if "strtime" in classes:
    possible_formats = [item for item in classes if item.startswith('format:')]
    assert len(possible_formats) == 1
    # Slice off the prefix; str.strip('format:') would remove any of those
    # characters from both ends rather than the literal prefix.
    format = possible_formats[0][len('format:'):].replace('_', ' ')
    date = time.strptime(content, format)

problems with strtime proposal

  1. Possible abuse of the class attribute. microformats limit the use of the class attribute to marking up additional semantics about the data, not for (potentially) arbitrary processing/programming instructions
    • HTML 4.01 Recommendation defines the class attribute as being "for general purpose processing by user agents". TobyInk 13:21, 8 Jun 2008 (PDT)
  2. Requires authors to think like programmers. The larger problem is that the proposal asks web authors to think like programmers, which severely limits the number of web authors which will be able to use the technique, since the vast majority of web authors are not programmers and have never heard of "strtime", whereas most authors (even people) on the web have seen dates like 2005-06-20 and easily understand what they mean.

In general, any publishing method that requires the author to think like a programmer is a non-starter. It is much more of a barrier than simply using ISO8601/RFC3339, and that barrier is a far worse tradeoff than the duplication / DRY violation compromise. Tantek 09:52, 8 Jun 2008 (PDT)

  • Another problem: if %A/%a/%B/%b are allowed, this raises potential problems with internationalisation. Will parsers be required to understand the names and abbreviations for days and months in potentially hundreds of different languages? TobyInk 14:09, 8 Jun 2008 (PDT)

Machine-data in class

The BBC (uf-dev archive, 20/06/08, "Using class for non-human data") has proposed as an alternative to the empty span and title solution to use the class name in the following way:

<span class="dtstart data-20051010T10:10:10-0100">10 o'clock on the 10th</span>

Pros:

  • Allows data to be represented in a "non-harmful" way. Will not be read aloud by screenreaders or seen as tooltips.
  • Minimises mark-up used.
  • Arguably more semantic than use of "title" attribute for non-human data.

Cons:

  • data in the class attribute has already been discussed numerous times in the mailing list over the years and rejected and documented as an anti-pattern - captured on the wiki this past January 2008.
  • Possible misuse of class attribute, although as noted previously, the HTML spec states "for general purpose processing by user agents".
  • The class attribute has been adopted by the broader web design community to "subclass" element semantics, and to layer additional semantics. To date, microformats has followed this existing practice developed by modern web designers ("paving the cow-paths"). This use of class for data is outside all current practices.

Discussion:

  • This proposal smells icky, but I can't quite put my finger on why. Considered objectively, it does seem to be the least harmful solution proposed so far. TobyInk 06:06, 21 Jun 2008 (PDT)
  • I really like it, especially given the HTML4 spec gives this as an IMO perfectly valid use (on both id and class, with the following examples given in the id section: "identifying fields when extracting data from HTML pages into a database, translating HTML documents into other formats, etc."). Clean and simple. Dracos 03:53, 23 Jun 2008 (PDT)
  • I suggest dropping the redundant 'data-' prefix, unless someone can suggest a feasible case with two time-stamps requiring different prefixes. The proposal then becomes one I've made before AshSearle
    • Valid class names cannot begin with a number, so a date needs some sort of letter prefix. It's sensible to make this prefix meaningful and reusable in some way. Phae
    • Not to advocate too strongly for designing for parsers (generally a bad idea), *but* having a 'data-' prefix on a class name would make identifying data orders of magnitude easier for parsing. Otherwise, how do we know what's data and what's just another class name for some other purpose? Drew
    • A 'data-' prefix would help authors tasked with maintaining or reviewing a page to understand the purpose of a class name that may have been applied by another author. The data prefix communicates very simply that the class name is precisely that, data. Therefore the value is less likely to be accidentally removed or changed, making for a more robust design. Drew
      • I concede - regardless of whether it's valid (X)HTML / CSS - a prefix is needed to distinguish data values from genuine content. ISO 8601 allows date-time to be as simple as "2004", which could easily be misinterpreted. e.g. if a CMS outputs product model numbers into the class attribute for some other purpose. --AshSearle 01:42, 25 Jun 2008 (PDT)
      • Using data- as the prefix here is undesirable, as it now conflicts with HTML5's proposed data- prefix on attributes. It's undesirable to set ourselves up for future confusion with our own conflicting specification of ‘a data prefix’. A different prefix should be considered. See: HTML5 Editor's Draft: Embedding Custom Non-Visible Data. BenWard 07:34, 16 Jul 2008 (PDT)
  • -1 Tantek. I'm vehemently opposed to putting data in the class attribute. We must find better alternatives. We must not go down the path of invisible (dark) (meta)data - IMHO that principle is inviolable for microformats.
    • JakeArchibald If you're so opposed, it'd be useful to see some justification.
      • See above cons. Already discussed/rejected many times in the history of microformats discussions. In short, quit wasting time on old ground. Abuses class attribute, or if you prefer, introduces a *new* use of the class attribute, unlike microformats to-date which have simply made use of a well established semantic of the class attribute. And worst of all, completely violates the visible data principle. Rejectable on that alone.
        • The fact that something has been discussed and rejected before is not sufficient grounds for dismissing it out of hand once additional research and thinking has taken place. The <abbr> datetime pattern arguably abused the title attribute and introduced a new use of the title attribute. In the example given (<span class="dtstart data-20051010T10:10:10-0100">10 o'clock on the 10th</span>), the data is presented as "10 o'clock on the 10th" - very visible. Yes, of course, it's possible to abuse this proposed pattern, and hide the data, but the same can be said of the abbr pattern or even the class pattern given CSS's display:none property. Experience shows that in most cases, people won't abuse the pattern to hide data, as they actually want to show the data on their pages. TobyInk 13:52, 24 Jun 2008 (PDT)
        • It's not an abuse of the class attribute according to the HTML spec. as I read it. Whereas using title for non-human data is (the spec says audio agents may read it aloud, for example). If it's a new use, it's one the writers of HTML4 considered, so it wasn't new to them. Dracos 03:38, 25 Jun 2008 (PDT)
    • Relevant principles are 'humans first, machines second', and 'visible data is better'. These are preferences, not inviolable principles. Besides, the existing datetime spec doesn't adhere to these principles anyway: prioritising humans first, "let's discuss this at the next meeting" should be marked up as let's discuss this at the <abbr title="28th June 2008 at 3:15pm">next meeting</abbr>. There's a conflict between human- and machine-readable dates. It makes more sense to 'hide' the machine-readable data in the only attribute available to us: class. --AshSearle 03:02, 25 Jun 2008 (PDT)
      • Indeed - by their nature, principles are generalisations, and while broadly useful, usually have exceptions. After experience, the microformats community is beginning to see some of these. Human vs machine dates is the one gathering the most interest at the moment, largely thanks to WaSP and auntie, but I think other properties which will probably emerge as exceptions to the general microformats principles are hReview's type property, and its namesake subproperty for hCard's tel, email and adr (and label?) properties. TobyInk 12:51, 25 Jun 2008 (PDT)

Experimental Parser Support

Cognition 0.1 alpha 10 will include experimental support for this pattern, and the Cognition web service already does. Notes:

  • Support is opt-in. Publishers must explicitly request support for the pattern, by including a profile URI of http://purl.org/uF/pattern-data-class/1 in their document head.
  • Support is not limited to date-time properties, but any microformat properties.
  • data-X classes must use percent-encoding to encode spaces and other characters not allowed in class names.
  • The data-X class must be found on the same element as the microformat property class. That is, you cannot use:
    <span class="dtstart"><span class="data-20051010T10:10:10-0100">10 o'clock</span></span>
  • Multiple data-X classes may occur on the same element. When these are found, the longest string is used. This allows for:
    <span class="dtstart data-2005 data-200510 data-20051010">The 10th</span>
    which may be useful for styling or other non-microformat purposes.
  • Can be combined with value excerpting. e.g.
    <p class="dtstart">
      The concert will be held on
      <span class="value data-20080804">the 4th of August</span>
      starting at
      <span class="value data-T193000">7:30pm precisely</span>.
    </p>
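
The rules in these notes can be sketched in Python as follows. This is not Cognition's code: the function name data_class_value is made up, and comparing lengths before percent-decoding is an assumption, since the notes do not say which length counts.

```python
from urllib.parse import unquote

PREFIX = "data-"

def data_class_value(class_attr, property_name):
    """Return the decoded data-X value for property_name, or None.

    class_attr is assumed to be the raw class attribute of a single element.
    """
    classes = class_attr.split()
    # The data-X class must sit on the same element as the property class.
    if property_name not in classes:
        return None
    candidates = [c[len(PREFIX):] for c in classes if c.startswith(PREFIX)]
    if not candidates:
        return None
    # When several data-X classes are present, the longest string is used.
    return unquote(max(candidates, key=len))

print(data_class_value("dtstart data-2005 data-200510 data-20051010", "dtstart"))
```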
    

date and time separation using value excerption

summary

By specifying a more precise parsing of the use of "value" excerption inside all datetime properties (e.g. dtstart, dtend, published, updated etc.), dates and times can be marked up separately, thus reducing/minimizing (and potentially eliminating) the readability issues that come with compound ISO8601 datetimes.

introductory example

The sentence:

 The weekly dinner is tonight at 6:30pm.

would be marked up as:

 The weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr> 
 at <abbr class="value" title="18:30">6:30pm</abbr></span>.

advantages

  • re-uses the readable abbr-date-pattern
  • identifies a similarly readable abbr-time-pattern.
  • minimizes DRY violation distance, keeps machine data on exactly the same element as the respective human data
    • even better than abbr-datetime-pattern does, which, in practice from experience often required specifying the date in machine readable form on the human readable time (separate from the human readable date).
  • introduces no new class names - principle of minimal invention
  • introduces no new use of the class attribute - principle of minimal invention again
  • introduces no new syntax (see above about any publishing method that requires the author to think like a programmer being a non-starter, and introducing new syntax almost always requires authors to think like programmers).
  • and most importantly, introduces no dark data.

issues

Some potential issues were raised in IRC, and it helps to document/resolve them so that they are not brought up repeatedly.

  • [Does this sufficiently address the concerns raised with the current use of abbr-pattern?]
    1. The abbr-date-pattern, as documented and explained by Jeremy Keith is just fine (in contrast to the abbr-datetime-pattern).
    2. Similar to the abbr-date-pattern, this proposal implies/introduces the abbr-time-pattern, which is similarly acceptable.
    3. In addition, as long as there is incremental improvement, we are making progress. It is more important to take small steps that we know will help some things, rather than try to take a big step that is more risky in the attempt to help more but may not actually do so (as most big changes don't), therefore "sufficiently" is a flawed way of evaluating incremental fixes.
  • Exposes data through tooltips. Separating into 2008-06-07 and 18:03 improves the ability for humans to consume the data, but still exposes data through tooltips and speech in formats that the publisher did not choose to use. --BenWard 04:52, 25 Jun 2008 (PDT)
    1. This is a feature, not a bug. By making the duplicated data at least *somewhat* visible (rather than fully invisible), effective data quality is increased due to the fact that the probability of the ISO8601 and locale-specific data getting out-of-sync is reduced because of the increased visibility (and therefore the increased inspectability and more eye-balls looking at/for problems effect).
    2. Workaround: if a site publisher wishes to customize the presentation of tooltips, they can do so with a nested span with title.
      • That proposes extraneous mark-up to satisfy some publishers' wish not to have a tool-tip in the first place. I object to a microformat pattern requiring an immediate work-around to meet publishers' desires. It goes against ‘Humans first…’. --BenWard 09:09, 30 Jun 2008 (PDT)
        1. Additional markup has nothing to do with "Humans first".
        2. Additional markup to work-around minor issues (e.g. CSS, cross-browser compatibility, etc.) is a well accepted modern web design practice. It's not ideal, but it is both accepted and widely practiced. With the use of <span> and <div> elements, it's also semantically neutral, therefore not a problem from that perspective either.
        3. Finally, it should not be our goal to try to satisfy *every* publisher, for that would make every microformat beholden to every publisher and contort the design of microformats in really poor ways. We must accept that not all publishers will adopt all microformats and that is ok. Our goal is to incrementally increase the number of publishers that adopt microformats, not to try to satisfy each and every one.
  • Semantic misuses of ABBR. That ‘tonight’ is ever a textual, human abbreviation of ‘2008-06-24’ is not accepted.
    1. Semantic stretch not misuse. It is a semantic abbreviation rather than a purely syntactical (character shortening) abbreviation, but it is an abbreviation in context nonetheless. Though this may stretch what may be commonly expected as an "abbreviation", the HTML4 spec does seem to allow some flexibility here (HTML 4.01 9.2.1 Phrase elements).
  • Maintaining proper sentences with the expanded form. It is not always possible to use this mark-up and maintain proper sentences with the expanded form. e.g. it's my <abbr class="bday" title="2005-06-20">birthday today</abbr>! becomes ‘it's my 2005-06-20!’. And thus audio rendition of such titles can be nonsensical - "The weekly dinner is two thousand and eight dash zero six dash twenty four at eighteen thirty."
    1. This can and should be addressed by improving authoring examples so that practices improve with experience.
  • Publishing practices and desires show us that authors are not willing to compromise the semantics of abbr. Phae 04:30, 27 Jun 2008 (PDT)
    1. Without specific citations of which authors and what specific issues they have, we are unable to address their issues.
    2. See also above - not our goal to satisfy *every* publisher, but rather to incrementally satisfy more and more. We must accept that there may be some authors we are unable to satisfy in the immediate/short-term.
  • [That's getting pretty complicated]
    • Much less complicated than inventing yet another syntax ( " { ... } " ???? ) that web authors would have to learn.
      • But it's all in one place, rather than spreading it out.
        • The spreading it out is what current content publishing practices do already! It is much more important to map the machine data as close to the existing publishing practice as possible, than to try to "put it all in one place". The "put it all in one place" way of thinking is why people ended up sticking so much invisible metadata in the head of the document, which we know fails.

content requirements

Some requirements which enhance both human readability, and machine parsability (best of both) :

  • date value excerpts MUST use hyphen separators. E.g. 2008-06-24. Not ok:20080624.
  • time value excerpts MUST use colon separators (seconds optional, implied :00 if absent). E.g. 18:30 or 18:30:00. Not ok:183000.
  • timezone value excerpts MUST use leading plus or minus and NO colon separator. E.g. -0700. Not ok:-07:00.
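
As a rough illustration, the three requirements above can be checked with one pattern each. The names EXCERPT_PATTERNS and classify_excerpt are hypothetical, and the patterns check shape only, not ranges.

```python
import re

# One pattern per kind of value excerpt, following the requirements listed above.
EXCERPT_PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII),      # hyphens required
    "time": re.compile(r"\d{2}:\d{2}(:\d{2})?", re.ASCII),   # colons required, seconds optional
    "timezone": re.compile(r"[+-]\d{4}", re.ASCII),          # sign required, no colon
}

def classify_excerpt(value):
    """Return "date", "time" or "timezone", or None if value fits none of them."""
    for kind, pattern in EXCERPT_PATTERNS.items():
        if pattern.fullmatch(value):
            return kind
    return None

for sample in ["2008-06-24", "20080624", "18:30", "183000", "-0700", "-07:00"]:
    print(sample, "->", classify_excerpt(sample))
```

Because the three shapes never overlap, a parser can tell which component a value excerpt carries without any extra class name.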

derivation

It's important to document the derivation/background of a brainstorm/proposal as it allows others to see some of the thinking that went into it, and avoid having to rediscuss alternatives already considered, and helps provide understanding as to why aspects of the design are as they are.

example with datetime

Here is a short code example:

 the weekly dinner is tonight at <span class="dtstart">2008-06-24T18:30</span>

example with abbr datetime

However that's not the easiest to read, nor do most people publish that as human visible text, so per the abbr-datetime pattern:

 the weekly dinner is tonight at <abbr class="dtstart" title="2008-06-24T18:30">6:30pm</abbr>

which has raised two issues:

  1. When "2008-06-24T18:30" is inspected by a human reading a tooltip, or spoken by a screen reader, it's not the most understandable thing (precise citation needed, perhaps an mp3 with screen reader used version info).
  2. There is a non-local violation of DRY (which IMHO is a worse problem, as it leads to worse data quality -Tantek). That is, the "date" information is now not only in the text twice (as it was before), but those two instances of the date information are not on the same element, which makes it worse. That is, "tonight" is in the prose, outside of the element with the precise date "2008-06-24".

In analysis of examples of event information on the web, the date and time are often published in separate elements, often for display purposes.

Thus it is this existing content publishing practice which leads to this brainstorm proposal, essentially to introduce a date and time value excerption longhand.

(Initially Tantek's idea that he bounced off Jeremy Keith (similar idea conceived by Drew independently) was to introduce new classes "datevalue", "timevalue" and "tzvalue" for this purpose, but Bob Jonkman pointed out that HTML5's time parsing algorithm enables a single <time> element to contain dates or times (with or without timezone) without having to explicitly say whether the value contains dates or times (with or without timezone). Bob then proposed that thus all was needed was a single new "datetime" class name. This was the key realization that allowed minimal invention. Tantek pointed out that since from the type of property we already know it is a datetime, there was no need for even one new class name, that we could simply re-use "value" excerption, and simply more precisely specify the semantics/parsing in the case of datetime properties.)

example with new date and time value excerpts

Thus we mark up the date and time separately, as value excerpts, using the abbr-date-pattern and an implied parallel abbr-time-pattern:

 The weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr> 
 at <abbr class="value" title="18:30">6:30pm</abbr></span>.

separate subtrees

The proposal also allows setting the date and time in separate element subtrees, which may be necessary for some document structures:

 the weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr></span> 
 at <span class="dtstart"><abbr class="value" title="18:30">6:30pm</abbr></span>.

Note the two instances of dtstart, one of which sets the date for the dtstart, and the other of which sets the time.

The idea being, when a parser sees a datetime property (e.g. dtstart) with a value excerpt, that it only "set" the component of its full value that is specified by the value excerpt (e.g. the date), and that if lacking a complete datetime, it continue to parse additional instances of that datetime property for the remaining component(s) (e.g. the time).
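
The merging behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the proposal: the function name merge_excerpts and the two regular expressions are hypothetical, and only YYYY-MM-DD dates and HH:MM or HH:MM:SS times are recognised.

```python
import re

# Hypothetical patterns for the two kinds of value excerpt used in the examples.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
TIME_RE = re.compile(r"^\d{2}:\d{2}(:\d{2})?$")


def merge_excerpts(excerpts):
    """Combine the value excerpts found for one datetime property.

    Each component (date, time) is set only once; later excerpts fill in
    whatever is still missing, and parsing stops once both are known.
    """
    date_part = None
    time_part = None
    for value in excerpts:
        value = value.strip()
        if DATE_RE.match(value) and date_part is None:
            date_part = value
        elif TIME_RE.match(value) and time_part is None:
            time_part = value
        if date_part and time_part:
            break  # complete datetime, no need to look further
    if date_part and time_part:
        return f"{date_part}T{time_part}"
    return date_part or time_part  # partial value, or None if nothing matched


# The two dtstart instances from the "separate subtrees" example.
print(merge_excerpts(["2008-06-24", "18:30"]))  # 2008-06-24T18:30
```

Because each component is filled at most once, the order in which the date and time excerpts appear in the document does not matter in this sketch.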

Of course this only works for singular properties, but fortunately all instances of datetime properties so far are singular, so this works.

  • hCard's rev is plural. TobyInk
    • can someone give a reference to this being the case? The RFC says "The value distinguishes the current revision of the information in this vCard for other renditions of the information." Does it make sense to have multiple REV dates in a single vCard?
      • The RFC is ambiguous as usual, but a contact card could conceivably have had several changes made to it, with a rev for each. ("Change logs" are fairly common on the web.) The hCard spec is fairly specific about which properties are singular and which are not, and rev is not included in the list of singular properties.

reusing date data for multiple datetime properties

This also provides a *very* convenient way to re-use the same date information for start and end, e.g. expanding the example:

 the weekly dinner is <span class="dtstart dtend"><abbr class="value" title="2008-06-24">tonight</abbr></span> 
 from <span class="dtstart"><abbr class="value" title="18:30">6:30</abbr></span> - 
 <span class="dtend"><abbr class="value" title="20:30">8:30pm</abbr></span>.

Note what just happened: we eliminated another duplication of date information by reusing the start *date* information as the end *date* information, and specifying *only* the *time* information separately for the two properties.

Reducing the duplication (or triplication) of such data helps to reduce the chances of (even inadvertent) data corruption/drift/divergence among any duplicates.
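
As a rough illustration of how a parser might read the expanded example, the following Python sketch uses the standard library's html.parser. It rests on several assumptions: the names ExcerptCollector and assemble are hypothetical, only span elements are treated as carrying datetime properties, and only dtstart and dtend are considered.

```python
from html.parser import HTMLParser

DATETIME_PROPS = {"dtstart", "dtend"}  # assumed set of datetime properties


class ExcerptCollector(HTMLParser):
    """Collect abbr.value titles under each enclosing datetime property."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one set of datetime property names per open <span>
        self.found = {}  # property name -> list of value excerpts

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if tag == "span":
            self.stack.append(classes & DATETIME_PROPS)
        elif tag == "abbr" and "value" in classes and attrs.get("title"):
            # Credit the excerpt to every property named on an enclosing span.
            for props in self.stack:
                for prop in props:
                    self.found.setdefault(prop, []).append(attrs["title"])

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()


def assemble(html):
    """Return {property: datetime string} built from date and time excerpts."""
    collector = ExcerptCollector()
    collector.feed(html)
    result = {}
    for prop, values in collector.found.items():
        date = next((v for v in values if len(v) == 10 and v[4] == "-"), None)
        time = next((v for v in values if ":" in v), None)
        result[prop] = f"{date}T{time}" if date and time else (date or time)
    return result


EXAMPLE = (
    'the weekly dinner is <span class="dtstart dtend">'
    '<abbr class="value" title="2008-06-24">tonight</abbr></span> '
    'from <span class="dtstart"><abbr class="value" title="18:30">6:30</abbr></span> - '
    '<span class="dtend"><abbr class="value" title="20:30">8:30pm</abbr></span>.'
)
events = assemble(EXAMPLE)
print(events["dtstart"])  # 2008-06-24T18:30
print(events["dtend"])    # 2008-06-24T20:30
```

The single date excerpt is credited to both properties because its enclosing span carries both class names, which is the reuse described above.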

time zones

There are a few choices for timezones.

  1. Simply include the time zone information as part of the time "value".
    E.g. <abbr class="value" title="18:30-0700">6:30pm</abbr>
  2. Or use another value excerpt for the timezone (was: introduce the class name "tzvalue")
    E.g. <abbr class="value" title="18:30">6:30pm</abbr> <abbr class="value" title="-0700">PDT</abbr>
  3. Or allow both and let web authors decide. This is the current leaning.
    • if web authors want to specify timezone as part of the time (first example above), they can,
    • or if web authors visibly publish the timezone separately (second example above), then they can mark that up.
    • or if web authors wish to omit timezone information, they can do so as well, as most do today. In practice this creates a "floating" time, which works fine in far more than the 80/20 case.
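
The "allow both" option can be sketched as follows in Python. This is a minimal illustration, assuming zone offsets are written as Z, +HHMM or +HH:MM; the function name time_with_zone and both patterns are hypothetical rather than taken from any specification.

```python
import re

# Time with an optional trailing zone: "18:30", "18:30-0700", "18:30Z".
TIME_RE = re.compile(r"^(\d{2}:\d{2}(?::\d{2})?)(Z|[+-]\d{2}:?\d{2})?$")
# A zone published on its own, as in the second option.
ZONE_RE = re.compile(r"^(Z|[+-]\d{2}:?\d{2})$")


def time_with_zone(excerpts):
    """Return (time, zone) from value excerpts; zone is None for floating times."""
    time_part = None
    zone_part = None
    for value in excerpts:
        value = value.strip()
        zone_match = ZONE_RE.match(value)
        time_match = TIME_RE.match(value)
        if zone_match and zone_part is None:
            zone_part = zone_match.group(1)
        elif time_match and time_part is None:
            time_part = time_match.group(1)
            # An inline zone is used only if no separate zone was seen first.
            if time_match.group(2) and zone_part is None:
                zone_part = time_match.group(2)
    return time_part, zone_part


print(time_with_zone(["18:30-0700"]))      # ('18:30', '-0700')
print(time_with_zone(["18:30", "-0700"]))  # ('18:30', '-0700')
print(time_with_zone(["18:30"]))           # ('18:30', None)
```

Both markup styles yield the same result here, and an omitted zone simply leaves the time floating.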


(more to come, documenting from IRC logs)

discussion

Opening up a discussion section even though documentation from IRC logs is still in progress. :)

  • regarding the advantage of "and most importantly, introduces no dark data."
    • "Dark data" is sometimes what publishers *want* to publish. To use the example of TV schedules which kick started the renewed discussion in this area, publishers will often not want to display the date. For instance, if a page is entitled "Tomorrow's TV" and contains 300 different programmes marked up with dtstart, it is superfluous to explicitly display the date for each one. With this proposed solution the include pattern could be used to include the date into each vevent, but a visible link to the date on each programme would simply be confusing. Sometimes it just makes sense to hide some of the information you're publishing as a microformat - because the information you want to make explicit to parsers can be inferred from context by humans, or is more appropriately displayed at a different level of granularity for machines and humans. TobyInk 14:26, 24 Jun 2008 (PDT)
      • It doesn't matter whether publishers *want* to publish dark data or not. Invisible data always leads to poorer quality data. Publishers publish all kinds of invisible metadata in the heads of documents etc. because they want to, but their desire doesn't stop the data from becoming obsolete, diverging from the actual visible data etc. The quality of the data matters more than any publisher's wish(es) to publish in a specific format, or in a hidden way. In the example you gave, using the include pattern in that way would not result in any visible links, but merely empty include anchors. It never makes any sense to actually hide "some of the information you're publishing as a microformat", because historically that always results in some loss of data quality over time, hence the microformats principle of visible data instead of invisible metadata. Tantek 14:32, 24 Jun 2008 (PDT)
        • All microformats hide some data. In the example <span class="tel">01632 960123</span>, the information that the long string of numbers represents a telephone number is invisible. And making it visible (Tel: <span class="tel">01632 960123</span>) violates DRY. It's just a matter of where to draw the line.
          • That statement makes the mistake of conflating *type* data and *content* data. "tel" is not content data, just as <p> is not content data. It's markup, indicating the type of the data. Markup (type data) being invisible to the user has worked just fine. Content (content data) being invisible to the user is the problem of dark data. Or rather, if you think that everything is data, then you really should be spending time developing in a system that is built on that assumption, e.g. RDF, rather than microformats, which are built on HTML, and the clear separation of type of data (HTML elements, microformats properties) and content data (inner text, text attribute values).
            • My point is that there isn't a distinction between the two, but a continuum. The choice of where to draw the line is never a clear one and always somewhat arbitrary. The vCard standard could quite easily have ended up with separate "TEL", "FAX" and "CELL" properties, in which case hCard would have ended up with <foo class="tel">, <bar class="fax"> and <baz class="cell">. Going the other way, they could have stored e-mail addresses as mailto: URLs, and then hCard would have <a class="url" href="mailto:quux@example.com">. They chose the way they did, and as a result in hCard the distinction between a mailto: URI and an http: URI is largely invisible (in most circumstances only obvious by looking at the status bar when hovering), but the distinction between a telephone number and a fax number is visible. But that wasn't the only possible (nor the only reasonable) outcome.

HTML 5 <time> Element

See hCalendar issues

Plain Old English alternative to ISO date

Example (in English):

<abbr title="January 25th, 2008" class="dtstart" lang="en-us">1/25</abbr>
<span class="dtstart">January 25th, 2008</span>

If lang="en-us", the format of the date used in the title attribute must conform to date writing rules in American English.

Example (in French):

<abbr title="25 Janvier 2008" class="dtstart" lang="fr">25/1</abbr>

If lang="fr", the format of the date used in the title attribute must conform to date writing rules in French.

Benefits

  • Human-hearable: should work nicely with screen readers (to be tested).
  • Human-readable
  • Compliance with semantics of abbr.
  • Very easy to use by HTML human authors.
  • DRY compliant if HTML human authors are willing to write in correct English

Discussion

  • Locale-specific parsing logic.
  • Not all cultures use the same calendar — Dan Brickley
  • There are situations where markup clues used for localisation might be misleading, such as people using microformats in a post on a site they do not themselves run that may even be in a different country. (a shared blogging site that allows html tags in posts would be a good example here) — Michael MD
    • Couldn't the person or tool adding the microformat annotation also add a lang attribute at the same time? — Benjamin Hawkes-Lewis
  • Cognition already supports this as a last ditch attempt at parsing dates - but I wouldn't recommend it get adopted widely. It's too unreliable; too much work to deal with internationalisation; too much work full-stop in languages that don't provide a handy library that takes care of most of the work. — User:TobyInk
    • I don't think we need to support all locales at once. I don't know how many written languages the BBC publishes in, but it might be that supporting en-uk and en-us is enough for a start. Also, one can imagine that Microformats tools could focus on the most common written languages and then expose hooks for others to implement support for other locales. — Guillaume Lebleu
    • What are our priorities? Making programmers' life easier or making content authors and content readers' life easier? — Guillaume Lebleu
      • In Australia dd/mm/yyyy is commonly used, but a significant minority of sites in Australia use US-style mm/dd/yyyy because it is the default setting in their CMS. How would a parser be able to tell which part is the day and which is the month? Getting it wrong would be worse than not getting it at all! Until the use of ambiguous formats can be wiped out we will need a version for machines! — Michael MD
      • A problem which hasn't been raised with regard to this proposal is that even though you are proposing a fixed date format, because it *looks* like natural language, authors will assume that it *is* natural language, and simply start including dates in whatever format they like. Then you get an NLP "arms race" between publishers and parsers. If you don't believe that that will happen, take a look at what happened with RFC 822 dates, which are simply a mess. — User:TobyInk
        • Very true: the more you make the date look like natural language, the less it looks like a fixed format. I really don't want us to get involved in any form of NLP; it just would not work. I think it was Mike who said that dates have to be parsed correctly; no level of error is acceptable. I don't want to travel to an event on the wrong day because a parser got the date wrong. — Glenn Jones
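
The ambiguity raised in the discussion above is easy to demonstrate. The Python sketch below tries both numeric conventions on the same string; the function name candidate_dates is hypothetical, and the two format strings are only the day-first and month-first readings mentioned above.

```python
from datetime import datetime


def candidate_dates(text):
    """Return every calendar date that a numeric string could mean."""
    results = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day-first, then month-first
        try:
            results.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date under this reading
    return sorted(results)


print(len(candidate_dates("01/02/2008")))  # 2
print(len(candidate_dates("25/01/2008")))  # 1
```

A string such as 01/02/2008 yields two valid dates, so a parser cannot choose between them without outside information. Only when the day is greater than 12 does a single reading survive.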

Notes

  • Could work alongside the existing datetime-design-pattern: if ISO date parsing fails, try parsing the title as a human-readable plain old English/French/... date according to the locale.
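
The fallback order in this note might look like the following Python sketch. It is only an illustration under assumptions: the names MONTHS, PATTERNS and parse_title are hypothetical, only the two locales from the examples are covered, and only plain dates (not times) are handled.

```python
import re
from datetime import date

# Hypothetical month tables; a real parser would need one per supported locale.
MONTHS = {
    "en-us": ["january", "february", "march", "april", "may", "june", "july",
              "august", "september", "october", "november", "december"],
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
           "août", "septembre", "octobre", "novembre", "décembre"],
}
# One fixed pattern per locale, matching the two examples on this page.
PATTERNS = {
    "en-us": re.compile(
        r"(?P<month>[a-z]+) (?P<day>\d{1,2})(?:st|nd|rd|th)?, (?P<year>\d{4})"),
    "fr": re.compile(r"(?P<day>\d{1,2})(?:er)? (?P<month>\w+) (?P<year>\d{4})"),
}


def parse_title(title, lang):
    """Try ISO 8601 first, then fall back to the locale's fixed date format."""
    try:
        return date.fromisoformat(title)
    except ValueError:
        pass  # not an ISO date, try the locale-specific format
    lang = lang.lower()
    if lang not in PATTERNS:
        return None
    match = PATTERNS[lang].fullmatch(title.strip().lower())
    if match is None or match.group("month") not in MONTHS[lang]:
        return None
    month = MONTHS[lang].index(match.group("month")) + 1
    try:
        return date(int(match.group("year")), month, int(match.group("day")))
    except ValueError:
        return None  # e.g. "February 31st, 2008"


print(parse_title("2008-01-25", "en-us"))          # 2008-01-25
print(parse_title("January 25th, 2008", "en-us"))  # 2008-01-25
print(parse_title("25 Janvier 2008", "fr"))        # 2008-01-25
```

Each additional locale needs its own month table and pattern, which is the maintenance cost raised in the discussion above.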

<object> element to represent dates

  • Re-raised by George Brocklehurst on June 25th 2008.

The idea was to do something like this:

<object data="20050125">January 25</object>

  • From what Tantek said on his blog, the main reason for not using objects was that they were not well supported in Safari. However, Safari's object support is now much improved: fallbacks are supported and display:inline and intrinsic sizing will work correctly. Safari 2.0.2, which came out in November 2005, was the first version to contain these improvements. — George Brocklehurst
  • The following appears to be well behaved inline in Safari 2.0.4 and 3.1.1, Firefox 1, 1.5, 2 and 3, and Opera 7, 8 and 9: <object class="dtstart" data="data://20080712"></object>. Test case: http://pastie.org/224023 — Ben Ward
    • IE 6, 7 and the beta version of IE 8 all visibly render the object element as a small box, similar to the way they would render a missing image: object-in-ie.png — George Brocklehurst
    • Absolute URIs can't start with a number, but relative ones can - and the data attribute is permitted to contain relative URIs. (So there is no need to use a data:// URI.) — Toby A Inkster
  • Could also look at <object class="dtstart"><param name="value" value="20050125" />January 25</object> — Scott Reynen
  • The purpose of the <object> element is to allow the browser to run an external application for a non-native data type (e.g., Java applet). See: http://www.w3.org/TR/REC-html40/struct/objects.html#h-13.3. Object is not the right way to go in this case. — Sarven Capadisli

See Also