value-class-pattern: Difference between revisions

From Microformats Wiki
(allow HH of 24 per ISO8601)
(→‎Basic Parsing: numbered parsing steps, and added type-specific language to better connect basic parsing with how to parse multiple value elements for a datetime property)

Revision as of 23:54, 1 May 2009

Value Class Pattern

The value class pattern is derived from value-excerpting in hCard. As such, it is already somewhat supported in parsers. However, the precise parsing behavior is not quite finalized, and the documentation is a work in progress. The pattern should be used with some caution.

Editor
Ben Ward

Sometimes, only a part of an element's content is to be used as the value of a microformat property. This may occur when a property has optional subproperties, such as tel: type and tel: value in hCard. Other times, the most appropriate structure for a property may include other content.

For these purposes, the special class name value is used to mark up the relevant data excerpt within larger element content.

Simple Examples

Here is markup for a home phone number:

vCard fragment:

TEL;TYPE=HOME:+1.415.555.1212

hCard fragment:

 <span class="tel">
   <span class="type">Home</span>:
   <span class="value">+1.415.555.1212</span>
 </span>

In this case, the value of tel is +1.415.555.1212, not Home: +1.415.555.1212.

Another example, this time using a localized (British) telephone number:

 <span class="tel">
   <span class="type">Home</span>:
   <span class="value">+44</span> (0) <span class="value">1223 123 123</span>
 </span>

In this case, the valid data for the telephone number is +441223123123, but the way in which the phone number is presented in Britain includes the (0), for local dialling. That is, from anywhere in the world you may dial +441223123123, or from within Britain you may dial 01223123123. Common local publishing interferes with the data, since +4401223123123 is an invalid number.

In the mark-up, two value classes target the part of the telephone number string that makes an international, valid number, whilst allowing conventional presentation.

Another example, using dtstart in hCalendar:

 <span class="dtstart">
    Friday 25th May, 6pm
    [<span class="value">2008-05-25T18:00:00+0100</span>]
 </span>

Whilst the entire string ‘Friday 25th May, 6pm […]’ is date information, it's only the ISO 8601 encoded datetime which must be consumed by a microformats parser, so the value class isolates it.

Note that "dtstart" is a datetime property and thus subject to additional special value class pattern handling as described in the Date and time concatenation section below.

Basic Parsing

  1. The value class pattern only applies to properties which are simple strings, tel, and datetimes. The value class pattern does not affect parsing of properties of type email, URL, URI, UID.
  2. Where an element with such a microformat property class name has a descendant with class name value (a "value" element), parsers should use the following portion of that element:
    1. if the value element is an img or area element, then use the element's alt attribute value.
    2. if the value element is an abbr element, then use the element's title attribute value.
    3. for any other element, use its inner-text.
  3. Where there are multiple descendants of a property with class name of value (multiple value elements):
    1. if the microformats property expects a simple string or tel value, then the values extracted from the value elements should be concatenated without inserting additional characters or white-space.
    2. if the microformats property expects a datetime value, see the "Parsing date and time concatenation" section below.
  4. Descendants with class of value must not be parsed deeper than one level. That is, where an element foo with class value has a descendant bar with class value, the content of foo is taken as the value. Nesting additional elements with class of value cannot be used to further isolate a property's value.

e.g.

 <p class="description">
   <foo class="value">
     <bar class="value">Puppies Rule!</bar>
     <strong>But kittens are better!</strong>
   </foo>
 </p>

In this example, description has a child ‘value’, and that child has a grandchild ‘value’. However, the parsing of value classes stops at the first level, so the data for description is: <bar class="value">Puppies Rule!</bar><strong>But kittens are better!</strong>.
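
The basic parsing rules above can be sketched in Python. This is a minimal illustration, not a reference implementation: it uses the standard library's html.parser, the names ValueClassParser, extract_values and extract_value are hypothetical, and it assumes that "inner-text" means the text nodes of the value element joined as-is, with no whitespace normalization.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not change nesting depth.
VOID = {"area", "br", "hr", "img", "input", "link", "meta"}


class ValueClassParser(HTMLParser):
    """Hypothetical helper: collects first-level class="value" descendants."""

    def __init__(self):
        super().__init__()
        self.parts = []       # one extracted string per value element
        self.depth = 0        # nesting depth inside the current value element
        self.collect = False  # whether text nodes feed the current part

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            # Already inside a value element: nested "value" classes are ignored.
            if tag not in VOID:
                self.depth += 1
            return
        if "value" not in (attrs.get("class") or "").split():
            return
        if tag in ("img", "area"):
            self.parts.append(attrs.get("alt") or "")    # rule 2.1
            return
        if tag == "abbr":
            self.parts.append(attrs.get("title") or "")  # rule 2.2
            self.collect = False
        else:
            self.parts.append("")                        # rule 2.3: inner-text
            self.collect = True
        self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and self.collect:
            self.parts[-1] += data


def extract_values(property_html):
    """Return one string per value element found in a property element."""
    parser = ValueClassParser()
    parser.feed(property_html)
    parser.close()
    return parser.parts


def extract_value(property_html):
    """Rule 3.1: concatenate without inserting characters or white-space."""
    return "".join(extract_values(property_html))


tel = ('<span class="tel"><span class="type">Home</span>: '
       '<span class="value">+44</span> (0) '
       '<span class="value">1223 123 123</span></span>')
print(extract_value(tel))
```

Note that the sketch keeps the spaces that appear inside a single value element; removing them is outside the rules listed above. A property with no value elements yields an empty string here, whereas a real parser would fall back to its normal handling of the whole property element.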

Date and time concatenation

The value class pattern can be used to mark up the date and time portions of a datetime property separately; the portions are then combined to specify a single datetime value.

Example:

<p>The weekly dinner will be on 
    <span class="dtstart">
        <abbr class="value" title="2008-06-24">this Tuesday</abbr> 
     at <span class="value">18:30</span>
    </span>
</p>

Produces:

DTSTART:2008-06-24T18:30:00

The lack of a timezone indicates a "floating" datetime, that is a datetime independent of any particular timezone. Examples of floating datetimes:

  • An alarm clock you set to ring at 7am.
  • The 9am-5pm workday.
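
As a rough analogy only, a floating datetime corresponds to what Python's standard datetime module calls a "naive" datetime: one that carries no timezone information. The short sketch below contrasts the two; the second value and its +01:00 offset are invented for illustration and do not come from the example above.

```python
from datetime import datetime, timedelta

# Floating: no timezone designator, so no offset is attached.
floating = datetime.fromisoformat("2008-06-24T18:30:00")

# Fixed: the same wall-clock time with an explicit (illustrative) offset.
fixed = datetime.fromisoformat("2008-06-24T18:30:00+01:00")

print(floating.tzinfo is None)                   # the floating value has no zone
print(fixed.utcoffset() == timedelta(hours=1))   # the fixed value keeps its offset
```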

Parsing date and time concatenation

For all date time properties (as defined in their respective microformats specifications), the following rules apply in addition to (and in some cases replacing) the above value class pattern parsing rules.

When a "value" element is found, parse a value from the element as follows:

  • if the element is an img or area element, then use the element's alt attribute value.
  • if the element is an abbr element, then use the element's title attribute value.
  • for any other element, use its inner-text.
  • if the value has both a specific ISO8601 date and a specific time, use those and stop looking for "value" elements.
  • if the value has *only* a specific date, specifically, fits the following ISO8601 date patterns (i.e. as documented in the Wikipedia summary of ISO8601)
    • YYYY-MM-DD
    • YYYY-DDD
    • then use that as the date value. For the purposes of the value class pattern, the hyphens "-" separating the year, month, day and/or ordinal day are required.
    • ignore any further "value" elements that specify the date.
  • if the value has *only* a specific time (with or without timezone), parse it for a time value as follows
    • HH:MM:SS-XX:YY
    • HH:MM:SS+XX:YY
    • HH:MM:SS-XXYY
    • HH:MM:SS+XXYY
    • HH:MM:SSZ
    • HH:MM:SS
    • HH:MM-XX:YY
    • HH:MM+XX:YY
    • HH:MM-XXYY
    • HH:MM+XXYY
    • HH:MMZ
    • HH:MM
    • HH is the 24 hour "hours" in the time, from 00 to 24, with optional leading 0 for values less than 10.
    • MM are the minutes from 00 to 59
    • SS are the optional seconds from 00 to 59 (60 for a leap second). If omitted, infer 00.
    • XX is the time zone hours offset, from 00 to 12
    • YY is the time zone minutes offset, from 00 to 59, though in practice only 00, 15, 30, 45 minute offsets are used in global timezones.
    • Z is the literal 'Z' to indicate UTC (GMT).
    • For the purposes of the value class pattern, the colons ":" separating the hour, minutes, seconds are required.
    • (NOTE: consider a case insensitive { }"am"|{ }"a.m." suffix to treat an HH value of 12 as 00, or a case-insensitive { }"pm"|{ }"p.m." suffix to add 12 to HH value less than 12 - per Wikipedia article on the 12 hour clock)
    • ignore any further "value" elements that specify the time.
  • if the value has *only* a specific timezone, parse it as follows
    • -XX:YY
    • +XX:YY
    • -XXYY
    • +XXYY
    • Z
    • ignore any further "value" elements that specify the timezone.

If by parsing the "value" element(s), at least a specific date has been found, then the "value" is overall valid, and the parser assembles the overall datetime value by concatenating the specific date, "T" and specific time (if time was specified, with 00 seconds implied if no seconds are provided), and specific timezone (if timezone and a specific time was specified).

  • YYYY-MM-DD - no time specified
  • YYYY-MM-DDTHH:MM:SS - time specified but no timezone. This is a floating time.
  • YYYY-MM-DDTHH:MM:SS-XX:YY or
  • YYYY-MM-DDTHH:MM:SSZ or
  • YYYY-MM-DDTHH:MM:SS+XX:YY - both time and timezone were specified.
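
The assembly rules above can be sketched as a small Python function. This is an illustration under stated assumptions rather than a definitive parser: the name assemble_datetime and the regular expressions are hypothetical, a value "with both a date and a time" is assumed to be the T-separated form, numeric ranges (for example HH up to 24) are not validated, timezone offsets are passed through as written rather than normalized, and the am/pm note is not implemented.

```python
import re

TZ = r"(Z|[+-]\d{2}:?\d{2})"
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}|\d{4}-\d{3})$")      # YYYY-MM-DD or YYYY-DDD
TIME_RE = re.compile(r"^(\d{1,2}):(\d{2})(?::(\d{2}))?" + TZ + r"?$")
TZ_RE = re.compile("^" + TZ + "$")
DATETIME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")     # assumed full form


def assemble_datetime(values):
    """Combine strings extracted from "value" elements into one datetime string.

    Returns None when no specific date was found.
    """
    date = time = tz = None
    for raw in values:
        v = raw.strip()
        if DATETIME_RE.match(v):
            return v                      # date and time together: stop looking
        if DATE_RE.match(v):
            if date is None:              # ignore further date values
                date = v
            continue
        m = TIME_RE.match(v)
        if m:
            if time is None:              # ignore further time values
                hh, mm, ss, zone = m.groups()
                time = f"{int(hh):02d}:{mm}:{ss or '00'}"   # infer 00 seconds
                if zone and tz is None:
                    tz = zone
            continue
        if TZ_RE.match(v) and tz is None:  # ignore further timezone values
            tz = v
    if date is None:
        return None                       # at least a specific date is required
    if time is None:
        return date                       # a timezone without a time is dropped
    return f"{date}T{time}{tz or ''}"


print(assemble_datetime(["2008-06-24", "18:30"]))
```

For the weekly dinner example, the two extracted values "2008-06-24" and "18:30" combine into the floating datetime 2008-06-24T18:30:00.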

This section is a stub, being filled in with additional feature description, parsing instructions etc. from value-excerption-pattern-brainstorming#date_and_time_separation and value-excerption-dt-separation-test

Parsing value from a title attribute

The value-title class name allows the publisher to indicate the data value for a parent property is contained in the title attribute of an element, rather than the inner-text.

This can be used to provide a synonym within content, or used to quietly publish alternate forms of information for microformats parsing, without affecting the consumption of content.

For example, you can use casual localization with dates:

<p>It was 
 <span class='dtstart'>
  <span class='value-title' title='2008'>last year</span>
 </span>
  that I realised my addiction to cashew nuts would cost this country so dear.
</p>

Parsing rules for value-title are the same as for value above, with the following change:

  • Where a microformats property has a child element with class name of value-title, the content of the title attribute of that element must be parsed, rather than the portion of the element that would be parsed for a class name of value.

Using value-title to publish machine-data

The initial use of value-title is to publish alternate, parsable forms of property values in a visible context without using the abbr element, whose semantics already support interpreting the 'title' attribute as an expanded, more precise form of the content.

Experience has found that there are some cases in microformats where a number of publishers want to include a precisely accurate and parsable value for a property but do not want it to be visible in their page, even as a tooltip.

For example, full ISO8601 datetimes may be confusing to readers of the page (as a tooltip or when read aloud by a screen reader), and enumerated values such as the type subproperty of hCard's tel property use US-English terms, which are not part of pages in any other language.

Since both of those scenarios have proven to be obstacles for a number of publishers, a further extension of value-excerption exists for these cases, and these alone. This extension allows the parsable form of the property to be published ‘silently’, immediately adjacent to the respective local visible content.

Here is an example, with the required use of a first child element with class name value-title:

<p class='tel' lang='en-gb'>
  <span class='type'>
    <span class='value-title' title='cell'> </span>
    mobile
  </span>
  <span class='value'>+44 7773 000 000</span>
</p>

The cell value is parsed for the 'type' subproperty, but mobile is presented to the user.

In the case of dates:

<p class='dtstart'>
  <span class='value-title' title='2009-03-14T16:28-0600'> </span>
  March 14th 2009, around half-past four
</p>

A microformats parser will read the ISO8601 format datetime 2009-03-14T16:28-0600, but users will only see March 14th 2009, around half-past four. Testing has shown that the ISO8601 datetime above does not get exposed to any user at all.

Parsing machine-data value-title

Browsers collapse the value-title span down to a width of 0, effectively providing no visual rendering, whilst keeping the element in the DOM. With no physical dimensions, there is no ‘hover’ state, so no tooltip is revealed. Furthermore, the empty element is not passed to assistive technology layers such as VoiceOver. Screen readers do not read the contents of the title attribute of an empty span element.

We conducted thorough testing of these parsing behaviors to ensure accessibility.

Note: Whilst the value-title element is more gracefully written without whitespace inner-text (or as a self-closing <foo /> element in XHTML), current tools such as WYSIWYG editors and HTML-Tidy will erroneously discard such elements, so parsable data would be thrown away. As such, <span class='value-title'> </span>, including a single whitespace character between the opening and closing tag, is the required pattern for authors at this time.

Parsing this final value-title extension imposes some stricter restrictions on usage. These restrictions exist to reduce the impact of DRY violations, reduce the opportunity for sites to spoof data, and encourage best practice for maintaining both forms of data accurately.

Where an element with class value-title is to be parsed as data for a property, and that element also contains no non-whitespace content (hereafter referred to as ‘empty’), the following rules apply:

  • The ‘empty’ value-title element must be the first, non-whitespace child of the property element. That is, it should follow immediately after the property is declared, before the human-readable form, and without any additional nesting.
  • The ‘empty’ value-title element can only be used for specific properties. Microformat specifications must explicitly state which properties may be used with this extension of the value-class-pattern.
  • Where an ‘empty’ value-title element is to be used as the single property value, it must be the only such value content. That is, the first instance of a conforming value-title element overrides all other value and value-title siblings and/or cousins.
  • Tools written to perform Conformance Testing and/or Validation of microformats should attempt to compare the machine-data and human legible forms of the property data, and advise authors if the forms do not match.
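
The first restriction above (the ‘empty’ value-title element must be the first non-whitespace child of the property element) can be checked roughly as follows. This is a sketch under assumptions: the names EmptyValueTitleFinder and empty_value_title are hypothetical, the input is a single well-formed property element in which any void elements are written self-closed, a nested child element is treated as content (so the value-title is no longer ‘empty’), and the per-property allow-list required by the second restriction is not modelled.

```python
from html.parser import HTMLParser


class EmptyValueTitleFinder(HTMLParser):
    """Hypothetical helper: finds a leading 'empty' value-title child."""

    def __init__(self):
        super().__init__()
        self.depth = 0         # the property element itself sits at depth 1
        self.decided = False   # True once the first non-whitespace child is seen
        self.candidate = None  # title of a leading value-title still being checked
        self.result = None     # title of a confirmed 'empty' value-title

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        if self.depth == 2 and not self.decided:
            attrs = dict(attrs)
            if "value-title" in (attrs.get("class") or "").split():
                self.candidate = attrs.get("title") or ""
            self.decided = True
        elif self.candidate is not None:
            self.candidate = None  # nested element: treated as content

    def handle_endtag(self, tag):
        if self.depth == 2 and self.candidate is not None:
            self.result = self.candidate  # closed with only whitespace inside
            self.candidate = None
        self.depth -= 1

    def handle_data(self, data):
        if not data.strip():
            return                 # whitespace never counts as content
        if self.depth == 1 and not self.decided:
            self.decided = True    # visible text came first
        elif self.candidate is not None:
            self.candidate = None  # the value-title element is not 'empty'


def empty_value_title(property_html):
    """Return the title of a leading 'empty' value-title, or None."""
    finder = EmptyValueTitleFinder()
    finder.feed(property_html)
    finder.close()
    return finder.result


dtstart = ("<p class='dtstart'>\n"
           "  <span class='value-title' title='2009-03-14T16:28-0600'> </span>\n"
           "  March 14th 2009, around half-past four\n"
           "</p>")
print(empty_value_title(dtstart))
```

When the function returns a title, that value would override any other value or value-title elements in the property, per the third restriction; when it returns None, the ordinary value and value-title rules apply.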

This document post-dates other microformat specifications, such that they may not yet indicate which properties are to be compatible with this pattern. In the interim, the properties documented on the machine-data page are to be considered normative.

TO DO: list types of properties here, rather than referencing machine-data normatively.

There are some simple reference examples and tests for this pattern on value-class-pattern-tests.

FAQ

  • Why use an 'empty' element? Why not embed data in the class attribute?
    • The class attribute is inappropriate for embedded data values, as per the HTML4 specification, which states that class is for ‘general purpose processing’, described as ‘e.g. for identifying fields when extracting data from HTML pages into a database, translating HTML documents into other formats, etc.’. ‘General purpose processing’ does not extend to data itself. Furthermore, this method avoids inventing a new string pattern for embedding data.
  • Why use an 'empty' element? Why not make up a new attribute, like ‘data’?
    • Microformats exist and function in valid HTML4 and XHTML1. Those are the current standards for web development, and microformats exist for use now. Future revisions of HTML may offer another solution. For now, this method has been tested against browsers, and creates a consistent document structure (where machine-form and human-form data are siblings).
  • The title attribute should only be used for content!
    • The title attribute _is_ used for content and is read by microformats parsers. This exists for cases where data cannot be parsed with sufficient precision from just the commonly published, visible information. This pattern allows both forms of content to be included, whilst keeping it invisible to human consumers.

You can also refer to the general Microformats FAQ and principles.

Related Pages