[uf-discuss] Appeal for Issues: Empty spans in value-excerption-pattern

Thu Nov 6 01:53:23 PST 2008

Hi everyone.

So, a few months ago I was working on the ongoing
value-excerption-pattern specification. Then I moved to San Francisco
and my work went a little stagnant, but I'm trying to pick it up again.

The value-excerption-pattern is an attempt to fully spec the  
class="value" behaviour from "tel" in hCard, which has since been  
supported globally in some parsers for a while, and has proved  
somewhat useful. In addition to fully spec'ing the behaviour for  
parsing class="value" elements for visible data, I've been working on  
additional specification to handle inclusion of machine-centric data  
alongside human forms (http://microformats.org/wiki/machine-data).

It's this machine-centric portion that I'm trying to nail down at the  
moment, since it would provide an in-demand solution for various  
recurring complaints (abbr-pattern dependencies, for example).

Also, note that recent brainstorming on patterns derived from
the semantics of the <object> element and value excerption has shown
that current, in-use browsers (Microsoft Internet Explorer and
Apple's Safari 2) do not handle object acceptably for inline content
(http://microformats.org/wiki/value-excerption-pattern-brainstorming#object_param_handling).
So we're definitely stuck with needing to spec this pattern using
generic mark-up.

Since it's been a while, this mail serves to summarise the current
state of this spec and proposed resolutions to open issues. PLEASE, if
you have additional issues to raise, add them to the wiki page
(http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements).

Couple of Examples:
----------------------------

  <span class="dtstart"><span class="value"
  title="2008-08-27T23:25:00-0700"></span> 11:25pm, August 27th 2008</span>

  <p class="tel">
    <span class="type"><span class="value" title="cell"></span> Mobile</span>
    <span class="value">415-123-4567</span>
  </p>

Purpose
-----------

This pattern allows you to embed fixed format content — such as the  
telephone type enumeration and parser-required data formats —  
alongside the visible format of the publisher's choice.

Responses to Issues so Far
--------------------------------------

1. DRY Violation worse than current ABBR-pattern. DRY is a problem  
when data is repeated in a document and risks one copy of the data not  
being maintained in sync with another. Maintenance of the document  
results in broken data.

Resolution: To address this, the empty-span part of the value  
excerption pattern will specify that the empty-span MUST be the first,  
non-whitespace-text-node child of the property element. Thus, this  
will parse:

  <span class="dtstart"><span class="value" title="2008-11-04"></span>4th November</span>

But this will fail:

  <span class="dtstart">On 4th November 2008 Barack Obama was elected
  the first African American president of the United States of America.
  He was really pleased about it. <span class="value"
  title="2008-11-04"></span> </span>

The first pattern keeps the code distance small between the data form  
(class=value) and the property name (class=dtstart). It disallows the  
machine-data portion from being separated from the property.
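
To make the first-child rule concrete, here is a minimal sketch in
Python. It is not a reference implementation: the function name
machine_value is made up for illustration, it assumes the property
element is well-formed XML (real pages would need a proper HTML
parser), and it assumes the empty element must literally be a span.

```python
import xml.etree.ElementTree as ET


def machine_value(property_markup):
    """Return the title of a leading empty class="value" span, or None.

    Hypothetical helper: assumes well-formed markup for one property element.
    """
    prop = ET.fromstring(property_markup)

    # Any non-whitespace text before the first child element breaks the rule.
    if (prop.text or "").strip():
        return None

    children = list(prop)
    if not children:
        return None

    first = children[0]
    classes = (first.get("class") or "").split()
    # "Empty" here means no child elements and no non-whitespace text.
    is_empty = len(first) == 0 and not (first.text or "").strip()

    if first.tag == "span" and "value" in classes and is_empty:
        return first.get("title")
    return None
```

Run against the two examples above, the first yields the title value
and the second yields None, because text precedes the empty span.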

Furthermore, the spec should encourage conformance checking tools to  
attempt to verify the machine date form against the human form and  
warn the user if the data does not match.
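
As a rough idea of what such a check might look like, here is a sketch
in Python. It is only an assumption about how a tool could approach
this: the name date_forms_agree is hypothetical, it only understands
English month names, and it merely checks that the day number and
month name from the machine form both appear somewhere in the human
text.

```python
import re
from datetime import date

# English month names only; a real tool would need localisation.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def date_forms_agree(machine, human):
    """Heuristic: does the human text mention the machine date's day and month?"""
    d = date.fromisoformat(machine[:10])  # date part of an ISO 8601 value
    month = MONTHS[d.month - 1]
    # Every run of digits in the human text, e.g. "4th" contributes 4.
    numbers = {int(n) for n in re.findall(r"\d+", human)}
    return d.day in numbers and month.lower() in human.lower()
```

A tool would warn when this returns False, for example for a title of
2008-11-04 next to the text "5th November". It is deliberately loose,
so it can only flag likely mismatches rather than prove agreement.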

2. Violating the principle of visible data

Resolution: Microformats maintain a principle of marking up visible  
data. However, we have exceptional circumstances where the data  
required for parsing is not the data that publishers wish to display.  
Whilst parsers are a lower priority than publishers, the cost and  
complexity of parsing unstructured dates, or translated terms, is  
accepted as too high. Therefore it is necessary to violate DRY to  
include explicit representations for machines.

Currently authors may use CSS to hide the machine-form of dates.
However, microformats exist only in the HTML layer, and must not
depend on CSS to meet publisher requirements.

The specification may also restrict this part of the pattern to  
certain properties where a machine-data form is required, as a means  
to discourage abuse.

3. Broken parsers drop empty elements

There are some broken but widespread HTML parsers which discard empty
elements, resulting in the empty-span-value element being removed from
documents (e.g. HTML Tidy). HTML Tidy is easily patched not to do this,
but unpatched versions may already be embedded in publishing platforms.

Resolution: Without numbers, we don't know how many publishing systems
would be affected by this. It's a problem for which the only
resolution is to use a completely different pattern. As such, this
proposal must put legacy broken parsers down as an accepted loss.
CMSs locked to old versions of HTML Tidy would not be able to use
this pattern without modification.
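
For publishers wondering whether their own toolchain is affected, a
quick test is to compare the markup before and after it passes through
the CMS. The Python sketch below is only an illustration: the name
empty_values_survive is hypothetical, and the regular expression is a
crude match for a span with exactly class="value" and no content, not
a real HTML parse.

```python
import re

# Crude pattern: a span carrying class="value" with nothing but whitespace inside.
EMPTY_VALUE = re.compile(r'<span\b[^>]*\bclass="value"[^>]*>\s*</span>')


def empty_values_survive(original, cleaned):
    """True if the cleaned markup keeps as many empty value spans as the original."""
    return len(EMPTY_VALUE.findall(original)) == len(EMPTY_VALUE.findall(cleaned))
```

Feed it the markup you authored and the markup your platform actually
serves. A False result suggests something in between is dropping
empty elements.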

So, there aren't many issues against this part of the pattern, and the  
rules for it are coming together. There's likely some feeling about  
matters of taste as to how to achieve this function. This is my  
favoured version, but a lot of the issues resolved here would apply  
equally to other patterns too, so I'd appreciate further input to see  
if this pattern can be thoroughly specified.

Please, if you have problems to raise with this proposal, add them to  
the -issues page on the wiki at:

http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements

Thank you,

Ben