value-class-pattern-brainstorming: Difference between revisions

Revision as of 05:59, 14 January 2009

<entry-title>value excerption pattern brainstorming</entry-title>

The value-excerption-pattern is derived from value-excerpting in hCard. The precise parsing behavior is not yet finalized, so the pattern should be used only with extreme caution.

This brainstorming page is for exploring ideas related to specifying the value-excerption-pattern in more detail and ideas for special case handling of the value-excerption-pattern in combination with specific semantic HTML elements per those elements' particular semantics.

These are merely explorations for now, and should NOT be used in actual content publishing, nor implemented in any production code.

details for handling specific elements

object param handling

2008-08-23 Ben Ward and Tantek Çelik brainstormed the following possible special case markup handling for the use of the value-excerption-pattern with the <object> element. Modified 2008-08-26.

The following markup example documents one way the hCard tel property's type subproperty could be specified with the enumerated value of "cell" while providing the UK English "mobile" as the human visible object text contents:

<object class="type" lang="en-GB">
 <param name="value" value="cell" />
 mobile
</object>

summary

  • object element special case handling of value excerption. When a microformat (sub)property class name is specified on an object element, then value excerption handling is modified as follows:
  • first param with name attribute value. if the first child of the object is a param element, and that param element has name attribute value of "value", then use the value attribute value for the value for the microformat (sub)property class name specified on the object.
  • continue. if not, continue with existing value excerption handling, and microformat (sub)property parsing rules as currently best specified by hcard-parsing.
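A minimal sketch of the special case summarized above, using Python's standard html.parser. The class and function names are hypothetical, nested object elements are not handled, and treating whitespace-only text before the param as ignorable is an assumption rather than anything decided on this page.

```python
from html.parser import HTMLParser


class ObjectParamExcerpt(HTMLParser):
    """Collects (class, value) pairs from <object class="..."> elements whose
    first child is <param name="value" value="...">. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.results = []          # list of (class attribute, excerpted value)
        self._object_class = None  # class of the <object> currently open
        self._seen_child = False   # True once any child of that object was seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "object":
            self._object_class = attrs.get("class")
            self._seen_child = False
        elif self._object_class is not None and not self._seen_child:
            # This is the first child element of the object.
            self._seen_child = True
            if tag == "param" and attrs.get("name") == "value":
                self.results.append((self._object_class, attrs.get("value")))

    def handle_data(self, data):
        # Assumption: non-whitespace text before the param counts as a child,
        # so the param is then no longer "first"; whitespace is ignored.
        if self._object_class is not None and data.strip():
            self._seen_child = True

    def handle_endtag(self, tag):
        if tag == "object":
            self._object_class = None


def excerpt_object_values(html):
    """Return the (class, value) pairs found in the given markup."""
    parser = ObjectParamExcerpt()
    parser.feed(html)
    parser.close()
    return parser.results
```

When the param is not the first child, the sketch returns nothing for that object, leaving it to the existing value excerption handling.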

notes

Note that the param element does not have a 'class' attribute and thus its 'name' attribute (which has a compatible semantic) is used instead to invoke the value excerption pattern.

advantages
  • Greater semantic re-use. The use of the param element to specify a value for its object is in line with the param element's semantics. The semantic association between the object and the param element is defined in the HTML4 specification.
  • Less invention. This use of object param is superior to the use of a nested empty span element. The association of an empty span with its parent is a new semantic not previously defined in the HTML4 specification. Thus this use of object param markup better follows the principle of minimum invention as compared to nested empty span markup.
neutral
  • Similar violation of DRY to nested empty span.
disadvantages
  • Less human visible than abbr DRY violation. The contents/values of param elements are not exposed to the user of a browser, unlike the title attribute of abbr which, since it is commonly available as a hover tooltip, is more human visible, thus verifiable, than param.
  • DRY violation content divergence risk greater than abbr. With abbr, one element is used to express both a human visible string and the property value, thus tying these values closer together (thus reducing risk of divergence). With object param, two elements are used, and thus risk of divergence may be greater than the use of abbr. Possible mitigating techniques that would help keep the property value and the equivalent human visible string closer to each other, perhaps as close in the code as they are when using abbr:
    1. require param be first child of object
    2. require use of only one param child (allow other child elements)
    3. require exclusive use of object for value excerption i.e. no using the same object for an actual replaced object and a value excerption
    4. require "value" attribute be the last attribute specified on the param element
    5. require equivalent human visible text be placed immediately (allowing for whitespace) following the param
criticisms

to do

  • Browser testing. This code sample must be tested in various browsers to determine how they process and handle pages with such code
    1. determine which browsers to test (based on popularity, deployment, etc.)
    2. write a full sample test case using the above object param markup pattern and a complete hCard
    3. write a more complex sample test case with multiple uses of the object param markup pattern
    4. test: do browsers properly display the UK English text "mobile"?
    5. test: do browsers generate multiple browser (e.g. WebKit, Trident etc.) controls as they would for embedded frames (nested HTML objects)?
    6. determine any other tests
  • Parser implementability. Determine approximately how much work it would be to implement this special case object param support.
    • I've just added support for this. It took 19 bytes of code. TobyInk 01:24, 25 Aug 2008 (PDT)
  • Document in more detail. Assuming browser tests of a simple example pass (proper visible text displayed, page efficiency not compromised by additional control creation), document how to handle/parse this pattern in more detail. Iterate.

Browser Testing

Using the following simple, HTML4 hcard:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<title>&lt;object> value excerption pattern: hCard Telephone Type Test Case</title>

<body class="vcard">
    <h1 class="fn"><a class="url" href="http://ben-ward.co.uk">Ben Ward</a></h1>
    <p class="tel">
        <object class="type">
            <param name="value" value="cell">
            Mobile:
        </object>
        <span class="value">415-123-567</span>
    </p>
</body>
Results

A pass is to display a level-one heading ‘Ben Ward’ with a hyperlink, followed by a paragraph displaying the text ‘Mobile: 415-123-567’. Browsers were selected based on YUI Graded Browser Support (August 2008), plus some others.

  • Opera 9.5 - Pass
  • Firefox 2, 3 - Pass
  • Microsoft Internet Explorer 5.2 (Mac) - Pass
  • Microsoft Internet Explorer 6 - Partial Pass†
  • Microsoft Internet Explorer 7 - Partial Pass†
  • Microsoft Internet Explorer 8 (beta) - Partial Pass†
  • Safari 3 - Pass
  • Safari 2 - *Fail* ††
  • † Internet Explorer 6–8 on Windows XP renders the correct text, but triggers an ActiveX security warning bar on the page load.
  • This is an error on behalf of IE/Windows. As the object has no type nor data attributes, it has nothing that would bind it to a specific ActiveX control, and therefore should not trigger a security warning bar. This bug should be reported, and the respective bug number referenced here. To do:
    • report bug to Microsoft to fix in IE8 at a minimum (and since it is a security related false positive, perhaps patch it in IE6 and IE7 as well)
    • ask Chris Wilson if there are any markup work-arounds to cause security alert to not happen for something that is not loading any ActiveX (2008-09-04 DONE. Tantek direct messaged Chris, pointed him to this browser testing section, asked him about any markup work arounds.)
    • try DECLARE attribute, e.g. <object class="type" declare="declare">visible text</object> to see if that makes IE/Windows not instantiate an ActiveX control and thus not trigger the warning.
      • IE6 and IE7 (as tested on virtual machines on Ben Ward's computer) still trigger the warning message, but do properly display "visible text". It appears this may happen for *any* inline markup use of the object element due to Microsoft's attempting to "avoid paying patent whores such as Eolas hundreds of millions of dollars for something they did not invent in the first place." This would seem to imply that we cannot depend on any inline use of object for any markup solutions. We should still specify what happens with object for parsing purposes, and then document the current/past implementation/browser problems with object on another page like object-warnings. Tantek 14:36, 23 Sep 2008 (PDT)
      • Safari 2 still puts an empty box in place of the object, and fails to display "visible text".
  • †† Safari 2 renders a default-sized white box (as if embedding an external control). It breaks layout and does not display the desired content.
Safari 2 Tweak

The example is tweaked as follows to affect Safari 2 rendering:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<title>&lt;object> value excerption pattern: hCard Telephone Type Test Case</title>

<body class="vcard">
    <h1 class="fn"><a class="url" href="http://ben-ward.co.uk">Ben Ward</a></h1>
    <p class="tel">
        <object data="data://" class="type">
            <param name="value" value="cell">
            Mobile:
        </object>
        <span class="value">415-123-567</span>
    </p>
</body>

A data="data://" URL attribute is added to the object element.

Safari 2 Result
  • Safari 2 - Partial Pass†

† Safari 2 renders the object correctly on first page load, *however*, upon using the browser ‘Refresh’ function, the object element reverts to the broken rendering described in the original test.

Current Conclusion
  • Safari 2 does not pass the test acceptably for this to be adopted as the only solution.
  • Internet Explorer's security warnings are irritating, but justifiably unacceptable. --BenWard 20:17, 26 Aug 2008 (PDT)
    • I concur. And while we can report a bug against IE/Windows in the hopes that it is fixed eventually (perhaps even in IE8 before it ships), as this problem has been fixed in Safari 3, it is doubtful that a bug report against Safari 2 would be fixed in an intermediate version. --Tantek 03:07, 27 Aug 2008 (PDT)

misconceptions

misunderstanding of authoring unfriendliness
  • not very hand-authoring friendly, compared to other proposals like: Machine data in class: <span class="type data-cell">Mobile:</span>, and data prefix in titles: <span class="type" title="data:cell">Mobile</span> TobyInk
    • It is even more hand-authoring unfriendly to introduce a new syntax, as "Machine data in class" does, and to some extent as "data prefix in titles" does. Additional (especially new) syntax introduces far greater cognitive load to the author than a little bit more markup. Tantek

previous iterations

2008-08-23
<object class="type" lang="en-GB">
 <param class="value" name="value" value="cell" />
 mobile
</object>
disadvantages
  • Invalid (X)HTML - although this pattern does make sense, it is worth noting that <param> is one of just a handful of HTML elements for which the class attribute is not defined. Use of this pattern will break validation unless a custom DTD is employed.
advantages

None unique to this variant.

no previous iterations

inspiration

Ben Ward and Tantek Çelik decided to explore other markup possibilities beyond use of empty span elements in the hopes that more semantic alternatives could be found.

details for handling specific property types

date and time separation

summary

By specifying a more precise parsing of the use of "value" excerption inside all datetime properties (e.g. dtstart, dtend, published, updated etc.), dates and times can be marked up separately, thus reducing/minimizing (and potentially eliminating) the readability issues that come with compound ISO8601 datetimes.

introductory example

The sentence:

 The weekly dinner is tonight at 6:30pm.

would be marked up as:

 The weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr> 
 at <abbr class="value" title="18:30">6:30pm</abbr></span>.

advantages

  • re-uses the readable abbr-date-pattern
  • identifies a similarly readable abbr-time-pattern.
  • minimizes DRY violation distance, keeps machine data on exactly the same element as the respective human data
    • even better than abbr-datetime-pattern does, which, in practice from experience often required specifying the date in machine readable form on the human readable time (separate from the human readable date).
  • introduces no new class names - principle of minimal invention
  • introduces no new use of the class attribute - principle of minimal invention again
  • introduces no new syntax (see above about any publishing method that requires the author to think like a programmer being a non-starter, and introducing new syntax almost always requires authors to think like programmers).
  • and most importantly, introduces no dark data.

issues

Some potential issues were raised in IRC, and it helps to document/resolve them so that they are not brought up repeatedly.

  • Does this sufficiently address the concerns raised with the current use of abbr-pattern?
    1. The abbr-date-pattern, as documented and explained by Jeremy Keith is just fine (in contrast to the abbr-datetime-pattern).
    2. Similar to the abbr-date-pattern, this proposal implies/introduces the abbr-time-pattern, which is similarly acceptable.
    3. In addition, as long as there is incremental improvement, we are making progress. It is more important to take small steps that we know will help some things than to attempt a big, riskier step that tries to help more but may not actually do so (as most big changes don't). Therefore "sufficiently" is a flawed way of evaluating incremental fixes.
  • Exposes data through tooltips. Separating into 2008-06-07 and 18:03 improves the ability for humans to consume the data, but still exposes data through tooltips and speech in formats that the publisher did not choose to use. --BenWard 04:52, 25 Jun 2008 (PDT)
    1. This is a feature, not a bug. By making the duplicated data at least *somewhat* visible (rather than fully invisible), effective data quality is increased due to the fact that the probability of the ISO8601 and locale-specific data getting out-of-sync is reduced because of the increased visibility (and therefore the increased inspectability and more eye-balls looking at/for problems effect).
    2. Workaround: if a site publisher wishes to customize the presentation of tooltips, they can do so with a nested span with title.
      • That proposes extraneous mark-up to maintain some publisher's wish not to have a tool-tip in the first place. I object to a microformat pattern requiring an immediate work-around to meet publishers' desires. It goes against ‘Humans first…’. --BenWard 09:09, 30 Jun 2008 (PDT)
        1. Additional markup has nothing to do with "Humans first".
        2. Additional markup to work-around minor issues (e.g. CSS, cross-browser compatibility, etc.) is a well accepted modern web design practice. It's not ideal, but it is both accepted and widely practiced. With the use of <span> and <div> elements, it's also semantically neutral, therefore not a problem from that perspective either.
        3. Finally, it should not be our goal to try to satisfy *every* publisher, for that would make every microformat beholden to every publisher and contort the design of microformats in really poor ways. We must accept that not all publishers will adopt all microformats and that is ok. Our goal is to incrementally increase the number of publishers that adopt microformats, not to try to satisfy each and every one.
      • You *have* to have a tooltip though. It’s not possible to *not* have a tooltip. Not great.--Julian Stahnke 12:58, 4 Sep 2008 (BST)
        • Doesn't a nested span with empty title attribute work? e.g. <span title="">...</span>
          • Nope, empty title on nested element didn’t override a title on the containing element when I tested in Safari 3 and Firefox 3. Browser would still display the title.
  • Semantic misuses of ABBR. That ‘tonight’ is ever a textual, human abbreviation of ‘2008-06-24’ is not accepted.
    1. Semantic stretch not misuse. It is a semantic abbreviation rather than a purely syntactical (character shortening) abbreviation, but it is an abbreviation in context nonetheless. Though this may stretch what may be commonly expected as an "abbreviation", the HTML4 spec does seem to allow some flexibility here (HTML 4.01 9.2.1 Phrase elements).
  • Maintaining proper sentences with the expanded form. It is not always possible to use this mark-up and maintain proper sentences with the expanded form. e.g. it's my <abbr class="bday" title="2005-06-20">birthday today</abbr>! becomes ‘it's my 2005-06-20!’. And thus audio rendition of such titles can be nonsensical - "The weekly dinner is two thousand and eight dash zero six dash twenty four at eighteen thirty."
    1. This can and should be addressed by improving authoring examples so that practices improve with experience.
  • Publishing practices and desires show us that authors are not willing to compromise the semantics of abbr. Phae 04:30, 27 Jun 2008 (PDT)
    1. Without specific citations of which authors and what specific issues they have, we are unable to address their issues.
    2. See also above - not our goal to satisfy *every* publisher, but rather to incrementally satisfy more and more. We must accept that there may be some authors we are unable to satisfy in the immediate/short-term.
  • That's getting pretty complicated
    • Much less complicated than inventing yet another syntax ( " { ... } " ???? ) that web authors would have to learn.
      • But it's all in one place, rather than spreading it out.
        • The spreading it out is what current content publishing practices do already! It is much more important to map the machine data as close to the existing publishing practice as possible, than to try to "put it all in one place". The "put it all in one place" way of thinking is why people ended up sticking so much invisible metadata in the head of the document, which we know fails.

content requirements

Some requirements which enhance both human readability and machine parsability (best of both):

  • date value excerpts MUST use hyphen separators. E.g. 2008-06-24. Not ok: 20080624.
  • time value excerpts MUST use colon separators (seconds optional, implied :00 if absent). E.g. 18:30 or 18:30:00. Not ok: 183000.
  • timezone value excerpts MUST use leading plus or minus and NO colon separator. E.g. -0700. Not ok: -07:00.
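The three requirements above can be checked mechanically. The following is a rough sketch only: the function names are hypothetical, and the patterns check separators and digit counts, not whether the date or time actually exists (for example, month 13 would pass).

```python
import re

# Patterns mirror the three content requirements; they validate shape only.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")      # 2008-06-24
TIME_RE = re.compile(r"\d{2}:\d{2}(?::\d{2})?")  # 18:30 or 18:30:00
TZ_RE = re.compile(r"[+-]\d{4}")                 # -0700


def classify_excerpt(value):
    """Return 'date', 'time', 'timezone', or None if no requirement is met."""
    if DATE_RE.fullmatch(value):
        return "date"
    if TIME_RE.fullmatch(value):
        return "time"
    if TZ_RE.fullmatch(value):
        return "timezone"
    return None


def normalize_time(value):
    """Add the implied :00 seconds to a time excerpt that omits them."""
    if not TIME_RE.fullmatch(value):
        raise ValueError("not a valid time value excerpt: " + value)
    return value if value.count(":") == 2 else value + ":00"
```

A time with an inline timezone such as 18:30-0700 is deliberately not covered here, since the requirements list the three excerpt kinds separately.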

derivation

It's important to document the derivation/background of a brainstorm/proposal as it allows others to see some of the thinking that went into it, and avoid having to rediscuss alternatives already considered, and helps provide understanding as to why aspects of the design are as they are.

example with datetime

Here is a short code example:

 the weekly dinner is tonight at <span class="dtstart">2008-06-24T18:30</span>
example with abbr datetime

However that's not the easiest to read, nor do most people publish that as human visible text, so per the abbr-datetime pattern:

 the weekly dinner is tonight at <abbr class="dtstart" title="2008-06-24T18:30">6:30pm</abbr>

which has raised two issues:

  1. When "2008-06-24T18:30" is inspected by a human reading a tooltip, or spoken by a screen reader, it's not the most understandable thing (precise citation needed, perhaps an mp3 with screen reader used version info).
  2. There is a non-local violation of DRY (which IMHO is a worse problem, as it leads to worse data quality -Tantek). That is, the "date" information is now not only in the text twice (as it was before), but those two instances of the date information are not on the same element, which makes it worse. That is, "tonight" is in the prose, outside of the element with the precise date "2008-06-24".

In analysis of examples of event information on the web, the date and time are often published in separate elements, often for display purposes.

Thus it is this existing content publishing practice which leads to this brainstorm proposal: essentially, to introduce a date and time value excerption longhand.

(Initially Tantek's idea that he bounced off Jeremy Keith (similar idea conceived by Drew independently) was to introduce new classes "datevalue", "timevalue" and "tzvalue" for this purpose, but Bob Jonkman pointed out that HTML5's time parsing algorithm enables a single <time> element to contain dates or times (with or without timezone) without having to explicitly say whether the value contains dates or times (with or without timezone). Bob then proposed that all that was needed was a single new "datetime" class name. This was the key realization that allowed minimal invention. Tantek pointed out that since from the type of property we already know it is a datetime, there was no need for even one new class name, that we could simply re-use "value" excerption, and simply more precisely specify the semantics/parsing in the case of datetime properties.)

example with new date and time value excerpts

Thus we markup the date and time separately, as value excerpts, using the abbr-date-pattern and an implied parallel abbr-time-pattern:

 The weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr> 
 at <abbr class="value" title="18:30">6:30pm</abbr></span>.
separate subtrees

The proposal also allows setting the date and time in separate element subtrees as well, which may be necessary for some document structures:

 the weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr></span> 
 at <span class="dtstart"><abbr class="value" title="18:30">6:30pm</abbr></span>.

Note the two instances of dtstart, one of which sets the date for the dtstart, and the other of which sets the time.

The idea being, when a parser sees a datetime property (e.g. dtstart) with a value excerpt, that it only "set" the component of its full value that is specified by the value excerpt (e.g. the date), and that if lacking a complete datetime, it continue to parse additional instances of that datetime property for the remaining component(s) (e.g. the time).

Of course this only works for singular properties, but fortunately all instances of datetime properties so far are singular, so this works.

  • hCard's rev is plural. TobyInk
    • can someone give a reference to this being the case? The RFC says "The value distinguishes the current revision of the information in this vCard for other renditions of the information." Does it make sense to have multiple REV dates in a single vCard?
      • The RFC is ambiguous as usual, but a contact card could conceivably have had several changes made to it, with a rev for each. ("Change logs" are fairly common on the web.) The hCard spec is fairly specific about which properties are singular and which are not, and rev is not included in the list of singular properties.
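A rough sketch of the component-setting idea for one singular datetime property: each value excerpt sets only the component it specifies, and parsing continues until the datetime is complete. The function name is hypothetical, and letting the first occurrence of a component win is an assumption that this page does not settle.

```python
import re


def merge_datetime_excerpts(excerpts):
    """Combine value excerpts (in document order) found across all instances
    of one singular datetime property into a single ISO 8601 string.
    Returns None if no date component was found."""
    parts = {"date": None, "time": None, "tz": None}
    for value in excerpts:
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
            kind = "date"
        elif re.fullmatch(r"\d{2}:\d{2}(:\d{2})?", value):
            kind = "time"
        elif re.fullmatch(r"[+-]\d{4}", value):
            kind = "tz"
        else:
            continue  # unrecognised excerpt: skipped in this sketch
        if parts[kind] is None:  # assumption: first occurrence wins
            parts[kind] = value
    if parts["date"] is None:
        return None
    result = parts["date"]
    if parts["time"] is not None:
        result += "T" + parts["time"] + (parts["tz"] or "")
    return result
```

For the separate-subtrees example, the two dtstart instances would contribute "2008-06-24" and "18:30" respectively, and the merged value would be 2008-06-24T18:30.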
reusing date data for multiple datetime properties

This also provides a *very* convenient way to re-use the same date information for start and end, e.g. expanding the example:

 the weekly dinner is <span class="dtstart dtend"><abbr class="value" title="2008-06-24">tonight</abbr></span> 
 from <span class="dtstart"><abbr class="value" title="18:30">6:30</abbr></span> - 
 <span class="dtend"><abbr class="value" title="20:30">8:30pm</abbr></span>.

Note what just happened: we eliminated another duplication of date information by reusing the start *date* information for the end *date* information, and specifying *only* the *time* information separately for the two properties.

Reducing the duplication (or triplication) of such data helps to reduce the chances of (even inadvertent) data corruption/drift/divergence among any duplicates.
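To illustrate the reuse, here is a small sketch in which an excerpt found on an element with several property class names contributes to each of those properties. The function name and the input shape (a list of class-attribute and title pairs, already extracted in document order) are assumptions, and telling dates from times by the presence of a colon is a deliberately crude shortcut.

```python
def collect_datetime_properties(excerpts):
    """excerpts: list of (class attribute, value excerpt) in document order.
    Returns {property name: "dateTtime"} for every property that ended up
    with both a date and a time component."""
    props = {}
    for class_names, value in excerpts:
        kind = "time" if ":" in value else "date"  # crude classification
        for name in class_names.split():
            # setdefault keeps the first component seen for each property
            props.setdefault(name, {}).setdefault(kind, value)
    return {
        name: parts["date"] + "T" + parts["time"]
        for name, parts in props.items()
        if "date" in parts and "time" in parts
    }
```

Fed the three excerpts from the example above, the shared date lands on both dtstart and dtend, while each keeps its own time.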

time zones

There are a few choices for timezones.

  1. Simply include the time zone information as part of the time "value".
    E.g. <abbr class="value" title="18:30-0700">6:30pm</abbr>
  2. Or use another value excerpt for the timezone (was: introduce the class name "tzvalue")
    E.g. <abbr class="value" title="18:30">6:30pm</abbr> <abbr class="value" title="-0700">PDT</abbr>
  3. Or allow both and let web authors decide. This is the current leaning.
    • if web authors want to specify timezone as part of the time (first example above), they can,
    • or if web authors visibly publish the timezone separately (second example above), then they can mark that up.
    • or if web authors wish to omit timezone information, they can do so as well, as most do today. In practice this works fine, as it creates a "floating" time which works fine in far more than the 80/20.
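If both forms are allowed, a parser would need to accept a time excerpt with or without an inline timezone. A minimal sketch, with a hypothetical function name, in which a missing timezone is reported as None to represent a "floating" time:

```python
import re

# A time (seconds optional) optionally followed by a colon-less UTC offset.
TIME_WITH_TZ = re.compile(r"(\d{2}:\d{2}(?::\d{2})?)([+-]\d{4})?")


def split_time_excerpt(value):
    """Return (time, timezone) for a time value excerpt, where timezone is
    None for a floating time. Returns None if the excerpt is not a time."""
    match = TIME_WITH_TZ.fullmatch(value)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

A separately marked-up timezone such as -0700 would arrive as its own value excerpt and is not matched by this function.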

(more to come, documenting from IRC logs)

discussion

Opening up a discussion section even though documentation from IRC logs is still in progress. :)

  • regarding the advantage of "and most importantly, introduces no dark data."
    • "Dark data" is sometimes what publishers *want* to publish. To use the example of TV schedules which kick started the renewed discussion in this area, publishers will often not want to display the date. For instance, if a page entitled "Tomorrow's TV" and containing 300 different programmes marked up with dtstart, it is superfluous to explicitly display the date for each one. With this proposed solution the include pattern could be used to include the date into each vevent, but a visible link to the date on each programme would simply be confusing. Sometimes it just makes sense to hide some of the information you're publishing as a microformat - because the information you want to make explicit to parsers can be inferred from context by humans, or is more appropriately displayed at a different level of granularity for machines and humans. TobyInk 14:26, 24 Jun 2008 (PDT)
      • It doesn't matter whether publishers *want* to publish dark data or not. Invisible data always leads to poorer quality data. Publishers publish all kinds of invisible metadata in the heads of documents etc. because they want to, but their desire doesn't stop the data from becoming obsolete, diverging from the actual visible data etc. The quality of the data matters more than any publisher's wish(es) of publishing in a specific format, or in a hidden way. In the example you gave, using the include pattern in that way would not result in any visible links, but merely empty include anchors. It never makes any sense to actually hide "some of the information you're publishing as a microformat", because historically that always results in some loss of data quality over time and thus the microformats principle of visible data instead of invisible metadata. Tantek 14:32, 24 Jun 2008 (PDT)
        • All microformats hide some data. In the example <span class="tel">01632 960123</span>, the information that the long string of numbers represents a telephone number is invisible. And making it visible (Tel: <span class="tel">01632 960123</span>) violates DRY. It's just a matter of where to draw the line.
          • That statement makes the mistake of conflating *type* data and *content* data. "tel" is not content data, just as <p> is not content data. It's markup, indicating the type of the data. Markup (type data) being invisible to the user has worked just fine. Content (content data) being invisible to the user is the problem of dark data. Or rather, if you think that everything is data, then you really should be spending time developing in a system that is built on that assumption, e.g. RDF, rather than microformats, which are built on HTML, and the clear separation of type of data (HTML elements, microformats properties) and content data (inner text, text attribute values).
            • My point is that there isn't a distinction between the two, but a continuum. The choice of where to draw the line is never a clear one and always somewhat arbitrary. The vCard standard could quite easily have ended up with separate "TEL", "FAX" and "CELL" properties, in which case hCard would have ended up with <foo class="tel">, <bar class="fax"> and <baz class="cell">. Going the other way, they could have stored e-mail addresses as mailto: URLs, and then hCard would have <a class="url" href="mailto:quux@example.com">. They chose the way they did, and as a result in hCard the distinction between a mailto: URI and an http: URI is largely invisible (in most circumstances only obvious by looking at the status bar when hovering), but the distinction between a telephone number and a fax number is visible. But that wasn't the only possible (nor the only reasonable) outcome.

enabling more use of title attributes

parsing title from empty value elements

As a solution to the invisible data requirements sometimes presented by machine-data in microformats, a parsing rule is proposed: where the value element is empty (contains no non-whitespace characters), the title attribute is parsed instead.

e.g. <span class="dtstart">Tuesday the 24th at 6pm <span class="value" title="20080624T180000+1000"></span></span>

Note that due to a quirk in parsers, this technique can already be used in some parsers where the empty value node is also an abbr element. That is semantically incorrect use of abbr, though.
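
A minimal sketch of the proposed rule, for discussion only. It assumes well-formed XHTML fragments (so the standard library's ElementTree can read them) and only looks at the first usable value element. The names has_class and excerpt_value are hypothetical and not taken from any existing parser.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if the element's class attribute contains the given class name."""
    return name in (el.get("class") or "").split()


def excerpt_value(prop):
    """Return the value of a property element.

    Sketch of the proposed rule: a value element containing no
    non-whitespace characters contributes its title attribute instead.
    """
    for el in prop.iter():
        if el is prop or not has_class(el, "value"):
            continue
        text = "".join(el.itertext())
        if text.strip():
            return text.strip()  # ordinary, visible value element
        title = el.get("title")
        if title is not None:
            return title  # empty value element: parse the title instead
    # No usable value element: fall back to the property's own text.
    return "".join(prop.itertext()).strip()


fragment = (
    '<span class="dtstart">Tuesday the 24th at 6pm '
    '<span class="value" title="20080624T180000+1000"></span></span>'
)
print(excerpt_value(ET.fromstring(fragment)))
```

A value element with visible content is handled as before; only the empty case consults the title attribute.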

resolution notes

  • This is parsable, but needs to be specced precisely.
  • Suggest restricting to instances where a single value element exists, e.g.
    • Disallow concatenation of multiple non-visible embedded values
    • Disallow embedded non-visible values from being appended to visible data.
  • This pattern exists to solve the machine-data problem, and restricting it more will discourage it being mis-used for hiding other, useful data
    • Restrict the value element to be the first-child (excluding white-space text-nodes) of a µf property element, forcing the data to be kept physically close to the µf property in code. Keeps data close, helps maintenance issues. Intends to alleviate some invisible data principle concerns.
    • Require that only the machine-data value and its human form be contained within the microformat property; the µf property should not include arbitrary data
      • Include something to the effect of ‘parsers may attempt to validate the human form against the machine form’ (PHP has a human date parsing function, for example)
    • Restrict opt-in to specific properties. Do not allow it to be parsed globally. Fail parsers which implement it globally.
  • Regarding accessibility, have confirmation that the empty element will be ignored (Thank you: James Craig, Gez Lemon, Bruce Lawson): “JAWS and Window-Eyes announce the title attribute for an empty abbr is used and verbosity is set to ‘expand abbreviations’, but neither read the title attribute on an empty span.”
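
The restrictions suggested in the notes above could be sketched roughly as follows. This is an illustration under assumptions, not a spec: input is assumed to be well-formed XHTML, the opt-in set OPT_IN_PROPERTIES is hypothetical (only dtstart appears in the example on this page), and restricted_machine_value is an invented name.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-property opt-in list; the real list is not yet specced.
OPT_IN_PROPERTIES = {"dtstart"}


def classes(el):
    """Return the element's class names as a list."""
    return (el.get("class") or "").split()


def restricted_machine_value(prop):
    """Return the title of an empty value element, or None if any
    of the suggested restrictions is not met."""
    # Restriction: opt-in per property, never parsed globally.
    if not OPT_IN_PROPERTIES.intersection(classes(prop)):
        return None
    values = [el for el in prop.iter()
              if el is not prop and "value" in classes(el)]
    # Restriction: exactly one value element, so no concatenation.
    if len(values) != 1:
        return None
    value = values[0]
    # Restriction: first child, ignoring white-space text nodes.
    if (prop.text or "").strip() or len(prop) == 0 or prop[0] is not value:
        return None
    # Only an empty value element triggers title parsing.
    if "".join(value.itertext()).strip():
        return None
    return value.get("title")


first_child = ET.fromstring(
    '<span class="dtstart"><span class="value" '
    'title="20080624T180000+1000"></span>Tuesday the 24th at 6pm</span>'
)
print(restricted_machine_value(first_child))
```

Under the first-child restriction, the empty value element has to precede the human-readable text, so markup that places it after the text would be rejected by this sketch.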

issues

  • Violates the microformats principle of visible data. Numerous previous efforts (e.g. markup in comments etc.) have walked down that path of "dark data" and failed in practice. We must hold ourselves to higher standards than any XML/RDF solution. It's part of what sets microformats apart from so many other failed efforts at data representation on the web. We must not go down the path of dark data. IMHO that principle is inviolable for microformats. Tantek
    • The approach here is that we have exceptional situations where we are requiring data to be duplicated for machines. They are exceptions which have existed in microformats since hCard, and this is a pattern to handle those exceptions and only those exceptions in response to the problems people have publishing them. The specification for this could be written to make it a per-property opt-in device, only for those properties identified above. This is not a ‘generic data embedding’ device and in line with the cited principles, should not be allowed to become one. --BenWard 05:17, 25 Jun 2008 (PDT)
    • An alternative, I suppose, would be to recognise all of the above data format examples as being in violation of the microformats principle, since authors are hiding them in favour of their own content. Every instance of fixed data formats in microformats that force authors to break the invisible data principle would need to be eliminated in favour of accessible, i18n compatible replacements, including those in hCard which are 1:1 mappings from vCard. We _could_ undertake that, but previous discussions (people being advised to misuse ABBR for translation of the vCard telephone types, for example) have already suggested that supporting the visible publishing is too complex. --BenWard 05:17, 25 Jun 2008 (PDT)
    • Additionally, the use of terms such as ‘dark data’ is inappropriate for this discussion, which is focused on functional, practical solutions to the identified problem. The term is emotive, and aggressive toward other, completely unrelated technologies (such as RDF), which is irrelevant to solving this issue. Precisely, the machine-data in this technique is ‘non-visible machine-data’, and is being approached with specific regard to the microformats principle of design for humans first. --BenWard 05:34, 26 Jun 2008 (PDT)
  • Worsens the DRY violation by separating the human visible version and machine readable version into separate elements. Duplicate data itself is bad, but at least by keeping the duplicates local on the same element (as the existing abbr-pattern does), the risk of drift/divergence is reduced. The greater the distance in content of the duplicates, the greater the risk of drift/divergence, and thus the lower the quality of data. This has been illustrated by the divergence of invisible metadata in the head of a document versus the content in the body, and even more so across documents.
    • The machine-data form is kept as a sibling of the human form and, in terms of distance in code, is not much further away than data stored in a single element's title attribute. Further, the specification for this could demand the value element be placed as the _first child_ of the parent property, forcing it to be published immediately after the property element. --BenWard 05:17, 25 Jun 2008 (PDT)
  • Some parsers may strip empty inline elements (particularly those that run incoming HTML through Tidy to convert it into well-formed XML). A workaround may be to allow (or even require) hard white space (i.e. &nbsp;) within the element with class="value".
    • It is, however, trivial to patch and build Tidy not to do this (keeping empty elements where that element also has a class attribute). Parser writers need to feed back on whether using a custom build is impossible for their solution, but since Tidy can be made to work, the problem can likely be alleviated. Ben Ward has put up an experimental build of Tidy with patched element-dropping behavior here: tidy-microformats.zip
      • Tidy is not just used in parsers, but also by publishers, as part of CMSes, etc.
  • WYSIWYG editors - this might present a problem in these editors in terms of allowing the user to change the value of the title of an empty span. Some of them allow editing title attributes of some elements, namely <A>, <IMG> and not much more. Needs to be properly researched, but is this an issue? Should we care about default WYSIWYG deployments? Or leave that for microformats-aware WYSIWYG versions, they would allow editing this span somehow... Andr3 04:38, 7 Nov 2008 (GMT)

value-title

Numerous proposals over the years have advocated expanding the use of the title attribute beyond the abbr tag for storing microformat property values. One simple mechanism for doing so would be to introduce a new value excerption class name and rule.

valuetitle: before "normal" value excerption handling, first look (in the same manner as value-excerption) for the class name "valuetitle". If it is found, use the value of the title attribute on that element and do no further value excerption or other parsing for that property value.

E.g.

<span class="type"> <span class="valuetitle" title="cell">mobile</span> </span>

In addition to first looking for "valuetitle" where a parser would look for "value", it seems reasonable to also allow "valuetitle" on the property element itself in order to minimize the markup necessary, e.g.:

<span class="type valuetitle" title="cell">mobile</span>
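
A rough sketch of how a parser might apply the "valuetitle" rule, covering both the nested form and the form with "valuetitle" on the property element itself. It assumes well-formed XHTML fragments with a single class attribute per element; the function name valuetitle_value and the simplified fallback to ordinary value excerption are illustrative assumptions, not specced behaviour.

```python
import xml.etree.ElementTree as ET


def classes(el):
    """Return the element's class names as a list."""
    return (el.get("class") or "").split()


def valuetitle_value(prop):
    """Return a property value, checking "valuetitle" before "value"."""
    # Step 1: look for "valuetitle" on the property element itself or on
    # a descendant. If found, the title is the value and parsing stops.
    for el in prop.iter():  # iter() yields prop first, then descendants
        if "valuetitle" in classes(el) and el.get("title") is not None:
            return el.get("title")
    # Step 2: simplified "normal" value excerption (first value element).
    for el in prop.iter():
        if el is not prop and "value" in classes(el):
            return "".join(el.itertext()).strip()
    # Step 3: no value element at all: use the property's text.
    return "".join(prop.itertext()).strip()


nested = ET.fromstring(
    '<span class="type"> <span class="valuetitle" title="cell">mobile</span> </span>'
)
compact = ET.fromstring('<span class="type valuetitle" title="cell">mobile</span>')
print(valuetitle_value(nested), valuetitle_value(compact))
```

In both forms the human-visible text "mobile" stays in the page while the parsed value comes from the title attribute.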

Naming reasoning/methodology: by using the prefix "value-" it is clear that this is part of the value excerption pattern. By using the suffix "-title", it is clear that the "title" attribute is involved. Thus the name "valuetitle" is a good mnemonic for its functionality. See related naming-principles.

This "valuetitle" variant was suggested on 2008-08-30 by Tantek in a discussion with Ben Ward, derived from the previous brainstorm over parsing of title from empty elements — this pattern could also be used with empty elements.

previous similar proposals

  • I believe there may have been a proposal for "usetitle"(link+citation needed) in the past that would function similarly. I think "valuetitle" is better than "usetitle" as "valuetitle" is more *descriptive*, i.e. meaning "the title is the value", as opposed to "usetitle", which is more *prescriptive*, i.e. "use the title". Tantek 08:13, 1 Sep 2008 (PDT)
    • Agreed. ‘Processing instruction’ class names are ugly and undesirable. Class should always be descriptive of the content. --BenWard 05:42, 14 January 2009 (UTC)

related pages