[uf-discuss] mixing vocabularies

Bob Douglas bd-net at sbcglobal.net
Fri Jul 3 07:50:44 PDT 2009


Hi, I'm still getting oriented to the list, so apologies for the late
response, the length, and any glaring naivete.  I would post this on the
wiki, but I'm not familiar enough yet to know where it would fit.

This has been a helpful thread, but it seems that handling mixed contexts
may be a looming issue on MediaWiki/Wikipedia-type sites, and that more
guidance is needed - especially for users who will be introduced to
microformats through Operator (Firefox), Oomph (IE), or similar browser
add-ons.  Two usage problems are:

  A. Corruption (or significant alteration) of results over time as
different editors insert microformat-producing templates into wiki pages or
add microformats to existing (mw-)templates.

  B. Significant differences in behavior due to the core "rules" applied in
emerging microformat browser tools.

Here are two examples from current (30JUN2009) Wikipedia articles:

1. Two vcards/vevents on a page (http://en.wikipedia.org/wiki/Einstein)

The Einstein bio page contains two (un-nested) infoboxes that both declare
vcard and vevent classes on the table as:

  <table class="infobox vcard vevent">
  <td class="fn summary">Albert Einstein<....
  </table>

  <table class="infobox biography vcard vevent">
  <td class="fn summary">Albert Einstein<....
  </table>

Oomph returns the resulting two contacts and two events - none of which are
valid (missing the required "n" and "dtstart" values).
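
The validity check described above can be sketched roughly as follows.  This
is only an illustration: the REQUIRED table and the missing_required() helper
are hypothetical names, and the table lists just the required properties
discussed in this message, not a complete reading of either spec.

```python
# Hypothetical table: required properties as discussed in this message.
# It is an assumption for illustration, not a full summary of the specs.
REQUIRED = {
    "vcard": ("fn", "n"),
    "vevent": ("summary", "dtstart"),
}


def missing_required(kind, props):
    """Return the required property names that are absent or empty."""
    return [name for name in REQUIRED[kind] if not props.get(name)]


# The Einstein infobox supplies only "fn" and "summary" on the table.
einstein = {"fn": "Albert Einstein", "summary": "Albert Einstein"}
print(missing_required("vcard", einstein))
print(missing_required("vevent", einstein))
```

A tool that reports an item as invalid could show this list to the wiki
editor, which would make the missing markup easier to spot.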

Detector returns a single valid contact by extracting the "n" values
(family-name, given-name) per the spec from the "fn" string, which here is
exactly two words separated by a single space.  That is luck in this case,
as most other bio pages include a middle name or initial.
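
For reference, here is a minimal sketch of that extraction as I understand
the "implied n" rule in the hCard spec.  The function name implied_n() is my
own, and this is not a claim about how either add-on actually implements it.

```python
def implied_n(fn):
    """Return (family_name, given_name) implied by a two-word fn, else None."""
    words = fn.split()
    if len(words) != 2:
        # Middle names or initials: no automatic split is possible.
        return None
    first, second = words
    # Exception: "Family, Given" or "Family G." order.
    second_is_initial = len(second) == 1 or (
        len(second) == 2 and second.endswith(".")
    )
    if first.endswith(",") or second_is_initial:
        return first.rstrip(","), second
    # Default order: the first word is the given name.
    return second, first


print(implied_n("Albert Einstein"))
print(implied_n("Thomas Alva Edison"))
```

The first call yields the family/given pair seen in the N line of the vcard
output for the Einstein page; the second returns None, which is the case
most bio pages fall into.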
     
Can't tell from this example if Detector is ignoring or merging the
duplicate vcard with identical "fn"/"n" values.  (Anybody know the logic?)
However, it does ignore the invalid events.  Here are the complete contents
Detector produces for the Einstein page:     

  BEGIN:VCARD
  PRODID:-//kaply.com//Operator 0.8//EN
  SOURCE:http://en.wikipedia.org/wiki/Einstein
  NAME:Albert Einstein - Wikipedia, the free encyclopedia
  VERSION:3.0
  N;CHARSET=UTF-8:Einstein;Albert;;;
  FN;CHARSET=UTF-8:Albert Einstein
  CATEGORIES;CHARSET=UTF-8:Württemberg/Germany (1879–96) Switzerland
(1901–55) Austria (1911–12) Germany (1914–33) United States
(1940–55),Ashkenazi  Jewish and German,Physics
  BDAY:1879-03-14
  UID:
  END:VCARD

2. vcards/vevents nested within another vcard/vevent on page
http://en.wikipedia.org/wiki/Edison

The Edison page includes a single biography infobox in which a
microformat-producing template has been inserted for his two marriages.
Intuitively this should simply be two events (three if you count Edison's
birth as an event) within the scope of a person's life.  What is produced is
a strange brew of nested microformat classes.  The structure is:

  <table class="infobox biography vcard vevent">
  <td class="fn summary">Thomas Alva Edison<....

    <span class="vcard">
    <span class="vevent">
      <span class="dtstart">1871</span>
      <span class="dtend">1885</span>
    <span class="fn org summary">Marriage: Mary Stilwell to Thomas Edison</span>
    <span class="uid url"><a href="http....</span>
    </span> </span>

    <span class="vcard">
    <span class="vevent">
      <span class="dtstart">1886</span>
      <span class="dtend">1932</span>
    <span class="fn org summary">Marriage: Mina Edison to Thomas Edison</span>
    <span class="uid url"><a href="http....</span>
    </span> </span>

  </table>

Oomph identifies three events: the two marriages and the vevent on the
infobox table, where the "summary" values are concatenated (Thomas Alva
EdisonMarriage: Mary Stilwell to Thomas EdisonMarriage: Mina Edison to
Thomas Edison).  None are valid, as it is apparently unable to set "dtstart"
from the values provided.  It also identifies three contacts, treating the
vcard "fn" values in the same manner as "summary" on the vevents.  These
are also invalid due to omission of the required "n" values.

Detector identifies only one contact for the page, though it is technically
invalid since "n" values cannot be automatically extracted from the three
names in "fn".  The main difference in behavior is that "fn" and "org"
values are set to their first occurrences.  One obnoxious result of the
nested vcard/vevent is that the marriage event description ("summary") is
passed to the top-level vcard in the "org" attribute.  The vcard returned by
Operator for the Edison page is:
  
  BEGIN:VCARD
  PRODID:-//kaply.com//Operator 0.8//EN
  SOURCE:http://en.wikipedia.org/wiki/Edison
  NAME:Thomas Edison - Wikipedia, the free encyclopedia
  VERSION:3.0
  N:;;;;
  ORG;CHARSET=UTF-8:Marriage: Mary Stilwell to Thomas Edison
  FN;CHARSET=UTF-8:Thomas Alva Edison
  ROLE;CHARSET=UTF-8:inventor, scientist, businessman
  BDAY:1847-02-11
  UID:
  END:VCARD
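
One possible rule that would avoid the ORG leak above is to stop collecting
properties for the outer item once a nested vcard/vevent root begins.  The
sketch below is only that, a sketch: ScopedCollector, collect(), and the
SAMPLE markup are hypothetical, and SAMPLE is a simplified, well-formed
stand-in for the Edison infobox rather than the actual page source.

```python
from html.parser import HTMLParser

ROOTS = {"vcard", "vevent"}
VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag


class ScopedCollector(HTMLParser):
    """Collect text of `prop` elements that belong to the outermost root."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.stack = []   # one (is_root, is_prop) pair per open element
        self.values = []

    def _roots_open(self):
        return sum(1 for is_root, _ in self.stack if is_root)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        is_root = bool(classes & ROOTS)
        # A property counts only when exactly one root encloses it.
        is_prop = (self.prop in classes and not is_root
                   and self._roots_open() == 1)
        if is_prop:
            self.values.append("")
        self.stack.append((is_root, is_prop))

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        in_prop = any(is_prop for _, is_prop in self.stack)
        if in_prop and self._roots_open() == 1:
            self.values[-1] += data


def collect(html, prop):
    parser = ScopedCollector(prop)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


SAMPLE = """
<table class="infobox biography vcard vevent">
 <tr><td class="fn summary">Thomas Alva Edison</td></tr>
 <tr><td>
  <span class="vcard"><span class="vevent">
   <span class="dtstart">1871</span>
   <span class="fn org summary">Marriage: Mary Stilwell to Thomas Edison</span>
  </span></span>
 </td></tr>
</table>
"""

print(collect(SAMPLE, "summary"))
print(collect(SAMPLE, "org"))
```

With this scoping the outer item keeps only its own "summary", and the
marriage text no longer shows up as its "org".  Whether nested items should
be scoped this way is exactly the kind of default I think needs stating.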

Yes, much of this has to do with the use of the standards rather than the
standards themselves.  But it seems there is significant incentive to
encourage consistent (and sensible) microformat behavior over time within
leading browsers (or add-ons).  Here are a couple of comments/questions
regarding the microformat standards related to mixed vocabularies.

1) It seems the common practice of slapping class="vcard vevent" on
templates will be troublesome in a wiki environment - it can be too
ambiguous for tools to determine the intended semantic context.

2) There seems to be a need to clarify default behaviors (rules) related to
context.  For example, how should a tool handle multiple values for "fn",
"org", "summary", etc. within the same div/span of a vcard or vevent class:
concatenate (Oomph), first occurrence (Detector), last occurrence
(override), etc.?  The example is vcard/vevent, but this is likely an issue
for others too.
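
To make the three candidate rules concrete, here is a small sketch.  The
resolve() helper and the strategy names are hypothetical; the values are the
three "summary" strings from the Edison page, in document order.

```python
def resolve(values, strategy):
    """Reduce multiple values for one property to a single value."""
    if not values:
        return None
    if strategy == "concatenate":   # what Oomph appears to do
        return "".join(values)
    if strategy == "first":         # what Detector appears to do
        return values[0]
    if strategy == "last":          # an "override" rule
        return values[-1]
    raise ValueError("unknown strategy: " + strategy)


summaries = [
    "Thomas Alva Edison",
    "Marriage: Mary Stilwell to Thomas Edison",
    "Marriage: Mina Edison to Thomas Edison",
]

for strategy in ("concatenate", "first", "last"):
    print(strategy, "->", resolve(summaries, strategy))
```

The same markup gives three different answers, which is why a stated default
would help tool authors converge.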

Best regards,  Bob Douglas




