[uf-discuss] mixing vocabularies
Bob Douglas
bd-net at sbcglobal.net
Fri Jul 3 07:50:44 PDT 2009
Hi, I'm still getting oriented to the list, so apologies for the late response,
the length, and any glaring naivety. I would post this on the wiki, but I'm not
familiar enough with it yet to know where it would fit.
This has been a helpful thread, but it seems that missing guidance on handling
mixed contexts may be a looming issue on MediaWiki/Wikipedia-type sites,
especially for users who will be introduced to microformats through Operator
(Firefox), Oomph (IE), or similar browser add-ons. Two usage problems are:
A. Corruption (or significant alteration) of results over time as
different editors insert microformat-producing templates into wiki pages or
add microformats to existing (mw-)templates.
B. Significant differences in behavior due to the core "rules" applied in
emerging microformat browser tools.
Here are two examples from current (30JUN2009) Wikipedia articles:
1. Two vcards/vevents on a page (http://en.wikipedia.org/wiki/Einstein)
The Einstein bio page contains two (un-nested) infoboxes that both declare
vcard and vevent classes on the table:
table class="infobox vcard vevent"
td class="fn summary">Albert Einstein<....
/table
table class="infobox biography vcard vevent"
td class="fn summary">Albert Einstein<....
/table
Oomph returns the resulting two contacts and two events, none of which are
valid (they are missing the required "n" and "dtstart" values).
Operator returns a single valid contact by extracting the "n" values
(family-name, given-name) per the spec from the "fn" string, which here is
exactly two space-delimited words. That is luck in this case, as most other
bio pages include a middle name or initial.
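
As a rough illustration of why a two-word "fn" works and a three-word one does
not, here is a minimal Python sketch of the implied-"n" rule as described in
the hCard spec. The function name `implied_n` is hypothetical, and this is a
simplified reading of the rule, not the logic either add-on is known to use.

```python
def implied_n(fn, org=None):
    """Infer (family_name, given_name) from "fn", or return None.

    Simplified reading of the hCard implied-"n" rule: it only applies
    when "fn" is exactly two whitespace-separated words and "fn" is not
    also the organization name.
    """
    if org is not None and fn.strip() == org.strip():
        return None  # the card names an organization, not a person
    words = fn.split()
    if len(words) != 2:
        return None  # e.g. a middle name or initial defeats the inference
    first, second = words
    if first.endswith(","):
        # "Einstein, Albert": family name comes first
        return (first.rstrip(","), second)
    if len(second.rstrip(".")) == 1:
        # "Einstein A." or "Einstein A": the second word is an initial
        return (first, second)
    # "Albert Einstein": given name first, family name second
    return (second, first)


print(implied_n("Albert Einstein"))
print(implied_n("Thomas Alva Edison"))
```

Under these assumptions the Einstein "fn" yields a family and given name,
while the three-word Edison "fn" yields nothing, which matches the empty
"N:;;;;" in the Edison vcard further down.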
I can't tell from this example whether Operator is ignoring or merging the
duplicate vcard with identical "fn"/"n" values. (Does anybody know the logic?)
However, it does ignore the invalid events. Here are the complete contents
Operator produces for the Einstein page:
BEGIN:VCARD
PRODID:-//kaply.com//Operator 0.8//EN
SOURCE:http://en.wikipedia.org/wiki/Einstein
NAME:Albert Einstein - Wikipedia, the free encyclopedia
VERSION:3.0
N;CHARSET=UTF-8:Einstein;Albert;;;
FN;CHARSET=UTF-8:Albert Einstein
CATEGORIES;CHARSET=UTF-8:Württemberg/Germany (187996) Switzerland
(190155) Austria (191112) Germany (191433) United States
(194055),Ashkenazi Jewish and German,Physics
BDAY:1879-03-14
UID:
END:VCARD
2. vcards/vevents nested within another vcard/vevent on a page (http://en.wikipedia.org/wiki/Edison)
The Edison page includes a single biography infobox into which a
microformat-producing template has been inserted for his two marriages.
Intuitively, these should simply be two events (three if you count Edison's
birth as an event) within the scope of a person's life. What is produced is a
strange brew of nested microformat classes. The structure is:
table class="infobox biography vcard vevent"
td class="fn summary">Thomas Alva Edison<....
span class="vcard"
span class="vevent"
span class="dtstart">1871</span
span class="dtend">1885</span
span class="fn org summary">Marriage: Mary Stilwell to Thomas
Edison</span
span class="uid url"><a href="http....</span
/span /span
span class="vcard"
span class="vevent"
span class="dtstart">1886</span
span class="dtend">1932</span
span class="fn org summary">Marriage: Mina Edison to Thomas Edison</span
span class="uid url"><a href="http....</span
/span /span
/table
Oomph identifies three events: the two marriages and the vevent on the
infobox table, where the "summary" values are concatenated (Thomas Alva
EdisonMarriage: Mary Stilwell to Thomas EdisonMarriage: Mina Edison to
Thomas Edison). None are valid, as it is apparently unable to set "dtstart"
from the values provided. It also identifies three contacts, treating the
vcard "fn" values in the same manner as "summary" on the vevents. These
are also invalid due to omission of the required "n" values.
Operator identifies only one contact for the page, though it is technically
invalid since "n" values cannot be automatically extracted from the three
names in "fn". The main difference in behavior is that "fn" and "org"
values are set to their first occurrences. One obnoxious result of the nested
vcard/vevent is that the marriage event description ("summary") is passed to
the top-level vcard in the "org" attribute. The vcard returned by Operator
for the Edison page is:
BEGIN:VCARD
PRODID:-//kaply.com//Operator 0.8//EN
SOURCE:http://en.wikipedia.org/wiki/Edison
NAME:Thomas Edison - Wikipedia, the free encyclopedia
VERSION:3.0
N:;;;;
ORG;CHARSET=UTF-8:Marriage: Mary Stilwell to Thomas Edison
FN;CHARSET=UTF-8:Thomas Alva Edison
ROLE;CHARSET=UTF-8:inventor, scientist, businessman
BDAY:1847-02-11
UID:
END:VCARD
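
To make the "org" leak concrete, here is a minimal Python sketch of one
possible scoping rule: each property is assigned to the nearest enclosing root
of its own type, so a nested vcard keeps its own "fn"/"org". The class name
`ScopedExtractor`, the small property table, and the sample markup are all
hypothetical, and the sketch assumes well-formed markup. It is not a rule
taken from the specs or from either add-on.

```python
from html.parser import HTMLParser

# Which property classes belong to which root class (illustrative subset).
ROOT_PROPS = {
    "vcard": {"fn", "org"},
    "vevent": {"summary", "dtstart", "dtend"},
}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ScopedExtractor(HTMLParser):
    """Assign each property to the nearest enclosing root of its own type."""

    def __init__(self):
        super().__init__()
        self.roots = []   # every root found, in document order
        self.active = []  # currently open roots, innermost last
        self.stack = []   # per open element: (roots opened, text captures)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a matching end tag
        classes = (dict(attrs).get("class") or "").split()
        opened = [{"type": c, "props": {}} for c in classes if c in ROOT_PROPS]
        captures = []
        for cls in classes:
            # Search outward among roots opened by ancestors only.
            for root in reversed(self.active):
                if cls in ROOT_PROPS[root["type"]]:
                    captures.append((root, cls, []))
                    break
        self.roots.extend(opened)
        self.active.extend(opened)
        self.stack.append((opened, captures))

    def handle_data(self, data):
        # Feed text to every property element that is still open.
        for _, captures in self.stack:
            for _, _, buf in captures:
                buf.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        opened, captures = self.stack.pop()
        for root, prop, buf in captures:
            root["props"].setdefault(prop, []).append("".join(buf).strip())
        for _ in opened:
            self.active.pop()


# Hypothetical, simplified stand-in for the Edison infobox structure.
EDISON_LIKE = """
<table class="infobox biography vcard vevent"><tr>
<td class="fn summary">Thomas Alva Edison</td>
<td><span class="vcard"><span class="vevent">
<span class="dtstart">1871</span>
<span class="fn org summary">Marriage: Mary Stilwell to Thomas Edison</span>
</span></span></td>
</tr></table>
"""

parser = ScopedExtractor()
parser.feed(EDISON_LIKE)
for root in parser.roots:
    print(root["type"], root["props"])
```

With this rule the top-level vcard ends up with only its own "fn", and the
marriage text stays on the nested vcard and vevent.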
Yes, much of this has to do with how the standards are used rather than the
standards themselves. But it seems there is significant incentive to encourage
consistent (and sensible) microformat behavior over time across leading
browsers (or add-ons). Here are a couple of comments/questions regarding the
microformat standards as they relate to mixed vocabularies.
1) The common practice of slapping class="vcard vevent" on templates seems
likely to be troublesome in a wiki environment: it can be too ambiguous for
tools to determine the intended semantic context.
2) There seems to be a need to clarify default behaviors (rules) related to
context. For example, how should a tool handle multiple values for "fn",
"org", "summary", etc. within the same div/span of a vcard or vevent class:
concatenate (Oomph), first occurrence (Operator), last occurrence
(override), etc.? The example is vcard/vevent, but this is likely an issue for
other formats too.
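
The three candidate behaviors in point 2 can be compared side by side with a
small Python sketch. The function `resolve` and its strategy names are
hypothetical; the input values are the three "summary" strings from the Edison
example above.

```python
def resolve(values, strategy):
    """Collapse multiple values of one property into a single value."""
    if not values:
        return None
    if strategy == "concatenate":   # behavior observed in Oomph
        return "".join(values)
    if strategy == "first":         # behavior observed in Operator
        return values[0]
    if strategy == "last":          # "override" behavior
        return values[-1]
    raise ValueError(f"unknown strategy: {strategy}")


summaries = [
    "Thomas Alva Edison",
    "Marriage: Mary Stilwell to Thomas Edison",
    "Marriage: Mina Edison to Thomas Edison",
]

for strategy in ("concatenate", "first", "last"):
    print(f"{strategy}: {resolve(summaries, strategy)}")
```

Each strategy gives a different "summary" for the same markup, which is the
ambiguity a default rule would need to settle.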
Best regards, Bob Douglas