[uf-discuss] mixing vocabularies

Tantek Çelik tantek at cs.stanford.edu
Sat Jul 4 15:37:45 PDT 2009


Replies to Thomas and Bob inline:


On Tue, Jun 30, 2009 at 3:43 AM, Thomas Loertsch<loertsch.thomas at guj.de> wrote:
> I'll try to copy (more of) the referenced conventions inline and make the
> page more self-contained. But that isn't always as easy as it sounds: e.g.
> "author" is reused from hAtom which itself states that "an Entry Author
> element MUST be encoded in an hCard", which - if I got it right - leads to
> the following construct:
>
>    <div class="hrecipe">
>        <p class="author vcard fn">Hans Wurst</p>
>
> That's not exactly self-containing and standalone. But searchmonkey would
> know how to parse it, right?

Currently Searchmonkey (and any other hCard parser) should treat that
hCard as invalid.

Per the hCard FAQ you cannot combine "vcard" and "fn" classnames on
the same element:
http://microformats.org/wiki/hcard-faq#Can_you_mix_properties_and_the_root_class_name

However, as this question / implied proposal has come up many times,
and microformats does have the design principle of starting and making
solutions as simple as possible, I've added a "root only shorthand
syntax" proposal to hCard brainstorming for hCard 1.0.1 that would
allow specifying an hCard with only the root class name.

http://microformats.org/wiki/hcard-brainstorming#root_only_shorthand_syntax

E.g.: <p class="vcard">Hans Wurst</p>

Which would then imply the one required property "fn", which could
then be used to imply additional property values.

In short, this would allow the following (slightly simpler) markup for
the above example:

    <div class="hrecipe">
        <p class="author vcard">Hans Wurst</p>

For more details of how ROSS would work, or to note issues or suggest
improvements, please do not reply inline in email, and instead see and
add to:

http://microformats.org/wiki/hcard-brainstorming#root_only_shorthand_syntax
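
As a rough illustration of the root only shorthand idea above, here is a
minimal Python sketch. It is not taken from any existing parser: the class
name RootOnlyShorthand, the "implied" flag and the VOID set are made up
for this example, nested hCards are ignored, and well-nested markup is
assumed. When a "vcard" root has no explicit "fn" inside it, the root's
own text is used as the implied "fn".

```python
from html.parser import HTMLParser

# Elements without an end tag; skipped so the open-element stack stays balanced.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class RootOnlyShorthand(HTMLParser):
    """Hypothetical sketch: imply fn from the text of a bare vcard root."""

    def __init__(self):
        super().__init__()
        self.cards = []    # finished cards: {"fn": str, "implied": bool}
        self._open = []    # class lists of the currently open elements
        self._card = None  # state for the vcard root being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._card is None and "vcard" in classes:
            self._card = {"depth": len(self._open), "text": [], "fn": []}
            # "vcard" and "fn" on the same element is invalid, so an fn
            # class on the root itself is ignored here.
            classes = [c for c in classes if c != "fn"]
        self._open.append(classes)

    def handle_data(self, data):
        if self._card is None:
            return
        self._card["text"].append(data)
        # Text counts as explicit fn if an open descendant of the root has class fn.
        if any("fn" in c for c in self._open[self._card["depth"] + 1:]):
            self._card["fn"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self._open:
            return
        self._open.pop()
        if self._card is not None and len(self._open) == self._card["depth"]:
            explicit = "".join(self._card["fn"]).strip()
            implied = " ".join("".join(self._card["text"]).split())
            self.cards.append({"fn": explicit or implied, "implied": not explicit})
            self._card = None


parser = RootOnlyShorthand()
parser.feed('<div class="hrecipe"><p class="author vcard">Hans Wurst</p></div>')
print(parser.cards)  # one card whose fn is implied from the root's text
```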


On Fri, Jul 3, 2009 at 7:50 AM, Bob Douglas<bd-net at sbcglobal.net> wrote:
> Hi, I'm still getting oriented to the list, apologies for the late response,
> length, and any glaring naiveness.    Would post on the wiki, but not
> familiar enough yet to understand where it would fit.

Hello and welcome Bob.

Here is a brief guide to where at least some content belongs/fits on the wiki:

http://microformats.org/wiki/put-it-on-the-wiki#where_to_put_what_on_the_wiki


> This has been a helpful thread, but seems more guidance on handling mixed
> context may be a looming issue on MediaWiki/Wikipedia type sites -
> especially for users who will be introduced to microformats through Operator
> (Firefox), Oomph (IE), or similar browser add-ons.   Two usage problems are:
>
>  A. Corruption (or significant alteration) of results over time as
> different editors insert microformat producing templates into wiki pages or
> add microformats to existing (mw-)templates.
>
>  B. Significant differences in behavior due to the core "rules" applied in
> emerging microformat browser tools.
>
> Here are two examples from current (30JUN2009) Wikipedia articles:
> 1.  Two vcards/vevents on a page (http://en.wikipedia.org/wiki/Einstein)

Thank you for providing a URL to a real world example.


> The Einstein bio page contains two (un-nested) infoboxes that both declare
> vcard and vevent classes on the table as :
>
>  table class="infobox vcard vevent"
>  td class="fn summary">Albert Einstein<....
>  /table
>
>  table class="infobox biography vcard vevent"
>  td class="fn summary">Albert Einstein<....
>  /table
>
> Oomph returns the resulting two contacts and two events - none of which are
> valid (missing the required "n" and "dtstart" values).

In particular the hCards seem to be fine (n is implied from fn, works
in Operator), and thus if you are seeing a problem with Oomph, it may
be a bug in Oomph.

I just created an oomph-issues page and summarized this problem - it
could probably use more details to help track down the problem:

http://microformats.org/wiki/oomph-issues


> Detector returns a single valid contact

I'm unfamiliar with the "Detector" microformats implementation - could
you add it to:

http://microformats.org/wiki/implementations


> by extracting the "n" values
> (family-name, given-name) per the spec from the single space delimited "fn"
> string.

Sounds like it is behaving correctly.
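
For reference, the simple case of implying "n" from "fn" can be sketched
in a few lines of Python. The function names imply_n and vcard_n are
hypothetical, and the sketch only covers the two-word case described
above; any further special cases in the hCard spec are left out.

```python
def imply_n(fn):
    """Return (family_name, given_name) implied from fn, or None.

    Only the simple case is handled: exactly two whitespace-separated
    words, read as given-name then family-name.
    """
    words = fn.split()
    if len(words) != 2:
        return None  # one word, or a middle name/initial: n must be explicit
    given, family = words
    return family, given


def vcard_n(fn):
    """Format the vCard N line (family;given;additional;prefixes;suffixes)."""
    implied = imply_n(fn)
    if implied is None:
        return "N:;;;;"
    family, given = implied
    return f"N:{family};{given};;;"


print(vcard_n("Albert Einstein"))     # N:Einstein;Albert;;;
print(vcard_n("Thomas Alva Edison"))  # N:;;;;
```

A three-word "fn" falls through to an empty N here, which is why pages with
a middle name or initial need an explicit "n".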


> Luck in this case as most other bio pages include a middle name or
> initial.

One or more example URLs for pages like that would help with refining
the resolutions to related hCard issues (when an fn includes a middle
name or initial and there is no n property).


> Can't tell from this example if Detector is ignoring or merging the
> duplicate vcard with identical "fn"/"n" values.  (Anybody know the logic?)
> However, it does ignore the invalid events.  Here are the complete contents
> Detector produces for the Einstein page:
>
>  BEGIN:VCARD
>  PRODID:-//kaply.com//Operator 0.8//EN

Ah - is "Detector" another name for the "Operator" Firefox extension?

http://microformats.org/wiki/operator


>  SOURCE:http://en.wikipedia.org/wiki/Einstein
>  NAME:Albert Einstein - Wikipedia, the free encyclopedia
>  VERSION:3.0
>  N;CHARSET=UTF-8:Einstein;Albert;;;
>  FN;CHARSET=UTF-8:Albert Einstein
>  CATEGORIES;CHARSET=UTF-8:Württemberg/Germany (1879–96) Switzerland
> (1901–55) Austria (1911–12) Germany (1914–33) United States
> (1940–55),Ashkenazi  Jewish and German,Physics
>  BDAY:1879-03-14
>  UID:
>  END:VCARD

This looks correct, except for the empty "UID", which simply shouldn't
be present - if this is actually happening, could you document the
specifics (URL, version(s) etc.) on the Operator issues page?

http://microformats.org/wiki/operator-issues


> 2. vcards/vevents nested within another vcard/vevent on page
> http://en.wikipedia.org/wiki/Edison
>
> The Edison page includes a single biography infobox in which a microformat
> producing template has been inserted for his two marriages.   Intuitively
> this should simply be two events (three if you count Edison's birth as an
> event) within the scope of a person's life.  What is produced is a strange
> brew of nested microformat classes.  The structure is:
>
>  table class="infobox biography vcard vevent"
>  td class="fn summary">Thomas Alva Edison<....
>
>    span class="vcard"
>    span class="vevent"
>      span class="dtstart">1871</span
>      span class="dtend">1885</span
>    span class="fn org summary">Marriage: Mary Stilwell to Thomas
> Edison</span
>    span class="uid url"><a href="http....</span
>    /span /span
>
>    span class="vcard"
>    span class="vevent"
>      span class="dtstart">1886</span
>      span class="dtend">1932</span
>    span class="fn org summary">Marriage: Mina Edison to Thomas Edison</span
>    span class="uid url"><a href="http....</span
>    /span /span
>
>  /table

The biography infobox template that you mention as producing that
markup appears to have a few problems, e.g.

"Marriage: Mina Edison to Thomas Edison"

is certainly not an "fn" nor an "org" of an hCard (though it sounds
like a valid "summary" of that event).


> Oomph identifies the three events: the two marriages and the vevent on the
> infobox table where the "summary" values are concatenated (Thomas Alva
> EdisonMarriage: Mary Stilwell to Thomas EdisonMarriage: Mina Edison to
> Thomas Edison).

That sounds like an Oomph bug as it is not properly respecting the
parsing boundary of the nested hCalendar event as specified in the
hCard parsing spec:

http://microformats.org/wiki/hcard-parsing#nested_hCards

Please add the description of this problem and note the nesting rule
from hCard parsing to oomph-issues:

http://microformats.org/wiki/oomph-issues
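
To make the nesting boundary concrete, here is a minimal Python sketch of
the idea: each property is assigned only to the innermost enclosing root,
so a nested event's "summary" never leaks into the outer event. The class
name BoundaryParser and its arguments are invented for this example, it
handles a single root class at a time, and it assumes well-nested markup.
It also ignores the case where a nested root is itself a property of the
outer one.

```python
from html.parser import HTMLParser

# Elements without an end tag; skipped so the depth counter stays balanced.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class BoundaryParser(HTMLParser):
    """Hypothetical sketch: properties belong to the innermost open root."""

    def __init__(self, root, props):
        super().__init__()
        self.root, self.props = root, set(props)
        self.items = []      # one {prop: [values]} dict per root, document order
        self._depth = 0      # number of currently open elements
        self._roots = []     # (depth, item) per open root, innermost last
        self._captures = []  # (depth, item, prop, chunks) per open property

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self._depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if self._roots:
            # Owner is the innermost root that was already open.
            owner = self._roots[-1][1]
            for prop in self.props.intersection(classes):
                self._captures.append((self._depth, owner, prop, []))
        if self.root in classes:
            item = {}
            self.items.append(item)
            self._roots.append((self._depth, item))

    def handle_data(self, data):
        for _, _, _, chunks in self._captures:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or self._depth == 0:
            return
        # Close every property capture that was opened by this element.
        while self._captures and self._captures[-1][0] == self._depth:
            _, owner, prop, chunks = self._captures.pop()
            owner.setdefault(prop, []).append(" ".join("".join(chunks).split()))
        if self._roots and self._roots[-1][0] == self._depth:
            self._roots.pop()
        self._depth -= 1


markup = (
    '<table class="vevent"><tr><td class="summary">Thomas Alva Edison</td></tr>'
    '<tr><td><span class="vevent"><span class="dtstart">1871</span>'
    '<span class="summary">Marriage: Mary Stilwell to Thomas Edison</span>'
    '</span></td></tr></table>'
)
parser = BoundaryParser("vevent", ["summary", "dtstart"])
parser.feed(markup)
print(parser.items[0])  # outer event keeps only its own summary
print(parser.items[1])  # nested event gets its own dtstart and summary
```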


> None are valid as it is apparently unable to set "dtstart"
> from  the values provided.

That sounds like another Oomph issue to note - as the YYYY year values
are valid.


> Detector identifies only one contact for the page, though technically
> invalid since "n" values cannot be automatically extracted from the three
> names in "fn".  The main difference in behavior is that "fn" and "org"
> values are set to the first occurrences.  One obnoxious result of the nested
> vcard/vevent is that the marriage event description ("summary") is passed to
> the top-level vcard in the "org" attribute.  The vcard returned by Operator
> for the Edison page is:
>
>  BEGIN:VCARD
>  PRODID:-//kaply.com//Operator 0.8//EN
>  SOURCE:http://en.wikipedia.org/wiki/Edison
>  NAME:Thomas Edison - Wikipedia, the free encyclopedia
>  VERSION:3.0
>  N:;;;;
>  ORG;CHARSET=UTF-8:Marriage: Mary Stilwell to Thomas Edison
>  FN;CHARSET=UTF-8:Thomas Alva Edison
>  ROLE;CHARSET=UTF-8:inventor, scientist, businessman
>  BDAY:1847-02-11
>  UID:
>  END:VCARD

This sounds like Operator 0.8 may have an hCard nesting bug similar to
the one Oomph has with the hCalendar event.

Please add it to:

http://microformats.org/wiki/operator-issues

And note the link to
http://microformats.org/wiki/hcard-parsing#nested_hCards for proper
parsing behavior.



> 1)  Seems the common practice of slapping "class= vcard vevent" on templates
> will be troublesome in a wiki environment - can be too ambiguous for tools
> to determine the intended semantic context.

I tend to agree with that and wonder if that's something we can
somehow incorporate into hcard-authoring and/or hcalendar-authoring:

http://microformats.org/wiki/hcard-authoring
http://microformats.org/wiki/hcalendar-authoring


> 2) Seems to be a need to clarify default behaviors (rules) related to
> context.   For example, how should a tool handle multiple values for "fn",
> "org", "summary", etc within the same div/span of a vcard or vevent class:
> concatenate (Oomph), first occurrence (Detector), last occurrence
> (override), etc?  Example is vcard/event, but likely an issue for others
> too.

Use the first occurrence for singleton properties, per:

http://microformats.org/wiki/hcard-parsing#finding_hCard_properties

Multiple occurrences of multi-valued properties should simply
generate multiple values.
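
That rule can be sketched in a few lines of Python. The function name
resolve and the default singleton set are assumptions made for this
example; only "fn" is listed, and the spec's full list of singular
properties is not reproduced here.

```python
def resolve(found, singletons=frozenset({"fn"})):
    """Combine (property, value) pairs found in document order.

    Singleton properties keep their first occurrence; every other
    property collects all of its occurrences as a list.
    """
    card = {}
    for prop, value in found:
        if prop in singletons:
            card.setdefault(prop, value)  # later occurrences are ignored
        else:
            card.setdefault(prop, []).append(value)
    return card


found = [
    ("fn", "Thomas Alva Edison"),
    ("category", "Ashkenazi Jewish and German"),
    ("fn", "Marriage: Mary Stilwell to Thomas Edison"),
    ("category", "Physics"),
]
print(resolve(found))
```

Concatenating the values (as Oomph reportedly does) or taking the last
occurrence would both differ from this behavior.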


> Best regards,  Bob Douglas

Thanks for the considerable feedback Bob, and please let me know if
you have suggestions for how we can improve the
discoverability/findability of various pages (and where to put stuff)
on the wiki.

Tantek

-- 
http://tantek.com/


