[uf-discuss] mixing vocabularies

Wed Jul 8 08:00:10 PDT 2009

Hi,

10 SEC SUMMARY

Split the hRecipe vocabulary into an essential core set (mostly self-defined)
and additional 'suggestions' (all reused from other vocabularies).

INTRO

I did some homework on the hRecipe page like correcting examples and
removing superfluous references, but now I'm trying to get back to my basic
problem: mixing of vocabularies. Re-reading all the posts in this thread I
see a basic consensus that:

* vocabularies should be constrained to a basic set of properties (80/20)
* for cases outside the 80/20 mixing vocabularies can be a valid approach
* mixing microformat vocabularies is legal in principle
  and common practice as well
* there are cases, though, where mixing produces results
  that parsers can no longer easily understand
* some advice on best practices and common pitfalls would be helpful:
  which combinations do work very well, which do not work at all
* there is a difference between mixing and combining vocabularies
  (not sure how important that is).

PROBLEM

My problem with hRecipe is that its property set falls into three
categories. A core set of properties is used by nearly every recipe and is
also mostly quite specific to recipes, like "ingredient" and "instructions".
A second set I'd like to call 'supplemental' or 'nice to have', like
"photo", "author", "summary" or "tag". These are not essential to the
function of a recipe (you can cook the recipe without the publishing date,
but not without the list of ingredients), but they nonetheless make sense in
one way or another and get used quite often. A third set, mainly review
properties, is not included because these properties rightfully form their
own microformat, yet they would be useful and are in broad use together
with hRecipe.

OBSERVATIONS

Interestingly, I noticed a pattern: some of these 'supplemental'
properties get reused from other vocabularies rather than from hRecipe
itself. Since a lot of the recipes I found were published on weblogs, the
blog software had taken care of the properties "author" and "published"
anyway and there was no need to add them to the recipe a second time - at
least that's how I interpret their usage.
Similar but different is the case of the title/name of a recipe: hRecipe
reuses "fn", and that's fine since it really is a core part of most recipes.
Although the name is not functionally essential to cooking, it is very
important because it is the human-readable "handle" that identifies a
specific recipe.
Then the rel-tag pattern is so self-contained and popular that it really
doesn't need to be mentioned explicitly by the recipe vocabulary.
Finally, a lot of recipes on the web get published as "user generated
content" on big recipe sites. An important part of the functionality and
popularity of these sites is that other users can (and do) review these
recipes through comments and rankings. Still, in my view a recipe and its
review are two very distinct items and I'd never want them mixed in one
vocabulary. It seems to me that hReview is a vocabulary of its own for a
reason.

COUNTING 

There is also that other thing: Tantek raised an issue about "Too many
Properties" in hRecipe, arguing that one should start with a property set as
small as possible. IMO smallness is not a quality in itself: if a
vocabulary is too minimal it's not broadly useful anymore and consequently
won't be adopted, because the effort doesn't provide enough return. The size
has to be not too small and not too big, but "just right" - whatever that
means ;-) An interim "solution" was that some properties got marked
"experimental" until further observations on "implementation and general
uptake" have been gathered. I recently investigated the usage of hRecipe
through a query on Searchmonkey (documented on the wiki and on this list in
June 09), with a result that could support both views:

 16   fn    
 17   ingredient (3 value, 3 type)
  3   yield 
 15   instructions 
  3   duration 
  4   photo
  9   summary
  4   author
  3   published
  1   nutrition (0 value, 0 type)
  2   tag

A few properties get used most of the time while a lot of properties get
used sometimes. Only nutrition gets a very low count (but see note below
*). I didn't count other properties, but it was obvious that a lot of recipes
had review sections added.
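
As a rough illustration of how lopsided these numbers are, here is a small
Python sketch that splits the properties by relative usage. The counts are
the ones listed above. The baseline and the 50% cut-off are assumptions made
for this example only: the number of sites sampled is not stated here, so
the highest count is used as the reference point.

```python
# Counts as reported in the Searchmonkey query above.
counts = {
    "fn": 16,
    "ingredient": 17,
    "yield": 3,
    "instructions": 15,
    "duration": 3,
    "photo": 4,
    "summary": 9,
    "author": 4,
    "published": 3,
    "nutrition": 1,
    "tag": 2,
}

# Assumption: the sample size is unknown, so the most-used property
# serves as the baseline for relative usage.
baseline = max(counts.values())

# Hypothetical cut-off, chosen only for illustration.
THRESHOLD = 0.5


def split_by_usage(counts, baseline, threshold):
    """Return (frequent, occasional) property names, most used first."""
    frequent, occasional = [], []
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        if n / baseline >= threshold:
            frequent.append(name)
        else:
            occasional.append(name)
    return frequent, occasional


frequent, occasional = split_by_usage(counts, baseline, THRESHOLD)
print("frequent:  ", ", ".join(frequent))
print("occasional:", ", ".join(occasional))
```

With this cut-off, "summary" lands among the frequently used properties
while "yield", "duration" and "nutrition" do not, so usage counts alone
would not draw the line between essential and nice-to-have properties.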

PROPOSAL: *CORE* AND *SUPPLEMENTAL* PROPERTIES

I would like to refactor hRecipe according to these observations. Only
properties that are
* essential to the functionality of a recipe
* very commonly used with recipes
* very specific to recipes
should remain in the (CORE) property set. These would be:

      fn    
      ingredient
      yield 
      instructions 
      duration  
      nutrition    

Other properties that are very popular or make very much sense *and are
reused from other formats* should be added to the format documentation as
'supplemental' (preliminary wording - native speakers to the rescue, please)
together with some advice on how to mix them in correctly, both technically
and semantically. These (SUPPLEMENTAL) properties would be:

      photo
      summary
      author
      published
      tag
      value, type as sub-properties (also hMeasure)
and   hReview as a new property (-set).

They would not become part of the vocabulary but be documented as
proposed/possible additions for real-life usage of recipes. Parsers would not
be required (but expected?) to recognize them.
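
To make the split a bit more concrete, here is a minimal Python sketch of
how a consumer might sort the properties it finds in a recipe into the two
sets. It is not a real microformats parser. The two sets are the lists
proposed above (the value/type sub-properties and hReview are left out for
brevity). The class HRecipeNames, the function classify and the sample
markup are all hypothetical, and the sketch assumes well-formed HTML with
an "hrecipe" root class, property names given as class names, and tags
marked with rel="tag".

```python
from html.parser import HTMLParser

# The two property sets as proposed above.
CORE = {"fn", "ingredient", "yield", "instructions", "duration", "nutrition"}
SUPPLEMENTAL = {"photo", "summary", "author", "published", "tag"}

# Elements without an end tag; they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HRecipeNames(HTMLParser):
    """Collects class and rel values found inside an 'hrecipe' element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # 0 means we are outside any hrecipe element
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = set((attrs.get("class") or "").split())
        names |= set((attrs.get("rel") or "").split())  # for rel="tag"
        if self.depth == 0:
            if "hrecipe" not in names:
                return      # still outside the recipe
        else:
            self.found |= names
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        # Simplification: assumes every non-void element is closed.
        if self.depth > 0 and tag not in VOID:
            self.depth -= 1


def classify(html):
    """Return the core and supplemental properties used in the markup."""
    parser = HRecipeNames()
    parser.feed(html)
    parser.close()
    return {
        "core": parser.found & CORE,
        "supplemental": parser.found & SUPPLEMENTAL,
    }


# Made-up sample markup, for illustration only.
SAMPLE = """
<div class="hrecipe">
  <h1 class="fn">Pancakes</h1>
  <img class="photo" src="pancakes.jpg" alt="">
  <ul>
    <li class="ingredient">2 eggs</li>
    <li class="ingredient">250 ml milk</li>
  </ul>
  <p class="instructions">Mix and fry.</p>
  <a rel="tag" href="/tags/breakfast">breakfast</a>
</div>
<p class="summary">Outside the recipe, so ignored.</p>
"""

result = classify(SAMPLE)
print("core:        ", sorted(result["core"]))
print("supplemental:", sorted(result["supplemental"]))
```

A consumer built this way could treat the core set as the required part
and the supplemental set as optional extras it picks up when present.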

The more I think about it the more I like this approach. Sorry for the long
mail, but I wonder if this couldn't be a generally useful way to handle the
reuse of vocabulary terms, more flexible than what microformats currently
offer. Before I start formalizing it in a new draft - hRecipe 0.3 - I'd like
to gather some feedback! Any thoughts?

Cheers,
Thomas

P.S.: I'll switch my list membership to a private mail address soon. So
please don't be surprised about that new "thomas at stray.net" who seems to
share my point of view all the time...

(*) although, in absolute terms, it gets used quite a lot, because the big
recipe sites use it and they make up the majority of recipes on the web,
while I only counted different sites - an approach which favoured the
blogs - and most blog authors won't know (or won't care?) how to measure
nutritional information.

.
Thomas Lörtsch
Gruner+Jahr, Hamburg, Germany
...
eMail: loertsch.thomas at guj.de -> 07/2009
                                 08/2009 -> thomas at stray.net