[uf-discuss] re: HTML5 support

Tue Jul 20 12:55:38 PDT 2010

On Tue, Jul 20, 2010 at 3:25 AM, Philip Jägenstedt <philipj at opera.com> wrote:
> On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding <angelo at gladding.name>
> wrote:
>
>> Can an enlightened soul describe in which ways microdata is actually
>> superior to profiled poshformats?
>
> Microdata should be compared to the class attributes and the various
> patterns that microformats use, not any specific vocabulary.

Of course. Let me clarify. A `microformat` is a poshformat that has
undergone a relatively laborious process of research and brainstorming
to capture real-world user requirements, yielding a minimal vocabulary
that covers ~80% of current usage patterns. Microdata is a set of
rules governing a syntax. Hence my comparison of microdata to
poshformats, which are essentially microformats sans the due
diligence.

> The main benefit is that parsing becomes well-defined

Ain't that the truth.

> and simple.

Or is it? I wonder how different the two sets of supporting algorithms
might look face to face once fully documented and implemented.

The Microformats wiki makes the following comparison to microdata:

1. `itemprop` - is a more specific version of class, for field names.
2. `subject` - allows semantically linking within the page.
Conceptually similar to the include-pattern.
3. `itemref` - allows including properties elsewhere on the page that
are not descendants of itemscope. Takes space-separated ids (for
example itemref="address phone" would include the elements with
id="address" and id="phone"). Conceptually similar to the
include-pattern.
4. `content` - on the meta element can be used to include invisible
data that is not part of the content. As current browsers move meta
inside <head>, make sure to include via `itemref`. Conceptually
similar to the 'value-title' feature of the value-class-pattern.
5. `itemscope` - identifies blocks to be marked as structured data.
Conceptually similar to the mfo brainstorming.
6. `itemtype` - to specify the type for an item (for example:
itemtype="http://microformats.org/profile/hcard").

Distilled down:

1. @class
2/3. include-pattern/table-header-pattern
4. value-class-pattern
5. "mfo"
6. rel-profile

Sounds to me like the same sort of desire for absolute normativity
that [non-HTML5] XHTML once attempted to burden the entirety of
humanity with. Ironically, HTML5 has deprecated such a style in favor
of a seemingly more flexible Microformat-esque syntax.

- - -

<span itemscope itemtype="http://microformats.org/profile/hcard">
      <span itemprop="fn n">
             <span itemprop="given-name">George</span>
             <span itemprop="family-name">Washington</span>
      </span>
</span>

vs

<span class=vcard><span class=fn>George Washington</span></span>

- - -

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>example</title>
</head>
<body>
<p>example</p>
</body>
</html>

vs

<!doctype html>
<title>example</title>
<p>example

> That's why it's possible to define a JavaScript API for accessing microdata
> items on a page, which makes the data useful to the page itself, not only
> external scrapers. It also makes it feasible to make browser features like "add to
> address book" or "add to calendar",

Considering your affiliation with Opera, what, might I ask, are your
feelings about Operator?

> which really isn't practical with microformats when the
> data is hidden in class attributes together with everything else.

As I alluded to above, I see this as a complete non-issue, yet you are
most certainly not the first to bring it up. What am I missing?
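
To make the point concrete, here is a minimal sketch in Python of pulling
property names out of class attributes that also carry unrelated styling
tokens. It is not a full microformats parser: it does no `vcard` root
scoping and ignores the include and value-class patterns. The names
`HCARD_PROPERTIES`, `ClassPropertyCollector` and `extract_properties` are
hypothetical, and the property set is an illustrative subset of hCard.

```python
from html.parser import HTMLParser

# Assumption: a small illustrative subset of hCard property names.
HCARD_PROPERTIES = {"fn", "n", "given-name", "family-name", "org", "url"}


class ClassPropertyCollector(HTMLParser):
    """Collect (property, text) pairs, ignoring unrelated class tokens."""

    def __init__(self):
        super().__init__()
        self.properties = []  # (name, text) pairs, in closing order
        self._open = []       # stack of (tag, property names, text chunks)

    def handle_starttag(self, tag, attrs):
        tokens = (dict(attrs).get("class") or "").split()
        # Keep only tokens that are known property names.
        names = [t for t in tokens if t in HCARD_PROPERTIES]
        self._open.append((tag, names, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _tag, _names, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        if all(open_tag != tag for open_tag, _n, _c in self._open):
            return  # stray end tag: nothing to close
        while True:
            # Unclosed children are closed along with their ancestor.
            open_tag, names, chunks = self._open.pop()
            text = " ".join("".join(chunks).split())
            self.properties.extend((name, text) for name in names)
            if open_tag == tag:
                break


def extract_properties(html):
    parser = ClassPropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Tokens such as `highlight` or `big-link` simply fail the membership test,
so sharing the class attribute with presentational names costs one set
lookup per token.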

>> Might a "humans first, machines second" CJKV internationalization of
>> `n` optimization be to analyze the contents of the `fn`'s @lang and
>> inner text and use either or both to better determine name order?
>
> The main problem with this is that due to lazy copy-pasting, lang="en" is
> often used even when the language isn't English. Also, in the case of e.g.
> Facebook, lang="en" would be correct for the page itself, but people's names
> aren't in English anyway.

Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743

<html lang=ja>...<div class=vcard>...<a class=fn ... >宮野衆</a>...</div>

宮野 can log in today and, without any cooperation from Facebook, append
a U+200B (zero-width space [1]) to his first name (regardless of the
input taking the form of one or two boxes), and immediately reap the
benefits of such an `n` optimization without negatively affecting UI,
sort order, etc.

[1] http://en.wikipedia.org/wiki/Zero-width_space
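
A minimal sketch of such an `n` optimization, in Python. The function name
`guess_n` and the `FAMILY_FIRST_LANGS` set are assumptions made for
illustration; nothing here is taken from the hCard specification. The rule
sketched: split `fn` on U+200B if present, otherwise on whitespace, and use
the primary language subtag to decide which part is the family name.

```python
ZWSP = "\u200b"  # zero-width space

# Assumption: primary language subtags where the family name is written first.
FAMILY_FIRST_LANGS = {"ja", "zh", "ko", "vi"}


def guess_n(fn, lang=""):
    """Return (family_name, given_name), or None when no safe guess exists."""
    if ZWSP in fn:
        # An explicit zero-width space marks the boundary between the parts.
        parts = [p.strip() for p in fn.split(ZWSP) if p.strip()]
    else:
        # str.split() does not treat U+200B as whitespace, hence the branch above.
        parts = fn.split()
    if len(parts) != 2:
        return None  # zero, one, or three-plus parts: do not guess
    primary = lang.lower().split("-")[0]
    if primary in FAMILY_FIRST_LANGS:
        family, given = parts
    else:
        given, family = parts
    return family, given
```

With the zero-width space inserted, `guess_n("宮野\u200b衆", "ja")` yields
`("宮野", "衆")`; without it, the unsplittable `"宮野衆"` yields `None` and
a consumer falls back to `fn` alone.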

> The only way to get it right is to ask the user both for the full name,
> given name and family name, something I haven't ever seen.

If you haven't seen it, then it isn't even a single way to get it
right -- another byproduct of Microformats philosophy, I believe.
However, if optimizations can yield 80%+ positive results when viewed
in aggregate, I personally give a little bit of magic a big thumbs up.

> The most practical solution is to not guess at all, and I don't know
> of any negative effects of this.

I just see a tiny hint of dehumanization. ;)

-- 
Angelo Gladding
angelo at gladding.name