From tantek at cs.stanford.edu  Sat Jul  3 19:18:43 2010
From: tantek at cs.stanford.edu (Tantek Çelik)
Date: Sat Jul  3 19:19:06 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a
	"microformats.org turns 5" blog post
Message-ID: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>

According to Yahoo! Search Monkey, there are now over 2 billion hCards
on the web:

http://search.yahoo.com/search?p=searchmonkey%3Acom.yahoo.page.uf.hcard

This is perhaps due to a few fairly large recent deployments:
* BrightKite.com - all venues and user profiles have hCard (millions)
* Gravatar - all profiles now have hCards (millions) - used on
WordPress.com etc.

Some additional recent news:
* microformats has 94% marketshare compared to alternatives (e.g.
RDFa) according to Google (announced at the Semantic Technology
conference)
 - http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php
 - http://www.readwriteweb.com/images/richsnippets_june10b.jpg

I'm collecting these into material for "microformats.org turns 5" blog
post - additional suggestions welcome!

http://microformats.org/wiki/microformats-turns-5

-- 
http://tantek.com/
From jeremy at adactio.com  Mon Jul  5 07:32:54 2010
From: jeremy at adactio.com (Jeremy Keith)
Date: Mon Jul  5 07:33:01 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a
	"microformats.org turns 5" blog post
In-Reply-To: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
Message-ID: <4599FB06-4A53-4295-B62A-6FB165B41E5A@adactio.com>

Tantek asked:
> I'm collecting these into material for "microformats.org turns 5" blog
> post - additional suggestions welcome!

Well, this isn't huge in terms of numbers but it's something that makes my day to day work a whole lot smoother:

37 Signals have added hCards to Basecamp:
http://answers.37signals.com/basecamp/556-any-chance-of-adding-hcards

Jeremy

-- 
Jeremy Keith

a d a c t i o

http://adactio.com/


From andreluis.pt at gmail.com  Mon Jul  5 10:04:37 2010
From: andreluis.pt at gmail.com (André Luís)
Date: Mon Jul  5 10:04:41 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <4599FB06-4A53-4295-B62A-6FB165B41E5A@adactio.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
	<4599FB06-4A53-4295-B62A-6FB165B41E5A@adactio.com>
Message-ID: <AANLkTik4Rjwp6Y0KA7o6RwuoCfI32SqLrgZUI3xhFV-Y@mail.gmail.com>

On 5 July 2010 15:32, Jeremy Keith <jeremy@adactio.com> wrote:
> Tantek asked:
>> I'm collecting these into material for "microformats.org turns 5" blog
>> post - additional suggestions welcome!

Tantek,

one minor detail that might be worth correcting... what Yahoo!'s
SearchMonkey says is that there are almost 2 billion pages with hCards.
That means those pages have at least one card each, thus we can assume
the number of hCards at large is far higher. ;)

One point I'd like to see addressed in such a post, if possible, is
the near future... Should we start pushing for an adaptation of all
microformats tools to support microdata from HTML5 as well? Encourage
authors to write one *or* the other (microformats vs microdata)?

Cheers,
--
André Luís
http://id.andr3.net



From ehs at pobox.com  Mon Jul  5 11:45:13 2010
From: ehs at pobox.com (Ed Summers)
Date: Mon Jul  5 11:45:20 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
Message-ID: <AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>

On Sat, Jul 3, 2010 at 10:18 PM, Tantek Çelik <tantek@cs.stanford.edu> wrote:
> Some additional recent news:
> * microformats has 94% marketshare compared to alternatives (e.g.
> RDFa) according to Google (announced at the Semantic Technology
> conference)
>  - http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php
>  - http://www.readwriteweb.com/images/richsnippets_june10b.jpg

Was it clear if Google's stats were comparing all microformat usage
with usage of only their particular rich snippet vocabulary [1]? I'd
be surprised if it was *all* RDFa vocabulary use, since that would
mean that Google are indexing all RDFa on the web. John Breslin asked
a similar question in the comments on that RWW post [2].

If it isn't clear, I'd probably refrain from citing the 94% market
share statistic in the microformats-turns-5 post. Although I guess
this sort of posturing is to be expected, and most people take it as a
given that "there are three kinds of lies: lies, damned lies, and
statistics.", especially in religious debates [3]

The 2 Billion statistic is astounding, considering there are an
estimated 1.8 Billion people online [3]. It makes me appreciate how
important efforts are to give people the ability to identify, link, and
unlink their online identities [4].

//Ed

[1] http://rdf.data-vocabulary.org/rdf.xml
[2] http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php#comment-219873
[3] There are three kinds of lies: lies, damned lies, and statistics."
[4] http://code.google.com/apis/opensocial/

From ehs at pobox.com  Mon Jul  5 11:46:47 2010
From: ehs at pobox.com (Ed Summers)
Date: Mon Jul  5 11:46:52 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>
Message-ID: <AANLkTimp2m2FHdJPpjk9j-8uz80hpEEiE9gClnreLGPw@mail.gmail.com>

On Mon, Jul 5, 2010 at 2:45 PM, Ed Summers <ehs@pobox.com> wrote:
> [3] There are three kinds of lies: lies, damned lies, and statistics."

I meant:

[3] http://www.internetworldstats.com/stats.htm

//Ed
From pmika at yahoo-inc.com  Tue Jul  6 01:27:03 2010
From: pmika at yahoo-inc.com (Peter Mika)
Date: Tue Jul  6 01:27:51 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a
	"microformats.org turns 5" blog post
In-Reply-To: <AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>
Message-ID: <4C32E8D7.7080705@yahoo-inc.com>

Hi Ed,

The comparison to the number of people online is misleading, because the 
microformat stats quoted (both the Google and Yahoo figures) include 
duplicate counting. One of my illustrative examples is 
news.stanford.edu, where the microformat annotation is in the template, 
and thus every single page has exactly the same microformat markup, i.e. 
the address of Stanford University.

To verify, try the query

searchmonkey:com.yahoo.page.uf.hcard site:stanford.edu

in Yahoo Search.

The second point to make is that RDFa usage is underreported by [1]. Compare

searchmonkey:com.yahoo.page.rdf.rdfa

with

searchmonkey:com.yahoo.page.uf.hcard

These indicate that there are 2.7B pages with RDFa compared to 2B pages 
with hCard. There are many caveats to these numbers, but they are more 
or less on equal footing.

Cheers,
Peter


[1] 
http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php



From tantek at cs.stanford.edu  Wed Jul  7 02:25:38 2010
From: tantek at cs.stanford.edu (Tantek Çelik)
Date: Wed Jul  7 02:26:02 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <4C32E8D7.7080705@yahoo-inc.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com> 
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com> 
	<4C32E8D7.7080705@yahoo-inc.com>
Message-ID: <AANLkTiky55g_xA0KsZjFiS35Bc01DcO-v4iQsoBNI_Ww@mail.gmail.com>

Jeremy,

> Well, this isn't huge in terms of numbers but it's something that makes my day to day work a whole lot smoother:
>
> 37 Signals have added hCards to Basecamp:
> http://answers.37signals.com/basecamp/556-any-chance-of-adding-hcards

This is great news! In the few times I've used Basecamp I remember
being quite frustrated by the lack of hCard support and simple person
info portability.  Great to see that 37 Signals has added hCards.


Peter,

On Tue, Jul 6, 2010 at 1:27 AM, Peter Mika <pmika@yahoo-inc.com> wrote:
> Hi Ed,
>
> The comparison to the number of people online is misleading, because the
> microformat stats quoted (both the Google and Yahoo figures) include
> duplicate counting. One of my illustrative examples is news.stanford.edu,
> where the microformat annotation is in the template, and thus every single
> page has exactly the same microformat markup, i.e. the address of Stanford
> University.

On the other hand, there are also numerous pages with multiple hCards
per page.  Directory listings, friends lists, about pages for
companies listing their executives etc.

The wiki has many such examples already:

http://microformats.org/wiki/hcard-examples-in-wild

There are certainly:
* multiple pages with the same hCard.
* pages with multiple hCards.

This was my experience with the microformats indexer we built at
Technorati back in the day.

It's hard to know how these average out.

You have to write a bunch more code (e.g. really good deduping etc.)
to figure it out.

Lacking that we should cite *pages* with hCards rather than total
hCards for the Search Monkey stat to be more accurate.
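
To make the difference between those counts concrete, here is a minimal Python sketch. It assumes the hCards have already been extracted into plain dicts; the `normalize` and `count_hcards` helpers, the sample URLs, and the whitespace/case normalization are all hypothetical and far cruder than real deduping would need to be.

```python
def normalize(card):
    """Reduce an hCard (dict of property -> value) to a comparable key.

    Assumption: collapsing whitespace and case is enough to spot duplicates.
    """
    return tuple(sorted(
        (key.lower(), " ".join(str(value).split()).lower())
        for key, value in card.items()
    ))

def count_hcards(pages):
    """pages maps a URL to the list of hCards found on that page."""
    pages_with_hcards = sum(1 for cards in pages.values() if cards)
    total_instances = sum(len(cards) for cards in pages.values())
    distinct = len({normalize(c) for cards in pages.values() for c in cards})
    return pages_with_hcards, total_instances, distinct

# Hypothetical sample: one templated hCard repeated on two pages,
# one page listing two people, and one page with no hCard at all.
pages = {
    "http://example.edu/news/1": [{"fn": "Example University", "adr": "1 Main St"}],
    "http://example.edu/news/2": [{"fn": "Example University", "adr": "1 Main  St"}],
    "http://example.com/team": [{"fn": "Alice"}, {"fn": "Bob"}],
    "http://example.com/blank": [],
}
print(count_hcards(pages))
```

The three numbers (pages with hCards, hCard instances, distinct hCards) can all differ, which is why a page count should be cited as a page count.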


> The second point to make is that RDFa usage is underreported by [1]. Compare
>
> searchmonkey:com.yahoo.page.rdf.rdfa
>
> with
>
> searchmonkey:com.yahoo.page.uf.hcard
>
> These indicate that there are 2.7B pages with RDFa

I think this may be an errant number based on the way that Search
Monkey normalizes things internally to RDFa (because of an unfortunate
premature architectural decision that they then became stuck with - as
it was related to me by Paul Tarjan).

OR (and this deserves a little analysis)

Those pages don't actually all (if any?) contain RDFa.

Look at the first page of results.

E.g. Wordpress.org results don't have any RDFa.

View source and the only thing even remotely resembling RDFa that you see is:

<meta property="fb:page_id" content="...">

- which is simply use of an invalid "property" attribute (in XHTML
1.0). The qname "fb:" is not defined anywhere.

This is not RDFa, this is simply a <meta> tag using a new (invalid)
syntax. That is, using "property" instead of the standard HTML 4.01
"name" attribute:

<meta name="fb:page_id" content="...">

Similarly with CNN.com, download.cnet.com, online.wsj.com.


OTOH, www.vistaprint.ca, digg.com, www.joomlart.com, www.webmd.com
don't even have "property" attributes. Who knows why they're listed in
that result page. No evidence of any RDFa on those pages.

www.metacafe.com does appear to define an "og" qname and use it in a
"property" attribute.

And that's it for the first page of results for that query
"searchmonkey:com.yahoo.page.rdf.rdfa" -

Only 1 out of the 10 results on the first page actually had any
RDFa, and that one was invisible <meta> data at that.

For the most part, that query does not appear to return pages with
RDFa, either in any valid sense or in the sense of marking up existing
visible content with additional attributes.

Perhaps a challenge could be posed - how many results of that query do
you have to look through before you find a legitimate "marking up
visible data" instance of RDFa?

In 4 pages of results (40) I only found 2 - and both were on the
Creative Commons site - not a big surprise given that Ben Adida is
both co-chair of RDFa WG and works for Creative Commons. But no
others.


It seems that RDFa usage is grossly exaggerated (by at least a factor
of 20) by the Yahoo Search Monkey
"searchmonkey:com.yahoo.page.rdf.rdfa" query.
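
The manual check described above can be roughly sketched in Python. This is only an illustration under stated assumptions: the `PrefixAudit` class and `audit` function are hypothetical names, prefixes are treated as declared only when an `xmlns:` attribute appears somewhere in the page, and nothing here checks whether visible content is being marked up.

```python
from html.parser import HTMLParser

class PrefixAudit(HTMLParser):
    """Collect prefixes declared via xmlns:* and prefixes used in property=""."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; self-closing tags land here too.
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.declared.add(name[len("xmlns:"):])
            elif name == "property" and value:
                for token in value.split():
                    if ":" in token:
                        self.used.add(token.split(":", 1)[0])

def audit(html):
    """Split the prefixes used in property attributes into declared/undeclared."""
    parser = PrefixAudit()
    parser.feed(html)
    parser.close()
    return {
        "declared": parser.used & parser.declared,
        "undeclared": parser.used - parser.declared,
    }

# The <meta> tag quoted above: "fb" is used but never declared.
print(audit('<meta property="fb:page_id" content="...">'))

# Hypothetical page that does declare its prefix (namespace URI is made up).
print(audit('<html xmlns:og="http://example.org/ns#"><head>'
            '<meta property="og:title" content="x" /></head></html>'))
```

As for the "factor of 20": 2 legitimate instances in 40 results is 5%, so the reported total would overstate usage by roughly 40 / 2 = 20 times if that sample were representative.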


> compared to 2B pages with
> hCard. There are many caveats to these numbers, but they are more or less on
> equal footing.

They're not even close (at least an order of magnitude difference), as
the above debunking of the RDFa results demonstrates.


Ed,

> Ed Summers wrote:
>>
>> On Sat, Jul 3, 2010 at 10:18 PM, Tantek Çelik <tantek@cs.stanford.edu>
>> wrote:
>>
>>>
>>> Some additional recent news:
>>> * microformats has 94% marketshare compared to alternatives (e.g.
>>> RDFa) according to Google (announced at the Semantic Technology
>>> conference)
>>>  - http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php
>>>  - http://www.readwriteweb.com/images/richsnippets_june10b.jpg
>>>
>>
>> Was it clear if Google's stats were comparing all microformat usage
>> with usage of only their particular rich snippet vocabulary [1]? I'd
>> be surprised if it was *all* RDFa vocabulary use, since that would
>> mean that Google are indexing all RDFa on the web. John Breslin asked
>> a similar question in the comments on that RWW post [2].

This is an excellent question.

In particular the context (and numbers) of that slide appear to be
rich snippet specific - both for microformats and RDFa.

That is, comparing particular microformats for rich snippets, and
particular RDFa for rich snippets - 94% of the instances of markup for
rich snippets they found were done with microformats.

Good catch Ed, that's an important detail to call out.


Thanks everyone for the corrections and additions.  I've updated the
wiki accordingly:


http://microformats.org/wiki/microformats-turns-5


Please let me know if I've missed anything else - I'm going to go
ahead and write this up tomorrow morning.


Thanks,

Tantek


-- 
http://tantek.com/

From mail at tobyinkster.co.uk  Wed Jul  7 04:43:52 2010
From: mail at tobyinkster.co.uk (Toby Inkster)
Date: Wed Jul  7 04:53:20 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <AANLkTiky55g_xA0KsZjFiS35Bc01DcO-v4iQsoBNI_Ww@mail.gmail.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>
	<4C32E8D7.7080705@yahoo-inc.com>
	<AANLkTiky55g_xA0KsZjFiS35Bc01DcO-v4iQsoBNI_Ww@mail.gmail.com>
Message-ID: <20100707124352.3f2a215f@miranda.g5n.co.uk>

On Wed, 7 Jul 2010 02:25:38 -0700
Tantek Çelik <tantek@cs.stanford.edu> wrote:

> E.g. Wordpress.org results don't have any RDFa.
>
> View source and the only thing even remotely resembling you see is:
> 
> <meta property="fb:page_id" content="...">
> 
> - which is simply use of an invalid "property" attribute (in XHTML
> 1.0). The qname "fb:" is not defined anywhere.

In the current RDFa 1.1 drafts, this is allowed, though its meaning is
not likely what the authors of this page intended. In 1.1, prefixes
which are not bound to anything are assumed to be absolute URIs.

The page at http://wordpress.org/ does actually contain 3 triples if
evaluated as RDFa 1.0, though they're each the result of RDFa
grandfathering in certain HTML 4/XHTML 1 semantics.

The question "how many pages contain RDFa?" is only meaningful if
certain qualifications are added... Does broken RDFa count? Do
grandfathered rel/rev values count? &c.

In fact, "how many pages" questions about the Web are not especially
meaningful. Say Google added an hCard to its search result pages,
replacing its current logo with something like this:

	<span class="vcard">
		<a href="/" class="url">
			<img class="logo fn org"
			alt="Google" src="..." />
		</a>
	</span>

Are the search results for "foo" and "bar" different pages? What about
the search results for "100000000001" and "100000000002"? Because if
they are, that's over a hundred billion hCards online.
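
As a rough illustration of the counting problem, the following Python sketch extracts the hCard from the snippet above and shows that two different result pages yield the same card. It is not a conforming hCard parser: the `HCardSketch` class, the `extract_hcards` helper, and the small `VOID` and `URL_PROPS` sets are assumptions made for this example, and it only reads values from `href`, `src` and `alt` attributes.

```python
from html.parser import HTMLParser

VOID = {"img", "br", "hr", "meta", "link", "input"}   # elements with no end tag
URL_PROPS = {"url", "logo", "photo"}                  # properties read from href/src

class HCardSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the current vcard root (0 = outside)
        self.cards = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.depth == 0:
            if "vcard" in classes and tag not in VOID:
                self.cards.append({})
                self.depth = 1
            return
        if tag not in VOID:
            self.depth += 1
        card = self.cards[-1]
        for name in classes:
            if name in URL_PROPS:
                value = attr.get("href") if tag == "a" else attr.get("src")
            elif tag == "img":
                value = attr.get("alt")   # text properties on <img> come from alt
            else:
                continue                  # element text content is not handled here
            if value is not None:
                card.setdefault(name, value)

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID:
            self.depth -= 1

def extract_hcards(html):
    parser = HCardSketch()
    parser.feed(html)
    parser.close()
    return parser.cards

SNIPPET = '''<span class="vcard">
    <a href="/" class="url">
        <img class="logo fn org"
        alt="Google" src="..." />
    </a>
</span>'''

# Two hypothetical result pages that share the same templated logo markup.
results_pages = {"foo": SNIPPET, "bar": SNIPPET}
cards = [c for html in results_pages.values() for c in extract_hcards(html)]
distinct = {tuple(sorted(c.items())) for c in cards}
print(len(cards), len(distinct))
```

Counting per page gives one hCard for every query anyone could type, while counting distinct cards gives exactly one.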

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

From tantek at cs.stanford.edu  Wed Jul  7 08:24:52 2010
From: tantek at cs.stanford.edu (Tantek Çelik)
Date: Wed Jul  7 08:33:01 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <20100707124352.3f2a215f@miranda.g5n.co.uk>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com> 
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com> 
	<4C32E8D7.7080705@yahoo-inc.com>
	<AANLkTiky55g_xA0KsZjFiS35Bc01DcO-v4iQsoBNI_Ww@mail.gmail.com> 
	<20100707124352.3f2a215f@miranda.g5n.co.uk>
Message-ID: <AANLkTikGOIAGSveZGK33tFj2jgyiLpM52oKd1AwwJtK1@mail.gmail.com>

On Wed, Jul 7, 2010 at 4:43 AM, Toby Inkster <mail@tobyinkster.co.uk> wrote:
> On Wed, 7 Jul 2010 02:25:38 -0700
> Tantek Çelik <tantek@cs.stanford.edu> wrote:
>
>> E.g. Wordpress.org results don't have any RDFa.
>>
>> View source and the only thing even remotely resembling you see is:
>>
>> <meta property="fb:page_id" content="...">
>>
>> - which is simply use of an invalid "property" attribute (in XHTML
>> 1.0). The qname "fb:" is not defined anywhere.
>
> In the current RDFa 1.1 drafts, this is allowed, though its meaning is
> not likely what the authors of this page intended. In 1.1, prefixes
> which are not bound to anything are assumed to be absolute URIs.

So it's another form of invalid syntax then, since "fb:" is not a
defined protocol.


> The page at http://wordpress.org/ does actually contain 3 triples if
> evaluated as RDFa 1.0, though they're each the result of RDFa
> grandfathering in certain HTML 4/XHTML 1 semantics.

No, it might contain 3 RDF triples - but they're not RDF*a*.

Just because a page can be parsed/converted into another format does
not mean it "contains" that format.

Saying so is deceptively mis-using the word "contains" at best, and
playing semantic games at worst.

Just because a page has hAtom does not mean it "contains" Atom.

Just because a page has microdata does not mean it "contains" JSON
(though an exceptionally precise direct conversion is defined). etc.

Similarly to microdata, as we define more precise parsing rules for
microformats, we'll have direct conversions to JSON and RDF triples as
well.  This does not mean that all pages with microformats "contain"
JSON or RDF.

The question of comparison is deliberately chosen to illuminate what
developers are actually coding. What syntax? Not what you can "infer",
"parse as", or "convert to".

Because as you know with the parsers you've written, you can convert
syntaxes to nearly any implied format - it tells you nothing about
usage.


> The question "how many pages contain RDFa?" is only meaningful if
> certain qualifications are added... Does broken RDFa count?

Broken RDFa counts, but only to demonstrate the difficulty of coding
RDFa, not as instances of RDF(a). One of the reasons that Google found
so little RDFa may be that much of it was broken. This is one of
the common problems with namespaces in data.

> Do
> grandfathered rel/rev values count? &c.

rel/rev syntax and values work without RDFa - they're not RDFa,
despite RDFa's attempt to subsume them (and even errantly claim/imply
credit in the spec, e.g. rel-license).


> In fact, "how many pages" questions about the Web are not especially
> meaningful. Say Google added an hCard to its search result pages,
> replacing its current logo with something like this:
>
>       <span class="vcard">
>               <a href="/" class="url">
>                       <img class="logo fn org"
>                       alt="Google" src="..." />
>               </a>
>       </span>
>
> Are the search results for "foo" and "bar" different pages? What about
> the search results for "100000000001" and "100000000002"? Because if
> they are, that's over a hundred billion hCards online.

1. theoretical strawman[1]
2. google.com/robots.txt prevents this from counting in any "search"
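
The second point can be sketched with Python's standard `urllib.robotparser`. The rules below are a hypothetical stand-in, not a copy of any real site's robots.txt, and `may_crawl` and `example.com` are made-up names; the sketch only shows how a `Disallow` rule on a search path keeps a well-behaved crawler from ever fetching (and so counting) those pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

def may_crawl(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text allows `agent` to fetch `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())   # parse in memory, no network access
    return rules.can_fetch(agent, url)

print(may_crawl(HYPOTHETICAL_ROBOTS_TXT, "http://example.com/search?q=foo"))
print(may_crawl(HYPOTHETICAL_ROBOTS_TXT, "http://example.com/"))
```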


Tantek

[1] http://en.wikipedia.org/wiki/Straw_man

-- 
http://tantek.com/

From thomas at stray.net  Wed Jul  7 10:10:07 2010
From: thomas at stray.net (thomas lörtsch)
Date: Wed Jul  7 10:10:19 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a
	"microformats.org turns 5" blog post
In-Reply-To: <AANLkTikGOIAGSveZGK33tFj2jgyiLpM52oKd1AwwJtK1@mail.gmail.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>
	<4C32E8D7.7080705@yahoo-inc.com>
	<AANLkTiky55g_xA0KsZjFiS35Bc01DcO-v4iQsoBNI_Ww@mail.gmail.com>
	<20100707124352.3f2a215f@miranda.g5n.co.uk>
	<AANLkTikGOIAGSveZGK33tFj2jgyiLpM52oKd1AwwJtK1@mail.gmail.com>
Message-ID: <A6AB5207-03CA-4891-B104-E9340FA311BC@stray.net>


On Jul 7, 2010, at 6:24 PM, Tantek Çelik wrote:

> On Wed, Jul 7, 2010 at 4:43 AM, Toby Inkster <mail@tobyinkster.co.uk> wrote:
>> On Wed, 7 Jul 2010 02:25:38 -0700
>> Tantek Çelik <tantek@cs.stanford.edu> wrote:
>> 
>>> E.g. Wordpress.org results don't have any RDFa.
>>> 
>>> View source and the only thing even remotely resembling you see is:
>>> 
>>> <meta property="fb:page_id" content="...">
>>> 
>>> - which is simply use of an invalid "property" attribute (in XHTML
>>> 1.0). The qname "fb:" is not defined anywhere.
>> 
>> In the current RDFa 1.1 drafts, this is allowed, though its meaning is
>> not likely what the authors of this page intended. In 1.1, prefixes
>> which are not bound to anything are assumed to be absolute URIs.
> 
> So it's another form of invalid syntax then, since "fb:" is not a
> defined protocol.
> 
> 
>> The page at http://wordpress.org/ does actually contain 3 triples if
>> evaluated as RDFa 1.0, though they're each the result of RDFa
>> grandfathering in certain HTML 4/XHTML 1 semantics.
> 
> No, it might contain 3 RDF triples - but they're not RDF*a*.
> 
> Just because a page can be parsed/converted into another format does
> not mean it "contains" that format.
> 
> Saying so is deceptively mis-using the word "contains" at best, and
> playing semantic games at worst.
> 
> Just because a page has hAtom does not mean it "contains" Atom.
> 
> Just because a page has microdata does not mean it "contains" JSON
> (though an exceptionally precise direct conversion is defined). etc.
> 
> Similarly to microdata, as we define more precise parsing rules for
> microformats, we'll have direct conversions to JSON and RDF triples as
> well.  This does not mean that all pages with microformats "contain"
> JSON or RDF.
> 
> The question of comparison is deliberately chosen to illuminate what
> are developers actually coding? What syntax? Not what can you "infer",
> "parse as", or "convert to".
> 
> Because as you know with the parsers you've written, you can convert
> syntaxes to nearly any implied format - it tells you nothing about
> usage.
> 
> 
>> The question "how many pages contain RDFa?" is only meaningful if
>> certain qualifications are added... Does broken RDFa count?
> 
> Broken RDFa counts, but only to demonstrate the difficulty of coding
> RDFa, not as instances of RDF(a). One of the reasons that Google found
> so little RDFa may be that much of it was broken. This is one of
> the common problems with namespaces in data.

does broken tantek count?
this "my format is longer than your format" strikes me as rather silly.
50 million elvis fans can't be wrong (most of them use neither).

regards
thomas lörtsch




From hober0 at gmail.com  Wed Jul  7 15:12:42 2010
From: hober0 at gmail.com (Edward O'Connor)
Date: Wed Jul  7 15:13:11 2010
Subject: [uf-discuss] patches (speaking of "microformats.org turns 5")
Message-ID: <AANLkTimHLaMwEchsLB_J0Mcb3qVvkVBU43PEjeZBOu75@mail.gmail.com>

Tantek wrote:
> I'm collecting these into material for "microformats.org turns 5" blog
> post - additional suggestions welcome!

I'm in the process of ordering a bunch of sew-on Microformats patches
(about 2.5" square). I'll let the list know when they're ready!


Ted
From info at csarven.ca  Wed Jul  7 15:53:08 2010
From: info at csarven.ca (Sarven Capadisli)
Date: Wed Jul  7 15:53:25 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a
	"microformats.org turns 5" blog post
Message-ID: <1278543188.1603.42.camel@csarven-netbook>

On Sat, 2010-07-03 at 19:18 -0700, Tantek Çelik wrote:
> According to Yahoo! Search Monkey, there are now over 2 billion hCards
> on the web:
> 
> http://search.yahoo.com/search?p=searchmonkey%3Acom.yahoo.page.uf.hcard
> 
> This is perhaps due to a few fairly large recent deployments:
> * BrightKite.com - all venues and user profiles have hCard (millions)
> * Gravatar - all profiles now have hCards (millions) - used on
> WordPress.com etc.
> 
> Some additional recent news:
> * microformats has 94% marketshare compared to alternatives (e.g.
> RDFa) according to Google (announced at the Semantic Technology
> conference)
>  - http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php
>  - http://www.readwriteweb.com/images/richsnippets_june10b.jpg
> 
> I'm collecting these into material for "microformats.org turns 5" blog
> post - additional suggestions welcome!
> 
> http://microformats.org/wiki/microformats-turns-5
> 
> -- 
> http://tantek.com/

I'm not sure about exact numbers, but a StatusNet instance (e.g.,
http://identi.ca/ ), has hCards for all users and groups. It includes
representative hCards.

Updated wiki.

-Sarven


From tantek at cs.stanford.edu  Thu Jul  8 01:25:03 2010
From: tantek at cs.stanford.edu (Tantek Çelik)
Date: Thu Jul  8 01:25:27 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <20100708002838.1b370e8b@miranda.g5n.co.uk>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com> 
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com> 
	<4C32E8D7.7080705@yahoo-inc.com>
	<AANLkTiky55g_xA0KsZjFiS35Bc01DcO-v4iQsoBNI_Ww@mail.gmail.com> 
	<20100707124352.3f2a215f@miranda.g5n.co.uk>
	<AANLkTikGOIAGSveZGK33tFj2jgyiLpM52oKd1AwwJtK1@mail.gmail.com> 
	<20100708002838.1b370e8b@miranda.g5n.co.uk>
Message-ID: <AANLkTilMDr8tai3GAzbrtB-jLBroxkyuFkitdMJneo7E@mail.gmail.com>

Toby,

On Wed, Jul 7, 2010 at 4:28 PM, Toby Inkster <tai@g5n.co.uk> wrote:
> On Wed, 7 Jul 2010 08:24:52 -0700
> Tantek Çelik <tantek@cs.stanford.edu> wrote:
>

<snip academic discussion of fb: being a URL scheme or not>

>> > The page at http://wordpress.org/ does actually contain 3 triples if
>> > evaluated as RDFa 1.0, though they're each the result of RDFa
>> > grandfathering in certain HTML 4/XHTML 1 semantics.
>>
>> No, it might contain 3 RDF triples - but they're not RDF*a*.
>
> It contains three attributes which are described by the XHTML+RDFa spec,
> and which, when processed according to the RDFa spec, each produce a
> triple.
>
>> Just because a page can be parsed/converted into another format does
>> not mean it "contains" that format.
>
> The page at http://wordpress.org/ doesn't need to be converted to RDFa.
> It is RDFa. (It doesn't use an RDFa DTD, though many seem to believe
> that judging an XML document's type by its DTD is a layering violation.)
>
> It would need to be converted if you wanted RDF/XML, Turtle or JSON.
> But it doesn't need to be converted to RDFa; it is RDFa.


These assertions of "is RDFa" on grandfathered formats/syntaxes are
deceptive because it's essentially claiming implied credit/branding
for something that had nothing to do with RDFa.

E.g. if some future version of the XHTML+RDFa spec describes how to
process microformats (given the trend of the RDFa specs to grandfather
in more and more syntax, it's reasonable to predict that this will
happen), then you can make the same claim there, that all uses of
microformats are RDFa, which then dilutes the phrase "is RDFa" to the
point of meaninglessness.

Such a conflation of reclassifying previously non-RDFa markup as RDFa
is, as I said, clouding a definition at best, and deceptive/dishonest
at worst.

It is still just conversion of a *previous* syntax, defined *outside* and
*predating* RDFa.

Another analogy: you could make a new spec called BrandXSemantics
(BXS) that defined processing of all syntaxes like meta tags,
microformats, RDFa, microdata etc. that claimed that all such syntaxes
were BXS, but such a claim is of little utility and would merely serve
to artificially inflate claims about BXS being more popular than
microformats or RDFa or microdata - this is essentially what this kind
of "grandfathering" in RDFa is doing.

Claiming "It is RDFa" is also deceptive from the point of view of
developer behavior, which is illustrated by your next point.


>> Saying so is deceptively mis-using the word "contains" at best, and
>> playing semantic games at worst.
>>
>> Just because a page has hAtom does not mean it "contains" Atom.
>
> No, it "contains" hAtom and can possibly be converted to Atom (atom:id
> concerns notwithstanding).
>
> The page at http://wordpress.org/ contains RDFa and can be converted to
> RDF/XML.
>
>> The question of comparison is deliberately chosen to illuminate what
>> are developers actually coding? What syntax? Not what can you "infer",
>> "parse as", or "convert to".
>
> In the case of http://wordpress.org/, they have coded RDFa. Thanks to
> the fact that RDFa grandfathered in some semantics from earlier
> versions of (X)HTML, they may not have been *knowingly* doing so.

Claiming that some code is RDFa when it clearly was not *knowingly*
written/intended as such points out the key flaw - if you're talking
about what developers are adopting, then their intent, and what they
are explicitly choosing to do, is what matters. Thus comparisons like
Google's Rich Snippets adoption table make sense to contrast developer
adoption of different format approaches.


>> > The question "how many pages contain RDFa?" is only meaningful if
>> > certain qualifications are added... Does broken RDFa count?
>>
>> broken RDFa counts, but only to demonstrate the difficulty of coding
>> RDFa, not instances of RDF(a). one of the reasons that Google found
>> so little RDFa may be because much of it was broken. this is one of
>> the common problems with namespaces in data.
>
> Do twitter's 100 million plus broken hCards demonstrate the difficulty
> of coding microformats?

If there are problems with Twitter's hCards, please document the
specific problems on the respective issues page; that way we can better
verify the problem report(s), investigate possible causes, and suggest
fixes to Twitter as well.

I've added a placeholder section for this:

http://microformats.org/wiki/hcard-supporting-user-profiles-issues#Twitter


> I imagine that the reason Google found so little RDFa is because they
> were only counting RDFa that used their own RDFa vocabulary, and
> neglecting to count *all* RDFa. Without more information on their
> testing process I can't verify that though.

My understanding of RDF(a) advocates is that one of the design
principles of RDF(a) is its infinite extensibility and philosophy of
encouraging everyone to make up their own vocabulary (which is often
contrasted with microformats' opposite design principle of deliberate
re-use of shared vocabularies for better interoperability and
communication).

Google using their own RDFa vocabulary is a direct consequence of this
principle/philosophy of RDF(a)/namespaces etc., and thus if there's a
problem with that approach, it merely calls into question that
principle/philosophy of RDF(a)/namespaces.


> This would be analogous to Wikipedia surveying usage levels of rel-tag
> by searching for rel-tag links to http://en.wikipedia.org/wiki/* only.

It's not analogous because rel-tag doesn't explicitly state nor
encourage sites to only use their own rel-tags, whereas RDF(a) does
encourage making up and using your own vocabularies.


>>> Do grandfathered rel/rev values count? &c.
>>
>> rel/rev syntax and values work without RDFa - they're not RDFa,
>> despite RDFa's attempt to subsume them (and even errantly claim/imply
>> credit in the spec, e.g. rel-license).
>
> I don't think the RDFa spec claims credit for anything in particular.
> It reuses a lot of (X)HTML attributes and rel/rev values, but is rather
> silent on their origins.

Right - it's that "silent on their origins" which is sloppy at best
and plagiaristic (implying first invention/credit by absence of
citation of prior art) at worst.

I'll follow-up with a more detailed description of where/when RDFa
claims/implies credit for work that predates RDFa. E.g. the
introduction of rel='license' in an example following a section that
states "examples to illustrate how Alice can use RDFa" [1] is one such
errant/deceptive implication that rel="license" is RDFa, that fails to
provide citations to the invention/introduction of rel="license" [2]
which IMHO borders on plagiarism, writing something implying
claiming/taking credit for something that was invented by another
beforehand, and omitting the reference to prior art.

[1] http://www.w3.org/TR/2008/NOTE-xhtml-rdfa-primer-20081014/#id84491

[2] http://microformats.org/wiki/history
2004-02-11 http://tantek.com/presentations/2004etech/realworldsemanticspres.html

The counter-argument is that perhaps it is/was a case of simultaneous
invention, which I would prefer to give more weight to, except that
the microformats introduction of rel-license was explicitly
discussed/mentioned afterwards on the Creative Commons mailing list[3]
where many related subsequent RDF discussions were had:

[3] http://lists.ibiblio.org/pipermail/cc-metadata/2004-February/000290.html


>> 1. theoretical strawman[1]
>> 2. google.com/robots.txt prevents this from counting in any "search"
>
> I think you're neglecting the serious point that page counts on the Web
> are not especially significant - it's easy to generate many millions of
> pages from a single template.

If it's a "serious point" - please provide data to substantiate that
criticism rather than merely asserting that Yahoo Search Monkey
returns numbers that "are not especially significant" - I think the
Yahoo Search Monkey developers deserve more benefit of the doubt.


> There are probably much more interesting measures than page counts. To
> evaluate the health of a format, it's just as important -- perhaps more
> important -- to look at how many active consumers there are.

By all means, propose alternative concrete "more interesting measures"
and how you would measure them.

Until then, the concrete Yahoo Search Monkey measures are the most
interesting measures of web-wide microformats adoption to date.


Sarven,

On Wed, Jul 7, 2010 at 3:53 PM, Sarven Capadisli <info@csarven.ca> wrote:
>
> I'm not sure about exact numbers, but a StatusNet instance (e.g.,
> http://identi.ca/ ), has hCards for all users and groups. It includes
> representative hCards.
>
> Updated wiki.

Thanks much Sarven!

Do you know *when* Identica added hCard support? (I'd really prefer to
keep this blog post to recognizing specific deployments in the past
year)

Also, do you know how many Identica/status.net profiles there are today?

Please feel free to add answers to those directly to Identica's entry
on the hCard supporting user profiles page:

http://microformats.org/wiki/hcard-supporting-user-profiles

Thanks,

Tantek

-- 
http://tantek.com/

From mail at tobyinkster.co.uk  Thu Jul  8 02:47:02 2010
From: mail at tobyinkster.co.uk (Toby Inkster)
Date: Thu Jul  8 02:47:57 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a 
	"microformats.org turns 5" blog post
In-Reply-To: <AANLkTilMDr8tai3GAzbrtB-jLBroxkyuFkitdMJneo7E@mail.gmail.com>
References: <AANLkTikVQ7Mp-X1MSj8lYmG5sYaQuct6maSB85ZrLuR2@mail.gmail.com>
	<AANLkTikiITiC6PipPOPAfMNaWQV1viIkV8YCNrxV1AwA@mail.gmail.com>
	<4C32E8D7.7080705@yahoo-inc.com>
	<AANLkTiky55g_xA0KsZjFiS35Bc01DcO-v4iQsoBNI_Ww@mail.gmail.com>
	<20100707124352.3f2a215f@miranda.g5n.co.uk>
	<AANLkTikGOIAGSveZGK33tFj2jgyiLpM52oKd1AwwJtK1@mail.gmail.com>
	<20100708002838.1b370e8b@miranda.g5n.co.uk>
	<AANLkTilMDr8tai3GAzbrtB-jLBroxkyuFkitdMJneo7E@mail.gmail.com>
Message-ID: <20100708104702.537d1fc4@miranda.g5n.co.uk>

On Thu, 8 Jul 2010 01:25:03 -0700
Tantek Çelik <tantek@cs.stanford.edu> wrote:

> If there are problems with Twitter's hCards, please document the
> specific problems on the respective issues page; that way we can better
> verify the problem report(s), investigate possible causes, and suggest
> fixes to Twitter as well.

It's been documented on the Wiki since 2007.

http://microformats.org/wiki/implementations?diff=23858

> My understanding of RDF(a) advocates is that one of the design
> principles of RDF(a) is its infinite extensibility and philosophy of
> encouraging everyone to make up their own vocabulary (which is often
> contrasted with microformats' opposite design principle of deliberate
> re-use of shared vocabularies for better interoperability and
> communication).

I wouldn't say that RDF encourages everyone to make up their own
vocabulary, but that it makes it feasible.

> Google using their own RDFa vocabulary is a direct consequence of this
> principle/philosophy of RDF(a)/namespaces etc., and thus if there's a
> problem with that approach, it merely calls into question that
> principle/philosophy of RDF(a)/namespaces.

There's no problem with Google making up their own RDF vocabulary.

The problem is counting the number of uses of their own vocabulary on
the Web, taking that number and claiming it as representative of RDFa
deployment as a whole.

> The counter-argument is that perhaps it is/was a case of simultaneous
> invention, which I would prefer to give more weight to, except that
> the microformats introduction of rel-license was explicitly
> discussed/mentioned afterwards on the Creative Commons mailing list[3]
> where many related subsequent RDF discussions were had:
> 
> http://lists.ibiblio.org/pipermail/cc-metadata/2004-February/000290.html

If you go back a further three months you'll see this thread:

http://lists.ibiblio.org/pipermail/cc-metadata/2003-December/000237.html

Cory Nelson wrote:

| I propose sites under a CC license include a meta tag in their header
| saying so.  Though this won't help people recognize the content as
| being under a CC license, it could help search engines greatly.
| 
| Here is an example:
| 
| <meta name="license"
| content="http://creativecommons.org/licenses/by-nd-nc/1.0/" />

And Lucas Gonze followed up with:

| It would also work to have a "link rel=" element

So the seed of the idea had been around since before the microformat
proposal. Certainly the microformat proposal solidified the idea, but
it's not inconceivable that when rel=license was proposed to be added
to XHTML2 (the metadata parts of which evolved into RDFa), Ben Adida was
drawing from earlier ideas, and possibly unaware of the microformat.

http://lists.w3.org/Archives/Public/www-html-editor/2005AprJun/0178.html

It's worth noting that before "license" was added to the XHTML2 link
relations vocabulary, the term "license" was already defined in both
Creative Commons' and Dublin Core's vocabularies, in the former case
since 2002. Ben's proposal seems not so much inspired by the
microformats use, but rather to move the term "license" out of Creative
Commons' namespace to help clarify that it may be used to point to
non-CC licenses too.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From microformats.org at boblet.net  Mon Jul 12 08:27:52 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Mon Jul 12 08:28:20 2010
Subject: [uf-discuss] re: HTML5 support
Message-ID: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>

Hey all,

I've got a few questions about using microformats in HTML5:

Back on 14 October 2009, Tantek made the following additions to
http://microformats.org/wiki/html5

===
microdata vocabularies

microdata vCard - use hCard instead, taking into account the hCard FAQ
and resolved+closed issues. hCard 1.0.1 (under development) is
incorporating these errata. Avoid the "microdata vCard" vocabulary as
it is an out-of-date fork/snapshot of hCard.
microdata vEvent - use hCalendar instead, taking into account the
hCalendar FAQ and resolved+closed issues. hCalendar 1.0.1 is
incorporating these errata. Avoid the "microdata vEvent" vocabulary,
as it is an out-of-date fork/snapshot of hCalendar's vevent root class
name and applicable properties.
===

I'm assuming this was when Microdata vcard and vevent specs were
based on hCard and hCalendar. They're now based on the original RFCs,
so I guess these warnings are no longer relevant, and have updated the
page. If they are still relevant (Tantek?) please let me know the
situation and I'll update as required or roll back.

Ref:
* Microdata vcard:
http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
* Microdata vevent:
http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vevent


I'm also wondering if someone can explain the "encourage HTML5 to drop
their vocabulary and use µF vocabulary instead" comments on the
brainstorming pages linked to from:
http://microformats.org/wiki/html5#Requests
Again if these are no longer accurate I'm happy to update them.


Under Current microformat compatibility
http://microformats.org/wiki/html5#Current_microformat_compatibility
only hCard and XFN are listed as compatible. I'm wondering if I should
also add these specifications too:
* XOXO
* rel-nofollow (defined in HTML5 spec)
* rel-license (defined in HTML5 spec)
* rel-tag (defined in HTML5 spec)
(the rel values are defined on
http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html
)

There's a bunch more draft specifications that look to be compatible,
and there's also a way to add extra rel values to the HTML5 spec:
http://wiki.whatwg.org/wiki/RelExtensions


Finally, what was the upshot of this email about the "magic" in fn?
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-January/024881.html

Thanks for your time

peace - oli
@boblet

From tantek at cs.stanford.edu  Mon Jul 12 09:31:00 2010
From: tantek at cs.stanford.edu (Tantek Celik)
Date: Mon Jul 12 10:02:42 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
Message-ID: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>

Hi Oli,

In short, the warnings are still relevant.

Please don't update the pages just based on a "guess" that the warnings are no longer relevant, and revert accordingly. 

If you can verify the changes made in microdata are consistent with hCard etc (rather than a fork), and cite the specific changes, then it makes sense to make updates.

Regarding the rel-values - the latest correct definitions are still on the microformats wiki.

For example, the HTML5 definition of rel-tag always applies it to the whole page, which is incorrect.

The microformats.org/wiki/rel-tag spec and implementations commonly apply it to parts of a page like blog posts (hAtom, Technorati, IceRocket), or contacts/events/items (hCard, hCalendar, hReview, hListing, hProduct).
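
To illustrate that scoping difference, here is a minimal Python sketch (not taken from any spec or existing parser) that attributes each rel="tag" link to the nearest enclosing microformat root, and falls back to the whole page only when there is none. The ROOTS set, the RelTagScopes class and the example.com URLs are assumptions made up for this example. The tag name is read from the last path segment of the href, which is my understanding of the rel-tag spec.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

# Hypothetical, incomplete list of root class names used as tag scopes.
ROOTS = {"hentry", "vcard", "vevent", "hreview"}


class RelTagScopes(HTMLParser):
    """Record (scope, tag) pairs for every rel="tag" link."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (element name, root class or None)
        self.tags = []           # collected (scope, tag name) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        rels = (attributes.get("rel") or "").lower().split()
        root = next(iter(sorted(classes & ROOTS)), None)
        self.open_elements.append((tag, root))
        if tag == "a" and "tag" in rels and attributes.get("href"):
            path = urlparse(attributes["href"]).path.rstrip("/")
            name = unquote(path.rsplit("/", 1)[-1])
            # Nearest enclosing root wins; otherwise the tag applies to the page.
            scope = next((r for _, r in reversed(self.open_elements) if r), "page")
            self.tags.append((scope, name))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, discarding unclosed children.
        for index in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[index][0] == tag:
                del self.open_elements[index:]
                break


def rel_tags(markup):
    parser = RelTagScopes()
    parser.feed(markup)
    parser.close()
    return parser.tags


SAMPLE = (
    '<a rel="tag" href="http://example.com/tags/site">site</a>'
    '<div class="hentry"><p>Post</p>'
    '<a rel="tag" href="http://example.com/tags/microformats">mf</a></div>'
)
print(rel_tags(SAMPLE))
```

Under the whole-page reading both tags would describe the page; under the reading sketched here only the first one does, and the second describes the blog post it sits in.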

In general, the latest, most accurate work on microformats (both class vocabularies and rel values), is on the microformats wiki, not the HTML5 spec, and thus you should refer to the microformats wiki spec pages as canonical.

Thanks,

Tantek



From martin at weborganics.co.uk  Mon Jul 12 13:13:25 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Mon Jul 12 13:40:02 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID: <4C3B7765.1080603@weborganics.co.uk>

  On 12/07/2010 17:31, Tantek Celik wrote:
> If you can verify the changes made in microdata are consistent with hCard etc (rather than a fork), and cite the specific changes, then it makes sense to make updates.

It may be relevant to note that microdata is no longer part of the HTML5
core [1].
microdata does however exist as a separate specification [2] but is just
"attributes", and as far as I know, microdata vCard and vEvent no longer
exist as part of the microdata specification, do they?
I wouldn't really be surprised to see microdata disappear
altogether (but that's just my thought)

Best wishes

[1] http://www.w3.org/TR/html5/
[2] http://www.w3.org/TR/microdata/

-- 
Martin McEvoy

From philipj at opera.com  Tue Jul 13 04:24:42 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Tue Jul 13 06:18:40 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
Message-ID: <op.vfr4ndw1sr6mfa@philip-pc.gothenburg.osa>

On Mon, 12 Jul 2010 17:27:52 +0200, Oli Studholme  
<microformats.org@boblet.net> wrote:

> Finally, what was the upshot of this email about the "magic" in fn?
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-January/024881.html

It was dropped in  
<http://html5.org/tools/web-apps-tracker?from=4980&to=4981> with the  
commit message "Remove the magic from the vCard vocabulary, since the  
magic doesn't really work." It should be removed from the upstream  
vocabulary too, but I have little hope of that happening.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From microformats.org at boblet.net  Tue Jul 13 09:59:42 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Tue Jul 13 10:18:21 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>

Hey all,

Thanks for your replies

On Tue, Jul 13, 2010 at 1:31 AM, Tantek Celik <tantek@cs.stanford.edu> wrote:

> Please don't update the pages just based on a "guess" that the warnings are no longer relevant, and revert accordingly.

I'd revert the warnings, but it appears you've moved the content to
the wiki/microdata page, so I'm assuming the current text is as
desired. I asked @hixie about the warning and was told that the vCard
vocabulary had been based on hCard (I guess this is the fork your
comment referred to), but was now based directly on vCard. I also
asked @phae and @adactio about the warning, and was encouraged to make
changes. I'm not able to find a corroborating svn log entry - I'll ask
@hixie for more info.

> In general, the latest, most accurate work on microformats (both class vocabularies and rel values), is on the microformats wiki, not the HTML5 spec, and thus you should refer to the microformats wiki spec pages as canonical.

I understand. I'd assumed the page was out of date due to the other
errors I fixed, and the lack of reply to my comment about timezone
validation from February. I'll email the list in future. Also thank
you for the much clearer guidance on wiki/microdata.


On Tue, Jul 13, 2010 at 5:13 AM, Martin McEvoy <martin@weborganics.co.uk> wrote:

> microdata does however exist as a separate specification [2] but is just
> "attributes", and as far as I know, microdata vCard and vEvent no longer
> exist as part of the microdata specification, do they?

They've been removed due to "politics". They're available via the
WHATWG spec as referenced in my email, and now in the wiki/microdata
page (thanks Tantek).

> I wouldn't really be surprised to see microdata disappear altogether (but
> that's just my thought)

But how could microdata possibly disappear now that Google supports it? ;)


Finally thanks for the clarification Philip

peace - oli

From martin at weborganics.co.uk  Tue Jul 13 18:45:59 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Tue Jul 13 18:53:44 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
Message-ID: <4C3D16D7.5010408@weborganics.co.uk>

  Hello Oli ...

On 13/07/2010 17:59, Oli Studholme wrote:
> On Tue, Jul 13, 2010 at 5:13 AM, Martin McEvoy<martin@weborganics.co.uk>  wrote
>> I wouldn't really be surprised to see microdata disappear altogether (but
>> that's just my thought)
> But how could microdata possibly disappear now that Google supports it? ;)

Because Microdata is far too obtrusive to be practical in the "real
world", for example....

Microdata vcard example from 
http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard

<span itemscope itemtype="http://microformats.org/profile/hcard">
	<span itemprop=fn>
		<span itemprop="n">
			<span itemprop="given-name">George</span>
			<span itemprop="family-name">Washington</span>
		</span>
	</span>
</span>

8 lines of code which would parse as:

BEGIN:VCARD
PROFILE:VCARD
VERSION:3.0
SOURCE:document's address
FN:George Washington
N:Washington;George;;;
END:VCARD

Great, you would think; now try that using microformats, example from
http://yiid.cc/3GI2

<span class="vcard">
<span class="fn">George Washington</span>
</span>

3 lines of code which parses as:

BEGIN:VCARD
SOURCE:document's address
NAME:document's title
VERSION:3.0
N;CHARSET=UTF-8:Washington;George;;;
FN;CHARSET=UTF-8:George Washington
END:VCARD
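
The N line in the microformats output above is not present in the markup. Parsers derive it from fn through hCard's implied "n" optimisation. A minimal Python sketch of that rule, as I understand it from the hCard spec, follows. The function name implied_n and its return format are assumptions for illustration, and real parsers handle more cases than this.

```python
def implied_n(fn, org=None):
    """Return a vCard N value implied by a two-word fn, or None.

    Simplified sketch of hCard's implied "n" optimisation; it only applies
    when fn is exactly two whitespace-separated words and differs from org.
    """
    if org is not None and fn == org:
        return None  # fn names an organisation, so no personal name is implied
    words = fn.split()
    if len(words) != 2:
        return None  # one word, or three or more: N must be marked up explicitly
    first, second = words
    # "Washington, George" or "Washington G." style: family name comes first.
    is_initial = len(second) == 1 or (len(second) == 2 and second.endswith("."))
    if first.endswith(",") or is_initial:
        family, given = first.rstrip(","), second
    else:
        given, family = first, second
    return f"{family};{given};;;"


print(implied_n("George Washington"))
```

With fn set to "George Washington" this yields the same Washington;George;;; value shown in the N line above, which is why the three-line hCard needs no explicit n markup.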

From a commercial and practical point of view, microdata is definitely
not intended to be for "humans first".

Anyway, believe what you like; microdata needs a *lot* of work before it
can ever be considered "micro" as far as I can see. At the moment it
just confuses people into using an unnecessary semantic.

Best wishes

-- 
Martin McEvoy

From microformats.org at boblet.net  Tue Jul 13 19:14:37 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Tue Jul 13 19:15:11 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
Message-ID: <AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>

Hey Tantek,

From IRC:

# [20:52] <boblet> hey Hixie, can you give me more details about the
microdata vcard vocab being based on vcard not "an out-of-date fork of
hcard"?
# [20:54] <Hixie> i just went down the vcard spec and mapped it
directly to microdata
# [20:54] <Hixie> i had originally made some minor changes to match
hcard in places, but i've since removed those
# [20:56] <boblet> Hixie: was that the fn magic? any other hcard ->
vcard reversions?
# [20:56] <Hixie> i think the only bit was the stuff with FN, yeah
# [20:57] <Hixie> everything else is just a straight mapping of the vcard spec
# [20:57] <Hixie> i did use the hcard names for the bits of vcard that
needed splitting into multiple fields, but just to make sure the
terminology was consistent, it's not "forked from hcard" or anything
...
# [20:58] <Hixie> the whole point of microdata is that people can use
whatever vocabularies they like; the vcard one is basically a proof of
concept to show that it is possible to design a vocabulary in very
little time and to show how to write a spec for one

http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
"The following are the type's defined property names. They are based
on the vocabulary defined in the vCard specification and its
extensions, where more information on how to interpret the values can
be found. [RFC2426] [RFC4770]"

Suggested edit on
http://microformats.org/wiki/microdata#microdata_vCard_vocabulary
Avoid the "microdata vCard vocabulary" as in many ways it is an
out-of-date fork/snapshot of hCard, even though portions of it appear
to based directly on the vCard RFC. as well.
->
Avoid the "microdata vCard vocabulary" as it is based directly on the vCard RFC.

(Plus the same for vEvent)


Regarding rel-* microformats:

# rel-nofollow

Microformats:
"By adding rel="nofollow" to a hyperlink, a page indicates that the
destination of that hyperlink should not be afforded any additional
weight or ranking by user agents which perform link analysis upon web
pages (e.g. search engines). Typical use cases include links created
by 3rd party commenters on blogs, or links the author wishes to point
to, but avoid endorsing."

HTML5:
"The nofollow keyword indicates that the link is not endorsed by the
original author or publisher of the page, or that the link to the
referenced document was included primarily because of a commercial
relationship between people affiliated with the two pages."

# rel-license

Microformats:
"By adding rel="license" to a hyperlink, a page indicates that the
destination of that hyperlink is a license for the current page."

HTML5:
"The license keyword indicates that the referenced document provides
the copyright license terms under which the main content of the
current document is provided."

Out of curiosity what are the perceived incompatibilities in these two
examples that prevent them from being listed under
http://microformats.org/wiki/html5#Current_microformat_compatibility ?

peace - oli

From martin at weborganics.co.uk  Tue Jul 13 19:44:07 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Tue Jul 13 19:50:31 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <4C3D16D7.5010408@weborganics.co.uk>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<4C3D16D7.5010408@weborganics.co.uk>
Message-ID: <4C3D2477.5020106@weborganics.co.uk>

Oli, please don't get me wrong: microdata does offer some interesting
potential as far as microformats are concerned. It just needs looking at
with "new eyes" and in a way that can help microformats *and* be 100%
compatible with the way microformats exist now.
There are a couple of attributes that could really be useful to
microformats: the itemscope attribute, because it's opaque, and itemref,
which is very similar to the include pattern but better because it would
allow an author to reference whole blocks of data, not just a single
property.

example, you could have the following markup somewhere in a page:

<span id="contact" class="vcard" itemscope>
<strong class="fn">Alfred Hitchcock</strong>
</span>

and add different parts of a page, say in the footer....

<address itemref="contact" class="adr" itemscope>
<span class="street-address">1600 Amphitheatre Parkway</span> <br>
<span class="street-address">Building 43, Second Floor</span> <br>
<span class="locality">Mountain View</span>,
<span class="region">CA</span>
<span class="postal-code">94043</span>
</address>

I don't see any problem in microformats adopting only the parts of 
microdata that are useful to microformats, there are probably others who 
will disagree with that though ;-)

Best wishes.

Martin


-- 
Martin McEvoy

From microformats.org at boblet.net  Tue Jul 13 20:06:07 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Tue Jul 13 20:18:21 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <4C3D16D7.5010408@weborganics.co.uk>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com> 
	<4C3D16D7.5010408@weborganics.co.uk>
Message-ID: <AANLkTilbuRKvgs85czYge45TEN2xbhP6o-u4VjOOMDHF@mail.gmail.com>

Hey Martin,

On Wed, Jul 14, 2010 at 10:45 AM, Martin McEvoy
<martin@weborganics.co.uk> wrote:
> On 13/07/2010 17:59, Oli Studholme wrote:
>> But how could microdata possibly disappear now that Google supports it? ;)
>
> Because Microdata is far too obtrusive to be practical in the "real world",
> for example...
>
> Microdata vcard example from
> http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
>
> <span itemscope itemtype="http://microformats.org/profile/hcard">
>     <span itemprop=fn>
>         <span itemprop="n">
>             <span itemprop="given-name">George</span>
>             <span itemprop="family-name">Washington</span>
>         </span>
>     </span>
> </span>

This is equivalent to

<span itemscope itemtype="http://microformats.org/profile/hcard">
       <span itemprop="fn n">
              <span itemprop="given-name">George</span>
              <span itemprop="family-name">Washington</span>
       </span>
</span>

> From a commercial and practical point of view, microdata is definitely not
> intended to be for "humans first".

I think it would be more accurate to say RFC2426 is not intended to be
"humans first" ;-) for better or worse vCard doesn't contain implied
"n" optimisation.

> Anyway believe what you like, microdata needs a *lot* of work before it can
> ever be considered as "micro" as far as I can see, at the moment it just
> confuses people into using an unnecessary semantic.

Well, to use a non-English example:

<span class="vcard" lang="ja">
	<span class="fn n">
		<span class="family-name">???????</span>?
		<span class="given-name">??</span>
	</span>
</span>

<span itemscope itemtype="http://microformats.org/profile/hcard" lang="ja">
	<span itemprop="fn n">
		<span itemprop="family-name">???????</span>?
		<span itemprop="given-name">??</span>
	</span>
</span>

These seem pretty equivalent to me, with the main difference in length
being the itemtype URL. However there are advantages to using URLs for
specifying a vocabulary. Keep in mind the implied "n" optimisation is
arguably potentially dangerous e.g. for a social app that only
collects the user's name, rather than two separate fields for given
and family names, and then displays this as an hCard. While some
languages that have family-name given-name order don't use a space
separator (CJK), a quick look at http://twitter.com/boblet shows one
incorrect optimisation for my friend Channy: ????(Channy Yun)?. As you
can imagine this doesn't optimise well. I'd look for more but it seems
Twitter's profile page vcards are completely borked :)
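
To make the risk concrete, here is a minimal Python sketch of the implied
"n" rule, assuming a simplified reading of the hCard wiki: an "fn" of
exactly two words is split into given-name and family-name, with the order
reversed when the first word ends in a comma or the second word is an
initial. The function name `implied_n` and the sample names are
illustrative assumptions, not taken from any existing parser.

```python
def implied_n(fn):
    """Guess an hCard "n" value from an "fn" string of exactly two words.

    Simplified reading of the hCard implied "n" optimisation; returns
    None when the rule does not apply (one word, or three or more).
    """
    words = fn.split()
    if len(words) != 2:
        return None
    first, second = words
    # Exception: "Family, Given" or "Family G." puts the family name first.
    is_initial = len(second) == 1 or (len(second) == 2 and second.endswith("."))
    if first.endswith(",") or is_initial:
        return {"family-name": first.rstrip(","), "given-name": second}
    # Default: assume Western "Given Family" order.
    return {"given-name": first, "family-name": second}


print(implied_n("George Washington"))
# A romanised family-first name (hypothetical person) is silently swapped:
print(implied_n("Yamada Taro"))
# A name written without a space separator is not optimised at all:
print(implied_n("山田太郎"))
```

The second call is the failure mode described above: nothing in a
two-word string says which word is the family name, so a family-first
name comes out with the two properties exchanged unless the author marks
up "n" explicitly.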

I agree that for marking up a person with their name and URL - if you
can use implied "n" optimisation - microformats is superfast. However
I find I often use hCard for more data than just that, to the extent
that writing them without snippets becomes tiring. And if you're
making snippets, there's little difference.

HTH

peace - oli

PS just saw your reply (I can't keep up! :) Yeah I've definitely
wanted an equivalent to itemref for microformats, and hadn't come
across the include pattern before. thanks!

> in a way that can help microformats *and* be 100% compatible with the way microformats exist now

I don't think compatibility is so important. Microformats, microdata
and RDFa all target the same basic problem space but each has its
strengths and weaknesses. Different ideas help each technology improve
(RDFa 1.1 moving towards microformats' simplicity, for example).
Finally, I perceive µF as an elegant hack to graft new semantics onto
HTML using the tools available: class, rel, rev, profile and coding
patterns. With the changed toolset in HTML5 (including no rev or
profile attributes) it makes sense to reassess methods, and I'm
looking at microdata and RDFa for that reason.

From scott at randomchaos.com  Tue Jul 13 21:09:36 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Tue Jul 13 21:09:45 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
Message-ID: <D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>

On Jul 13, 2010, at 8:14 PM, Oli Studholme wrote:

> Suggested edit on
> http://microformats.org/wiki/microdata#microdata_vCard_vocabulary
> Avoid the "microdata vCard vocabulary" as in many ways it is an
> out-of-date fork/snapshot of hCard, even though portions of it appear
> to based directly on the vCard RFC. as well.
> ->
> Avoid the "microdata vCard vocabulary" as it is based directly on the vCard RFC.

I'd suggest removing the entire vocabulary-specific section altogether.  As mentioned in the same page, microdata is aiming to solve a different problem than microformats, so it's misleading to suggest specific vocabularies are actually alternatives to specific microformats by talking about them vis-a-vis microformats.

Put another way, that section violates DRY.  Because microdata is aiming to solve a different problem, *no* microdata vocabulary could possibly be recommended in place of a specific microformat, so it's redundant to go into the ways in which a specific microdata vocabulary goes against microformat principles, principles it's not even attempting to follow.

Peace,
Scott


From angelo at gladding.name  Wed Jul 14 19:30:44 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Wed Jul 14 19:30:49 2010
Subject: [uf-discuss] `microformats` and a universal test suite
Message-ID: <AANLkTikF-dooQcV0IiV1CLG0HDwQ3_tbEDnpLxfjDwNL@mail.gmail.com>

Hello all,

I am currently writing a universal parser [1]. It goes by the name
`microformats` because I intend it to be as close as possible to a
canonical codification of all things Microformats. This will be
accomplished by codifying each specification in a Python module using
what can best be described as a domain-specific language. See the
`adr` definition [2] and accompanying tests [3].

Each definition file will contain as much spec-related information as
possible. Each test suite will provide a series of HTML/ufJSON
equivalents. A web interface (currently functional, but unreliable as
I develop) acts as a web service for transformation and validation but
also to summarize the current state of Microformats down to author
tables and overall analysis of the lexicon.
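
As a rough illustration of the HTML/ufJSON pairing described above, the
following Python sketch pairs one HTML fragment with the data expected
from it and checks a parser against that pair. Everything here is a
hypothetical stand-in: `ToyAdrParser`, `parse_adr`, `CASES` and
`run_suite` are invented names, the JSON shape is assumed, and none of it
is the definition format or test layout used in the `microformats`
repository.

```python
from html.parser import HTMLParser

# Property names from the adr microformat.
ADR_PROPERTIES = {
    "post-office-box", "extended-address", "street-address",
    "locality", "region", "postal-code", "country-name",
}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ToyAdrParser(HTMLParser):
    """Collects the text of elements whose class names are adr properties."""

    def __init__(self):
        super().__init__()
        self.open_props = []   # one list of property names per open element
        self.text = {}         # property name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so do not push
        classes = (dict(attrs).get("class") or "").split()
        self.open_props.append([c for c in classes if c in ADR_PROPERTIES])

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_props:
            self.open_props.pop()

    def handle_data(self, data):
        for props in self.open_props:
            for name in props:
                self.text.setdefault(name, []).append(data)


def parse_adr(html):
    """Assumes the fragment holds a single adr; returns property -> text."""
    parser = ToyAdrParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result is stable across formatting.
    return {name: " ".join("".join(parts).split())
            for name, parts in parser.text.items()}


# Each case pairs an HTML fragment with the structure expected from it.
CASES = [
    (
        '<div class="adr">'
        '<span class="street-address">Building 43, Second Floor</span> <br>'
        '<span class="locality">Mountain View</span>, '
        '<span class="region">CA</span> '
        '<span class="postal-code">94043</span>'
        '</div>',
        {
            "street-address": "Building 43, Second Floor",
            "locality": "Mountain View",
            "region": "CA",
            "postal-code": "94043",
        },
    ),
]


def run_suite(parse, cases):
    """Return the cases for which parse(html) differs from the expected data."""
    return [(html, expected, parse(html))
            for html, expected in cases if parse(html) != expected]


failures = run_suite(parse_adr, CASES)
print("failures:", len(failures))
```

Because `run_suite` only needs a callable from HTML to plain data, the
same list of cases could in principle be fed to any implementation that
can emit a comparable structure, which is the point of sharing one suite.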

The code currently in the repository is a reduction of the current
state of the project. I have defined 33 formats, ranging from proposal
to spec, in the definition format. Additionally I have a buggy
analysis module that renders DOT graphs of the entire lexicon [4] and
subsections thereof.

The scope is a bit wide but most is already written and output is
finally beginning to look robust -- which leads me to my main point:

- - -

Is anyone interested in helping with the compilation of a universal test suite?

I'd like to bring this up sooner rather than later as it is the one
aspect of my project that requires community participation for it to
be truly effective.

In particular, I'd like to grab the ear of Toby Inkster and Mike Kaply
and collaborate to standardize the results of Swignition, Operator,
and `microformats`.

The ultimate goal of the test suite is multi-part:
- to have a concrete set of tests that will allow future implementors
to be able to implement with confidence;
- to have a common format for specification authors to be able to
codify their designs;
- and to provide a plethora of examples for content creators including
*all* possible edge cases of all formats and patterns.

I have had little luck pursuing my ventures via the wiki due to its
rather ironic incapacity to implement microformats. The reasoning is
understandable, though, so I suggest that we just keep this simple and
rally around good old DVCS. I am aware of http://hg.microformats.org/
and am not opposed to forming a shared subrepo for the suite, releasing
all tests under CC0 in the process.

- - -

There are other aspects of the project that I'd like to involve the
community in as well, such as automated XMDP inferencing, semantic
graphing (graphing the semantic web as opposed to graphing the
Microformat lexicon), and consolidation of properties (which becomes
more apparent once you stare at a webpage presenting a spec's profile,
its graph, and property/subproperty derivation/relatives). These,
however, are considerably less important than testing and conformity
at the moment.

Looking forward to hearing from anyone interested.

[1]: https://bitbucket.org/angelo/microformats/
[2]: https://bitbucket.org/angelo/microformats/src/5f8dbe75b683/microformats/lexicon/adr.py
[3]: https://bitbucket.org/angelo/microformats/src/5f8dbe75b683/tests/adr/
[4]: http://imgur.com/5dpq7.jpg

--
Angelo Gladding
angelo@gladding.name
From scott at randomchaos.com  Wed Jul 14 20:37:44 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Wed Jul 14 20:37:51 2010
Subject: [uf-discuss] `microformats` and a universal test suite
In-Reply-To: <AANLkTikF-dooQcV0IiV1CLG0HDwQ3_tbEDnpLxfjDwNL@mail.gmail.com>
References: <AANLkTikF-dooQcV0IiV1CLG0HDwQ3_tbEDnpLxfjDwNL@mail.gmail.com>
Message-ID: <1E3B98EC-FB68-48B8-8511-75F63831962A@randomchaos.com>

On Jul 14, 2010, at 8:30 PM, Angelo Gladding wrote:

> I am currently writing a universal parser [1].

Hi Angelo,

Sounds like an ambitious project and I'd like to have more to contribute, but all I have now is a suggestion to move this discussion to the microformats-dev list, which is focused on exactly this kind of topic:

http://microformats.org/mailman/listinfo/microformats-dev/

Peace,
Scott


From angelo at gladding.name  Wed Jul 14 21:08:37 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Wed Jul 14 21:16:11 2010
Subject: [uf-discuss] `microformats` and a universal test suite
In-Reply-To: <1E3B98EC-FB68-48B8-8511-75F63831962A@randomchaos.com>
References: <AANLkTikF-dooQcV0IiV1CLG0HDwQ3_tbEDnpLxfjDwNL@mail.gmail.com>
	<1E3B98EC-FB68-48B8-8511-75F63831962A@randomchaos.com>
Message-ID: <AANLkTinNHS_ph2ujv3Pddr-lHHS6w_qDuMDMOe6l45YY@mail.gmail.com>

On Wed, Jul 14, 2010 at 8:37 PM, Scott Reynen <scott@randomchaos.com> wrote:
> On Jul 14, 2010, at 8:30 PM, Angelo Gladding wrote:
>
>> I am currently writing a universal parser [1].
>
> Hi Angelo,
>
> Sounds like an ambitious project and I'd like to have more to contribute, but all I have now is a suggestion to move this discussion to the microformats-dev list, which is focused on exactly this kind of topic:
>
> http://microformats.org/mailman/listinfo/microformats-dev/
>
> Peace,
> Scott

I wasn't sure -- thought that might be more for development of
specifications. Will cross-post, thanks.

-- 
Angelo Gladding
angelo@gladding.name
From microformats.org at boblet.net  Sun Jul 18 05:38:31 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Sun Jul 18 05:39:11 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com> 
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com> 
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
Message-ID: <AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>

Hey All,

re: Martin's earlier email

On Wed, Jul 14, 2010 at 10:45 AM, Martin McEvoy
<martin@weborganics.co.uk> wrote:
> <span class="vcard">
> <span class="fn">George Washington</span>
> </span>

I think the issue you had with the microdata equivalent was
brevity/simplicity, correct? While the "n" class optimisation isn't in
the microdata vocabulary, and I've already covered how for non-Western
style names this doesn't apply (and is potentially harmful), I forgot
about the profile attribute:
http://microformats.org/wiki/hcard#Profile

The difference is that in microdata a profile (vocabulary) link is
required via @itemtype, whereas it's a "_should_" in microformats.
Adding a profile to my previous non-English example results in a draw
for me in the simplicity stakes:

<link rel="profile" href="http://microformats.org/profile/hcard">
...
<span class="vcard" lang="ja">
       <span class="fn n">
               <span class="family-name">???????</span>?
               <span class="given-name">??</span>
       </span>
</span>

<span itemscope itemtype="http://microformats.org/profile/hcard" lang="ja">
       <span itemprop="fn n">
               <span itemprop="family-name">???????</span>?
               <span itemprop="given-name">??</span>
       </span>
</span>

Of course if you can use implied "n" optimisation microformats are
definitely simpler, but the difference is less pronounced when using
@profile:

<link rel="profile" href="http://microformats.org/profile/hcard">
...
<span class="vcard">
       <span class="fn">Oli Studholme</span>
</span>

<span itemscope itemtype="http://microformats.org/profile/hcard">
       <span itemprop="fn n"><span itemprop="given-name">Oli</span>
<span itemprop="family-name">Studholme</span></span>
</span>

Of course, no one actually uses @profile with microformats, so it's
probably a moot point :D

Finally thank you for pointing out the nested fn and n itemprops in
the spec example which should be in the same itemprop. I filed a bug:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10159


On Wed, Jul 14, 2010 at 1:09 PM, Scott Reynen <scott@randomchaos.com> wrote:

> I'd suggest removing the entire vocabulary-specific section altogether.  As mentioned in the same page, microdata is aiming to solve a different problem than microformats, so it's misleading to suggest specific vocabularies are actually alternatives to specific microformats by talking about them vis-a-vis microformats.

I'm sorry, but what text are you referring to? What I see is:
"microdata is an extension to HTML5 that provides another way to embed
microformats and poshformats vocabularies"

> Put another way, that section violates DRY.  Because microdata is aiming to solve a different problem, *no* microdata vocabulary could possibly be recommended in place of a specific microformat, so it's redundant to go into the ways in which a specific microdata vocabulary goes against microformat principles, principles it's not even attempting to follow.

Out of curiosity what do you perceive are the different problems that
microformats and microdata are trying to solve?

I personally see microformats as a grass-roots movement that uses the
tools available to extend HTML with extra semantics. Currently this is
accomplished using @class, @rel etc. I see microdata as a new tool in
HTML5 that would also be suitable for using with microformats, so I'm
wondering what's up with all the negativity directed toward microdata
in these replies.


@Tantek:

It seems the current inclusion of vcard and vevent vocabularies in the
HTML5 spec is something of a problem (at least based on the IMO
incorrect comments in the wiki I've pointed out above), so I wonder
how is progress going on the 1.0.1 versions that Hixie said he'd be
happy to link to as normative versions?
Ref: http://krijnhoetmer.nl/irc-logs/whatwg/20090717#l-335

According to Hixie the vcard/vevent vocabularies are in the spec as
examples of how to write a microdata vocabulary, so could presumably
be changed with something else ("the vcard one is basically a proof of
concept to show that it is possible to design a vocabulary in very
little time and to show how to write a spec for one")
ref: http://krijnhoetmer.nl/irc-logs/whatwg/20100713#l-884

Finally, I wonder how I can assist in the documentation of how to use
any microformat via microdata?

ref: http://krijnhoetmer.nl/irc-logs/whatwg/20090717#l-437
# [10:36] <boblet> tantek: will current Microformats be released in
Microdata format at some stage?
# [10:37] <tantek> boblet - doubtful. but will likely happen is that
microformats.org will document how to use *any* microformat
generically using microdata syntax. watch this page for updates:
http://microformats.org/wiki/html5

peace - oli
@boblet

From scott at randomchaos.com  Sun Jul 18 09:10:37 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Sun Jul 18 09:10:43 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
Message-ID: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>

On Jul 18, 2010, at 6:38 AM, Oli Studholme wrote:

>> I'd suggest removing the entire vocabulary-specific section altogether.  As mentioned in the same page, microdata is aiming to solve a different problem than microformats, so it's misleading to suggest specific vocabularies are actually alternatives to specific microformats by talking about them vis-a-vis microformats.
> 
> I'm sorry, but what text are you referring to?

This is what I'm referring to as the "vocabulary-specific section":

http://microformats.org/wiki/microdata#microdata_vocabularies

This is what I'm referring to as "mentioned in the same page, microdata is aiming to solve a different problem":

http://microformats.org/wiki/microdata#potential

> Out of curiosity what do you perceive are the different problems that
> microformats and microdata are trying to solve?

Microformats aim to "solve a specific problem."  Microdata aims to be compatible with RDF, which demands more generic semantics.  Because of this, I doubt you'll ever see something like n optimization in microdata.  You've suggested that's a good thing because n optimization doesn't make sense in all cases, but that's the crux of it: microformats aren't trying to make sense in all cases, while microdata is.  n optimization isn't a good thing or a bad thing; it's simply a reflection of different goals.

> I personally see microformats as a grass-roots movement that uses the
> tools available to extend HTML with extra semantics. Currently this is
> accomplished using @class, @rel etc. I see microdata as a new tool in
> HTML5 that would also be suitable for using with microformats, so I'm
> wondering what's up with all the negativity directed toward microdata
> in these replies.

Maybe you could clarify what specifically you see as negativity toward microdata?  I don't see microdata and microformats having different goals as a bad thing for either.  Different goals are good.

Peace,
Scott


From microformats.org at boblet.net  Sun Jul 18 21:30:45 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Sun Jul 18 21:31:17 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTilbuRKvgs85czYge45TEN2xbhP6o-u4VjOOMDHF@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com> 
	<4C3D16D7.5010408@weborganics.co.uk>
	<AANLkTilbuRKvgs85czYge45TEN2xbhP6o-u4VjOOMDHF@mail.gmail.com>
Message-ID: <AANLkTil6Ll9ImyyHe2QMBoEzZ59DYecT9iucGsL_Juk6@mail.gmail.com>

Hey Martin,

On Wed, Jul 14, 2010 at 12:06 PM, Oli Studholme
<microformats.org@boblet.net> wrote:
>> Microdata vcard example from
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
>>
>> <span itemscope itemtype="http://microformats.org/profile/hcard">
>>     <span itemprop=fn>
>>         <span itemprop="n">
>>             <span itemprop="given-name">George</span>
>>             <span itemprop="family-name">Washington</span>
>>         </span>
>>     </span>
>> </span>
>
> This is equivalent to
>
> <span itemscope itemtype="http://microformats.org/profile/hcard">
>     <span itemprop="fn n">
>         <span itemprop="given-name">George</span>
>         <span itemprop="family-name">Washington</span>
>     </span>
> </span>

I'm sorry but I misunderstood/misread the microdata vcard spec (I
didn't realise that n was a nested item), and my example is wrong. It
should be longer not shorter :)

<span itemscope itemtype="http://microformats.org/profile/hcard">
    <span itemprop=fn>
        <span itemprop="n" itemscope>
            <span itemprop="given-name">George</span>
            <span itemprop="family-name">Washington</span>
        </span>
    </span>
</span>

So for a non-Western name one extra wrapper element, but for a name
with n optimisation three extra wrapper elements.
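
The effect of that extra `itemscope` can be seen with a small Python
sketch. It is a deliberately simplified reading of microdata property
ownership (a property belongs to the nearest enclosing element with
`itemscope`, and an element with both `itemprop` and `itemscope`
contributes a nested item as its value). It assumes well-nested non-void
elements and text-only values with whitespace collapsed, ignores
`itemtype`, `itemref` and URL-valued elements, and the names
`MicrodataSketch` and `extract` are hypothetical.

```python
from html.parser import HTMLParser


class MicrodataSketch(HTMLParser):
    """Simplified microdata reader: non-void elements, text values only."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one frame per open element
        self.items = []   # top-level items found so far

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.stack.append({
            "props": (attrs.get("itemprop") or "").split(),
            "item": {} if "itemscope" in attrs else None,
            "text": [],
        })

    def handle_data(self, data):
        # Text counts towards the textContent of every open element.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        frame = self.stack.pop()
        if frame["item"] is not None:
            value = frame["item"]                       # nested item
        else:
            value = " ".join("".join(frame["text"]).split())
        # A property belongs to the nearest enclosing element with itemscope.
        owner = next((f["item"] for f in reversed(self.stack)
                      if f["item"] is not None), None)
        if frame["props"] and owner is not None:
            for name in frame["props"]:
                owner.setdefault(name, []).append(value)
        elif frame["item"] is not None:
            self.items.append(frame["item"])            # top-level item


def extract(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


FLAT = """
<span itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop=fn>
    <span itemprop="n">
      <span itemprop="given-name">George</span>
      <span itemprop="family-name">Washington</span>
    </span>
  </span>
</span>
"""
# Same markup, but with n declared as a nested item.
NESTED = FLAT.replace('itemprop="n"', 'itemprop="n" itemscope')

print(extract(FLAT))
print(extract(NESTED))
```

Under these assumptions the markup without `itemscope` on "n" attaches
given-name and family-name directly to the card and leaves "n" as plain
text, while the version with `itemscope` yields "n" as a nested item
holding the two name parts.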

peace - oli

From microformats.org at boblet.net  Sun Jul 18 22:03:40 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Sun Jul 18 22:04:08 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com> 
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com> 
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com> 
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
Message-ID: <AANLkTilrqvHnKpBT-DVZrZ0Ic5y4hf00MnrrSh6Om-Bi@mail.gmail.com>

Hey Scott,

thanks for your reply.

On Mon, Jul 19, 2010 at 1:10 AM, Scott Reynen <scott@randomchaos.com> wrote:

> Microformats aim to "solve a specific problem."  Microdata aims to be compatible with RDF, which demands more generic semantics.  Because of this, I doubt you'll ever see something like n optimization in microdata.  You've suggested that's a good thing because n optimization doesn't make sense in all cases, but that's the crux of it: microformats aren't trying to make sense in all cases, while microdata is.  n optimization isn't a good thing or a bad thing; it's simply a reflection of different goals.

I disagree. The purpose of microdata is to "annotate content with
specific machine-readable labels, e.g. to allow generic scripts to
provide services that are customised to the page". This is also a
pretty good description of how @class is used in microformats, and I
think that's a good metaphor. I think you should be comparing
microformats with microdata *vocabularies*, which also aim to solve a
specific problem. Microdata is just a method by which to do this.
While it's possible to convert microdata into RDF (along with JSON
and Atom), compatibility with RDF is not the aim of microdata - if
anything it seems to be "provide a simple mechanism to semantically
extend HTML5 to keep ppl who think this is important happy" :)

The n optimisation was actually in the microdata vcard spec, but Hixie
removed it after deciding it was "magic". While I can understand the
reasons, I think it'd be less confusing/easier if the vcard vocabulary
either removed all reference to hcard (e.g. used a
non-microformats.org itemtype URL), or mapped hCard exactly. I'm
hoping that once hCard 1.0.1 is finished one or both of these things
might happen.

As for using microdata, if you're using simple microformats (just
fn+url hcards for example) maybe it is too wordy a method. But
personally I generally can't use that optimisation (for example:
http://www.cie.mie-u.ac.jp/en/tri-u/2006/committee.html ), so I'm
interested in microdata vocabularies for microformats, or the generic
way of representing microformats in microdata that Tantek mentioned a
year ago.

> Maybe you could clarify what specifically you see as negativity toward microdata?

maybe I'm just reading too much into it after talking about
microformats and microdata with RDF ppl :D

peace - oli

From philipj at opera.com  Mon Jul 19 01:31:32 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Mon Jul 19 01:31:44 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
Message-ID: <op.vf20mun3sr6mfa@philip-pc>

On Sun, 18 Jul 2010 18:10:37 +0200, Scott Reynen <scott@randomchaos.com>  
wrote:

> On Jul 18, 2010, at 6:38 AM, Oli Studholme wrote:
>
>> Out of curiosity what do you perceive are the different problems that
>> microformats and microdata are trying to solve?
>
> Microformats aim to "solve a specific problem."  Microdata aims to be  
> compatible with RDF, which demands more generic semantics.

Microdata doesn't go out of its way to be compatible with existing RDF  
vocabularies, in fact I'd argue that the RDF extraction algorithm creates  
some pretty ugly URIs that anyone who actually likes RDF would frown upon  
and not want to use. In any event there's very little "RDFness" over the  
syntax itself, the model is key-values, not triples.

> Because of this, I doubt you'll ever see something like n optimization  
> in microdata.  You've suggested that's a good thing because n  
> optimization doesn't make sense in all cases, but that's the crux of it:  
> microformats aren't trying to make sense in all cases, while microdata  
> is.  n optimization isn't a good thing or a bad thing; it's simply a  
> reflection of different goals.

This isn't a difference between microformats and microdata. The microdata  
vocabulary *had* the 'n' optimization, but it was removed after I showed  
that it didn't work for e.g. Chinese or Vietnamese. I tried to learn from  
this community why it isn't a bad idea, but there wasn't much useful  
feedback. It really should be removed from microformats too, but that's  
probably too late.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From scott at randomchaos.com  Mon Jul 19 17:34:05 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Mon Jul 19 17:34:17 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf20mun3sr6mfa@philip-pc>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
Message-ID: <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>

On Jul 19, 2010, at 2:31 AM, Philip Jägenstedt wrote:

>>> Out of curiosity what do you perceive are the different problems that
>>> microformats and microdata are trying to solve?
>> 
>> Microformats aim to "solve a specific problem."  Microdata aims to be compatible with RDF, which demands more generic semantics.
> 
> Microdata doesn't go out of its way to be compatible with existing RDF vocabularies

Maybe not specific vocabularies (that's kind of my point), but RDF itself is clearly a major consideration.  There's a whole section on it:

http://www.w3.org/TR/microdata/#rdf

> In any event there's very little "RDFness" over the syntax itself, the model is key-values, not triples.

It may not translate *well* to RDF, but I disagree that such translation isn't a goal.  The syntax isn't particularly important, though.  RDF is simply my sloppy shorthand for general purpose semantics.  Microformats, unlike both RDF and microdata, are explicitly not intended to be general purpose.  The microdata spec itself doesn't even mention specific vocabularies, whereas microformats are nothing *but* specific vocabularies.  It's no surprise that general purpose formats like microdata don't express specific vocabularies as succinctly as microformats.  It's also no surprise that microformats don't cover as much variety of data as general purpose formats.

>> Because of this, I doubt you'll ever see something like n optimization in microdata.
> 
> This isn't a difference between microformats and microdata. The microdata vocabulary *had* the 'n' optimization, but it was removed after I showed that it didn't work for e.g. Chinese or Vietnamese.

Well, so much for that prediction.  Still, the removal suggests to me that it *is* a significant difference:

> I tried to learn from this community why it isn't a bad idea, but there wasn't much useful feedback.

I'd argue it is a bad idea in microdata, but not in microformats, because of the very distinction I'm trying to draw between the two.

n optimization isn't required.  It's a handy shorthand in some specific cases, but shouldn't be used universally, as it doesn't make sense everywhere.  hCard can handle Chinese names just fine with explicit given-name and family-name properties.  Nothing about n optimization makes this more difficult; n optimization only makes specific cases easier.  Making specific cases easier is the whole point of microformats, but it's not at all the point of microdata.

Peace,
Scott


From microformats.org at boblet.net  Mon Jul 19 19:57:34 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Mon Jul 19 20:03:26 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com> 
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com> 
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com> 
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc> 
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
Message-ID: <AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>

Hey Scott,

On Tue, Jul 20, 2010 at 9:34 AM, Scott Reynen <scott@randomchaos.com> wrote:

>> Microdata doesn't go out of its way to be compatible with existing RDF vocabularies
>
> Maybe not specific vocabularies (that's kind of my point), but RDF itself is clearly a major consideration.  There's a whole section on it:
>
> http://www.w3.org/TR/microdata/#rdf

No. There's a sub-sub-section on converting to RDF, just as there are
for converting to JSON and Atom. That's not a design goal, it's
specified interoperability. There are also sub-sub-sections on vcard,
vevent and licensing vocabularies, so by the same logic these are also
major considerations (again no, they're sample vocabularies).

> It's no surprise that general purpose formats like microdata don't express specific vocabularies as succinctly as microformats.

You're not doing a lot of hCalendar formats I take it? ;-)

> I'd argue it is a bad idea in microdata, but not in microformats, because of the very distinction I'm trying to draw between the two.

As far as microdata goes it's irrelevant - that's something decided by
the *vocabulary* author. Adding it isn't a bad idea if the vocabulary
author thinks the shortcut has more good than bad points.

> Making specific cases easier is the whole point of microformats, but it's not at all the point of microdata.

"Making specific cases easier is the whole point of the class
attribute, but it's not at all the point of microdata"

Microdata - and semantic class names plus posh coding patterns for
current microformats - are the method; a means to an end. Microdata
vocabularies use microdata to express semantics, just as microformats
use the class attribute etc to express semantics. Microformats are a
little more concise in general (cough, datetimes ;-) compared to the
same vocabulary in microdata (@class is shorter than @itemprop by 4
characters, @property is optional whereas @itemtype is required etc),
but the differences are not so great, and any class-based microformat
can be written using microdata.

peace - oli

PS @Philip the reasons for n optimisation are as in the wiki; a
combination of putting authors first (shortcut for western-style
"given-name family-name" names), and accommodating mistakes in the
original RFC. I guess there was the expectation that hCard would
mainly be used with western-style names, a lack of knowledge of
Vietnamese, Chinese and other names that would be incorrectly
classified by this optimisation, and/or this shortcut was valued above
i18n issues (it was made back in 2005 after all).

I'd originally thought of it as just an edge case in Japanese, but
reading about Vietnamese, Chinese and Korean names I'm starting to
feel this is a serious i18n issue. I wonder what Tantek's view, and
the view of whoever else is working on hCard 1.0.1, is. I wonder if it
will be perceived to be as serious as the a11y issues the abbr time
pattern had?

Aah just found
http://microformats.org/wiki/hcard-issues-resolved#fn-opt-i18n
and it seems not. I guess there's the assumption that east asian pages
specify their language, which seems somewhat disconnected from reality
:/

From martin at weborganics.co.uk  Mon Jul 19 22:41:59 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Mon Jul 19 22:42:11 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
Message-ID: <4C453727.1060704@weborganics.co.uk>

  On 20/07/2010 03:57, Oli Studholme wrote:
> Hey Scott,
>
> On Tue, Jul 20, 2010 at 9:34 AM, Scott Reynen<scott@randomchaos.com>  wrote:
>
>> Making specific cases easier is the whole point of microformats, but it's not at all the point of microdata.
> "Making specific cases easier is the whole point of the class
> attribute, but it's not at all the point of microdata"
>
> Microdata - and semantic class names plus posh coding patterns for
> current microformats - are the method; a means to an end. Microdata
> vocabularies use microdata to express semantics, just as microformats
> use the class attribute etc to express semantics. Microformats are a
> little more concise in general (cough, datetimes ;-) compared to the
> same vocabulary in microdata (@class is shorter than @itemprop by 4
> characters, @property is optional whereas @itemtype is required etc),
> but the differences are not so great, and any class-based microformat
> can be written using microdata.

I'm sorry, but you cannot express *microformats* in microdata. If you
do, it's cute, but it isn't a microformat, because microformats *only*
use class names and a few choice rel-values.  If you move a microformat
away from @class it's no longer a microformat and shouldn't be described
as such (we are a bit fussy about that :P).

This is why, when someone starts talking about "new microformats" or
"microformats done better", the first thing I ask myself is "does it use
semantic class names?" ... No? Well, then it's not a new microformat or
microformats done better.

Well the *good* news is HTML5 already supports microformats without 
adding any attributes at all (Yay!) .... that is until someone marks 
@class as obsolete!! ... joke.

Best wishes.

-- 
Martin McEvoy

From angelo at gladding.name  Mon Jul 19 21:05:06 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Mon Jul 19 22:51:54 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
Message-ID: <AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>

Could it be said that microdata intends to do to Microformat syntax
what HTML5 did to HTML4 syntax rules in the sense that parsing is
unambiguous and easier to validate normativity?

Can an enlightened soul describe in which ways microdata is actually
superior to profiled poshformats?

- - -

Might a "humans first, machines second" CJKV internationalization of
`n` optimization be to analyze the contents of the `fn`'s @lang and
inner text and use either or both to better determine name order?

e.g.

<span class=hcard>Angelo Gladding</span>

{
  "hCard": [
    {
      "hcard": {
        "fn": "Angelo Gladding",
        "n": {
          "n": {
            "family-name": [
              "Gladding"
            ],
            "given-name": [
              "Angelo"
            ]
          }
        }
      }
    }
  ]
}

where
アンジェロ == anjero (Angelo)
グラッディング == guraddingu (Gladding)

<span class=hcard lang=ja>グラッディング　アンジェロ</span>

<html lang=ja>
<span class=hcard>グラッディング　アンジェロ</span>
</html>

<span class=hcard>グラッディング　アンジェロ</span>

{
  "hCard": [
    {
      "hcard": {
        "fn": "\u30b0\u30e9\u30c3\u30c7\u30a3\u30f3\u30b0\u3000\u30a2\u30f3\u30b8\u30a7\u30ed",
        "n": {
          "n": {
            "family-name": [
              "\u30b0\u30e9\u30c3\u30c7\u30a3\u30f3\u30b0"
            ],
            "given-name": [
              "\u30a2\u30f3\u30b8\u30a7\u30ed"
            ]
          }
        }
      }
    }
  ]
}

i.e.

Splitting on \u3000 (CJKV space), perform `n` optimization in reverse
when the `fn` element/ancestor matches @lang(zh|ja|ko|vi) or the first
character of the text content lies in one of the following Unicode
character ranges:

U+4E00-U+9FBF (Kanji)
U+3040-U+309F (Hiragana)
U+30A0-U+30FF (Katakana)
http://en.wikipedia.org/wiki/Japanese_writing_system

... Chinese ... Korean ... Vietnamese ... *i18n expert needed*

While this requires what I believe to be an uncommon usage of a space
delimiter among CJK names it could be an easy hack for a user of Site
X, assuming Site X does not explicitly define `n` properties, to
implement upon failed validation without necessitating code
modification on Site X's end.
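
The rule above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the names `family_name_first` and
`implied_n` are hypothetical, only the three Japanese ranges listed above
are checked, and the Latin-script branch assumes the plain two-word case of
`n` optimization with the given name first.

```python
# Sketch only: helper names are hypothetical, not part of any parser.
EAST_ASIAN_LANGS = ("zh", "ja", "ko", "vi")
FAMILY_FIRST_RANGES = (
    (0x4E00, 0x9FBF),  # Kanji
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
)


def family_name_first(fn, lang=None):
    """Guess whether the family name comes first in `fn`."""
    # A matching @lang on the element or an ancestor wins outright.
    if lang and lang.lower().split("-")[0] in EAST_ASIAN_LANGS:
        return True
    # Otherwise look at the first character of the text content.
    first = ord(fn[0]) if fn else 0
    return any(lo <= first <= hi for lo, hi in FAMILY_FIRST_RANGES)


def implied_n(fn, lang=None):
    """Return an implied `n` for a two-word `fn`, or None when not guessing."""
    # Treat U+3000 (ideographic space) like an ordinary space.
    words = fn.replace("\u3000", " ").split()
    if len(words) != 2:
        return None
    if family_name_first(fn.strip(), lang):
        family, given = words
    else:
        given, family = words
    return {"family-name": [family], "given-name": [given]}
```

With this sketch, `implied_n("Angelo Gladding")` yields given-name "Angelo"
and family-name "Gladding", and the katakana `fn` from the JSON above is
split in the reverse order. Note that a Latin-script name inside lang=ja
markup is reversed as well, which may or may not be what a publisher wants.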

-- 
Angelo Gladding
angelo@gladding.name

From philipj at opera.com  Tue Jul 20 03:25:03 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Tue Jul 20 03:38:36 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
Message-ID: <op.vf40j1b6sr6mfa@philip-pc>

On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding <angelo@gladding.name>  
wrote:

> Could it be said that microdata intends to do to Microformat syntax
> what HTML5 did to HTML4 syntax rules in the sense that parsing is
> unambiguous and easier to validate normativity?

Yes, more or less. Of course vocabulary-specific rules can only be checked  
by a specialized validator, but checking the actual structure (key-value  
pairs) is something you get "for free". Also, I expect automatic  
validation of date-formats would be appreciated.

> Can an enlightened soul describe in which ways microdata is actually
> superior to profiled poshformats?

Microdata should be compared to the class attributes and the various  
patterns that microformats use, not any specific vocabulary. The main  
benefit is that parsing becomes well-defined and simple. That's why it's  
possible to define a JavaScript API for accessing microdata items on a  
page, which makes the data useful to the page itself, not only external  
scrapers. It also makes it feasible to make browser features like "add to  
address book" or "add to calendar", which really isn't practical
with microformats when the data is hidden in class attributes together  
with everything else.

> Might a "humans first, machines second" CJKV internationalization of
> `n` optimization be to analyze the contents of the `fn`'s @lang and
> inner text and use either or both to better determine name order?

The main problem with this is that due to lazy copy-pasting, lang="en" is  
often used even when the language isn't English. Also, in the case of e.g.  
Facebook, lang="en" would be correct for the page itself, but people's  
names aren't in English anyway. The only way to get it right is to ask the  
user both for the full name, given name and family name, something I  
haven't ever seen. The most practical solution is to not guess at all, and  
I don't know of any negative effects of this.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From mail at ciaranmcnulty.com  Tue Jul 20 04:05:56 2010
From: mail at ciaranmcnulty.com (Ciaran McNulty)
Date: Tue Jul 20 04:06:03 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
Message-ID: <AANLkTimrT21apZdm9scuh2-c-9r4RxN5UKsBn8BC-5DZ@mail.gmail.com>

On Tue, Jul 20, 2010 at 5:05 AM, Angelo Gladding <angelo@gladding.name> wrote:
> Can an enlightened soul describe in which ways microdata is actually
> superior to profiled poshformats?

To me it's not a question of Microdata vs POSH, it's more like
Microdata vs class attributes where both are methods that can be used
in POSH style data embedding.

The main arguments (and I present these without necessarily agreeing!)
seem to be:

1. Class is ingrained as a CSS hook mechanism. Most people on this
list are fine with class being used for other purposes, but despite
that the argument comes up incredibly often that using class is
somehow a 'hack'. Microdata overcomes that, so right or wrong, it may
be worth ditching class for embedded data just to help uptake.

2. The class space is already populated with lots of ill-thought-out
CSS identifiers. This means POSH formats have to attempt crude forms
of namespacing (e.g. picking a uniquely-named root element) to try and
not collide with existing markup. That works for @class="fn" say, but
it's easy to collide with @class="email". Microdata separates out the
important stuff.

3. Related to 2, microdata extraction is possible without having to be
profile-aware, so for instance microdata can be converted to JSON
without knowledge of the vocabulary used.

4. Microdata features some structures like @itemref that help combine
disparate data across a document into one Microdata element, which in
Microformats would need the slightly hacky rel-include structures that
frankly I don't think anyone has been completely happy with.

5. Microdata allows locally-scoped typing using the @itemtype property
and a URL, while a POSH format can only do something similar with a
document-level @profile.

6. Microdata defines an API for DOM access to Microdata that allows
scripts to deal with Microdata-embedded data when doing the same with
Microformats involves some fairly heavy DOM parsing.
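
Point 3 can be illustrated with a rough Python sketch. It is a
simplification rather than the algorithm from the microdata spec: it
assumes well-nested markup, uses text content as every property value
(ignoring @itemref, URL-valued attributes and meta/@content), and the
class and function names are hypothetical. The point is only that nothing
in it knows which vocabulary is in use.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are kept off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataExtractor(HTMLParser):
    """Collects items as {"type": ..., "properties": {...}} dicts."""

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items found so far
        self.open = []   # one frame per currently open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        frame = {"names": (attrs.get("itemprop") or "").split(),
                 "item": None, "text": []}
        if "itemscope" in attrs:
            frame["item"] = {"type": attrs.get("itemtype"), "properties": {}}
        self.open.append(frame)

    def handle_data(self, data):
        # Every open element accumulates the text it contains.
        for frame in self.open:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        frame = self.open.pop()
        value = frame["item"] or "".join(frame["text"]).strip()
        if frame["names"]:
            # Attach the value to the nearest enclosing item, if any.
            for outer in reversed(self.open):
                if outer["item"] is not None:
                    for name in frame["names"]:
                        outer["item"]["properties"].setdefault(name, []).append(value)
                    break
        elif frame["item"] is not None:
            self.items.append(frame["item"])


def extract_microdata(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

The returned list can be passed straight to `json.dumps`; the same code
handles an hCard, an event or a vocabulary it has never seen.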

The arguments against Microdata are basically that it's complex, huge,
obviously isn't based on any existent markup in the wild, and really
doesn't look like an obvious core element of HTML5 so it's weird that
it's included in the same spec.

-Ciaran
From philipj at opera.com  Tue Jul 20 05:07:49 2010
From: philipj at opera.com (=?iso-8859-15?Q?Philip_J=E4genstedt?=)
Date: Tue Jul 20 05:07:57 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTimrT21apZdm9scuh2-c-9r4RxN5UKsBn8BC-5DZ@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<AANLkTimrT21apZdm9scuh2-c-9r4RxN5UKsBn8BC-5DZ@mail.gmail.com>
Message-ID: <op.vf45bb17atwj1d@philip-pc>

On Tue, 20 Jul 2010 13:05:56 +0200, Ciaran McNulty  
<mail@ciaranmcnulty.com> wrote:

> On Tue, Jul 20, 2010 at 5:05 AM, Angelo Gladding <angelo@gladding.name>  
> wrote:
>> Can an enlightened soul describe in which ways microdata is actually
>> superior to profiled poshformats?
>
> To me it's not a question of Microdata vs POSH, it's more like
> Microdata vs class attributes where both are methods that can be used
> in POSH style data embedding.
>
> The main arguments (and I present these without necessarily agreeing!)
> seem to be:
>
> 1. Class is ingrained as a CSS hook mechanism. Most people on this
> list are fine with class being used for other purposes, but despite
> that the argument comes up incredibly often that using class is
> somehow a 'hack'. Microdata overcomes that, so right or wrong, it may
> be worth ditching class for embedded data just to help uptake.
>
> 2. The class space is already populated with lots of ill-thought-out
> CSS identifiers. This means POSH formats have to attempt crude forms
> of namespacing (e.g. picking a uniquely-named root element) to try and
> not collide with existing markup. That works for @class="fn" say, but
> it's easy to collide with @class="email". Microdata separates out the
> important stuff.
>
> 3. Related to 2, microdata extraction is possible without having to be
> profile-aware, so for instance microdata can be converted to JSON
> without knowledge of the vocabulary used.
>
> 4. Microdata features some structures like @itemref that help combine
> disparate data across a document into one Microdata element, which in
> Microformats would need the slightly hacky rel-include structures that
> frankly I don't think anyone has been completely happy with.
>
> 5. Microdata allows locally-scoped typing using the @itemtype property
> and a URL, while a POSH format can only do something similar with a
> document-level @profile.
>
> 6. Microdata defines an API for DOM access to Microdata that allows
> scripts to deal with Microdata-embedded data when doing the same with
> Microformats involves some fairly heavy DOM parsing.

Well written. Unlike yourself, I agree with all of the above :)

> The arguments against Microdata are basically that it's complex, huge,
> obviously isn't based on any existent markup in the wild, and really
> doesn't look like an obvious core element of HTML5 so it's weird that
> it's included in the same spec.

Well, it's not in W3C's version of HTML5, they published it as a separate  
spec (which is strange, IMO). Regardless of what spec it is in, it still  
works just the same, so that's OK.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From mail at ciaranmcnulty.com  Tue Jul 20 05:57:09 2010
From: mail at ciaranmcnulty.com (Ciaran McNulty)
Date: Tue Jul 20 11:35:49 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf45bb17atwj1d@philip-pc>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<AANLkTimrT21apZdm9scuh2-c-9r4RxN5UKsBn8BC-5DZ@mail.gmail.com>
	<op.vf45bb17atwj1d@philip-pc>
Message-ID: <AANLkTik7rWhSjeRkDJ0SYaPhyqcFYhtIQfZAbv2avQ5P@mail.gmail.com>

On Tue, Jul 20, 2010 at 1:07 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> Well, it's not in W3C's version of HTML5, they published it as a separate
> spec (which is strange, IMO). Regardless of what spec it is in, it still
> works just the same, so that's OK.

Oh, really? Sorry, I'm out of date in that case.

I think it's bundled together with 'HTML5' in the public consciousness anyhow.

-Ciaran

From angelo at gladding.name  Tue Jul 20 12:55:38 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Tue Jul 20 12:55:51 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf40j1b6sr6mfa@philip-pc>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
Message-ID: <AANLkTilntXuAgco5NBeMtXhfmJqx7ZSF8_LSOLIEZ3o6@mail.gmail.com>

On Tue, Jul 20, 2010 at 3:25 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding <angelo@gladding.name>
> wrote:
>
>> Can an enlightened soul describe in which ways microdata is actually
>> superior to profiled poshformats?
>
> Microdata should be compared to the class attributes and the various
> patterns that microformats use, not any specific vocabulary.

Of course. Let me clarify. A `microformat` is a poshformat that has
undergone a relatively laborious process of research and brainstorming
to capture real world user requirements to make a minimal vocabulary
that can capture ~80% of current usage patterns. Microdata is a set of
rules governing a syntax. Hence my comparison of microdata to
poshformats, which are essentially microformats sans the due
diligence.

> The main benefit is that parsing becomes well-defined

Ain't that the truth.

> and simple.

Or is it? I wonder how different the two sets of supporting algorithms
might look face to face once fully documented and implemented.

The Microformats wiki makes the following comparison to microdata:

1. `itemprop` - is a more specific version of class, for field names.
2. `subject` - allows semantically linking within the page.
Conceptually similar to the include-pattern.
3. `itemref` - allows including properties elsewhere on the page that
are not descendants of itemscope. Takes space-separated ids (for
example itemref="address phone" would include the elements with
id="address" and id="phone"). Conceptually similar to the
include-pattern.
4. `content` - on the meta element can be used to include invisible
data that is not part of the content. As current browsers move meta
inside <head>, make sure to include via `itemref`. Conceptually
similar to the 'value-title' feature of the value-class-pattern.
5. `itemscope` - identifies blocks to be marked as structured data.
Conceptually similar to the mfo brainstorming.
6. `itemtype` - to specify the type for an item (for example:
itemtype="http://microformats.org/profile/hcard").

Distilled down:

1. @class
2/3. include-pattern/table-header-pattern
4. value-class-pattern
5. "mfo"
6. rel-profile

Sounds to me like the same sort of desire for absolute normativity
that [non-HTML5] XHTML once attempted to burden the entirety of
humanity with. Ironically, HTML5 has deprecated such a style in favor
of a seemingly more flexible Microformat-esque syntax.

- - -

<span itemscope itemtype="http://microformats.org/profile/hcard">
      <span itemprop="fn n">
             <span itemprop="given-name">George</span>
             <span itemprop="family-name">Washington</span>
      </span>
</span>

vs

<span class=hcard>George Washington</span>

- - -

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>example</title>
</head>
<body>
<p>example</p>
</body>
</html>

vs

<!doctype html>
<title>example</title>
<p>example

> That's why it's possible to define a JavaScript API for accessing microdata
> items on a page, which makes the data useful to the page itself, not only
> external scrapers. It also makes it feasible to make browser features like "add to
> address book" or "add to calendar",

Considering your affiliation with Opera, what might I ask are your
feelings about Operator?

> which really isn't really practical with microformats when the
> data is hidden in class attributes together with everything else.

As I alluded to above I see this as a complete non-issue yet you are
most certainly not the first to bring it up. What am I missing?

>> Might a "humans first, machines second" CJKV internationalization of
>> `n` optimization be to analyze the contents of the `fn`'s @lang and
>> inner text and use either or both to better determine name order?
>
> The main problem with this is that due to lazy copy-pasting, lang="en" is
> often used even when the language isn't English. Also, in the case of e.g.
> Facebook, lang="en" would be correct for the page itself, but people's names
> aren't in English anyway.

Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743

<html lang=ja>...<div class=vcard>...<a class=fn ... >???</a>...</div>

?? can log in today and, without any cooperation from Facebook, append
a U+200B (zero-width space [1]) to his first name (regardless of the
input taking the form of one or two boxes), and immediately reap the
benefits of such an `n` optimization without negatively affecting UI,
sort order, etc.

[1] http://en.wikipedia.org/wiki/Zero-width_space

> The only way to get it right is to ask the user both for the full name,
> given name and family name, something I haven't ever seen.

If you haven't seen it, then it isn't even a single way to get it
right -- another byproduct of Microformats philosophy I believe.
However, if optimizations can yield 80%+ positive results when viewed
in aggregate I personally give a little bit of magic a big thumbs up.

> The most practical solution is to not guess at all, and I don't know
> of any negative effects of this.

I just see a tiny hint of dehumanization. ;)

-- 
Angelo Gladding
angelo@gladding.name

From scott at randomchaos.com  Tue Jul 20 06:47:19 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Tue Jul 20 13:33:13 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
Message-ID: <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>

On Jul 19, 2010, at 8:57 PM, Oli Studholme wrote:

>>> Microdata doesn't go out of its way to be compatible with existing RDF vocabularies
>> 
>> Maybe not specific vocabularies (that's kind of my point), but RDF itself is clearly a major consideration.  There's a whole section on it:
>> 
>> http://www.w3.org/TR/microdata/#rdf
> 
> No. There's a sub-sub-section on converting to RDF, just as there are
> for converting to JSON and Atom. That's not a design goal, it's
> specified interoperability.

They're mentioned as "requirements" in the original announcement:

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

But again, the RDF syntax doesn't matter.  This is the important part for me:

"Distributed vocabulary development should be possible; it should not require coordination through a centralised system."

Distributed vocabulary development requires a general purpose solution.  Microformats don't have that requirement, so vocabulary-specific solutions are common.

>> I'd argue it is a bad idea in microdata, but not in microformats, because of the very distinction I'm trying to draw between the two.
> 
> As far as microdata goes it's irrelevant - that's something decided by
> the *vocabulary* author.

I don't think that's really true, though, and I think this is exactly why n optimization was removed.  For every other microdata property, the value is determined by following the parsing rules in the microdata spec:

http://www.w3.org/TR/microdata/#values

With n optimization, undeclared properties are given values via a completely different parsing model.  This "magic"  may not be explicitly disallowed, but it doesn't really fit with the general design of microdata.

On Jul 19, 2010, at 10:05 PM, Angelo Gladding wrote:

> Could it be said that microdata intends to do to Microformat syntax
> what HTML5 did to HTML4 syntax rules in the sense that parsing is
> unambiguous and easier to validate normativity?

I'd say that's true as far as what they both do, but not how they do it.  HTML5 makes parsing unambiguous by describing a wide variety of parsing rules for invalid content.  I'd say such tolerance of human error is more in line with the microformats approach.

Microdata, on the other hand, simply changes the syntax to reduce the risk of invalid content.  So in terms of strategy for making parsing unambiguous, microdata looks more like XHTML to me.

On Jul 20, 2010, at 4:25 AM, Philip Jägenstedt wrote:

> Microdata should be compared to the class attributes and the various patterns that microformats use, not any specific vocabulary.

Agreed!

> The main benefit is that parsing becomes well-defined and simple.

Right, a lot of it comes down to optimizing for parsers vs. optimizing for publishers.  HTML itself is familiar to publishers, but difficult to parse for data.  Microformats are limited to HTML to make things simpler for publishers at a cost to parsers.  Microdata extends HTML to make things simpler for parsers at a cost to publishers.  Of course, publishers and parsers need to work together, so these approaches only diverge so far.

Peace,
Scott
From singpolyma at singpolyma.net  Tue Jul 20 05:29:48 2010
From: singpolyma at singpolyma.net (Stephen Paul Weber)
Date: Tue Jul 20 14:11:10 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf40j1b6sr6mfa@philip-pc>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
Message-ID: <1279628988.17280.2.camel@singpolyma-N900>

> On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding
> <angelo@gladding.name> wrote:
> 
> > Can an enlightened soul describe in which ways microdata is actually
> > superior to profiled poshformats?
> 
> Microdata should be compared to the class attributes and the various
> patterns that microformats use, not any specific vocabulary. The main
> benefit is that parsing becomes well-defined and simple. That's why it's
> possible to define a JavaScript API for accessing microdata items on a
> page, which makes the data useful to the page itself, not only external
> scrapers. It also makes it feasible to make browser features like "add to
> address book" or "add to calendar", which really isn't really practical
> with microformats when the data is hidden in class attributes together
> with everything else.

Microformats data is not "hidden".  Microformats are just well-done vocabulary specifications using the semantics of HTML.  Is one of these semantics @class? Absolutely.  It is by no means the primary or most important one.

One of the benefits of using the real semantics of the page, and not some hacked-in layer like microdata, is that it works well with existing tools and markup.  CSS styling of microformats, for example, "just works" and I use it all the time.  DOM access similarly works well.

Having written significant code both in-browser and out to parse microformats, I find the claim that parsing them using the DOM is "not practical" shocking.  What would you prefer?  Microformats parsers are usually very easy to write precisely because they use the page's existing semantics, and thus are easily exposed to the tools used for all DOM scripting (including, but not limited to, selecting elements by class).

Then again, I'm very biased.  Microdata, like other superfluous parts of HTML5 (up there with audio and video tags) just makes me sad.  Too much NIH 
From mail at tobyinkster.co.uk  Wed Jul 21 02:09:22 2010
From: mail at tobyinkster.co.uk (Toby Inkster)
Date: Wed Jul 21 02:09:59 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <1279628988.17280.2.camel@singpolyma-N900>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
	<1279628988.17280.2.camel@singpolyma-N900>
Message-ID: <20100721100922.1521c725@miranda.g5n.co.uk>

On Tue, 20 Jul 2010 08:29:48 -0400
Stephen Paul Weber <singpolyma@singpolyma.net> wrote:

> Having written significant code both in-browser and out to parse
> microformats, I find the claim that parsing them using the DOM is
> "not practical" shocking.  What would you prefer?

Parsing microformats via the DOM is not practical. Parsing them any
other way is even worse though.

While writing DOM code to parse a particular site's implementation of
say, hCard, is pretty trivial, generalising that to support all the
variations of how hCard is marked up in the wild is a lot of work.

As a comparison, I have written Perl parsers[*] for microformats, RDFa
and Microdata. Here are the lines-of-code counts for each, excluding
documentation, comments and blank lines:

Microdata      :  945
RDFa 1.0       : 1265
RDFa 1.1 [**]  : 2611
microformats   : 9455

*  = See <http://search.cpan.org/~tobyink/>.
** = this code actually handles both RDFa 1.0 and 1.1. What's more, it can
     handle them embedded in Atom, SVG and OpenDocument Format; not
     just (X)HTML. A pure RDFa-1.1-in-(X)HTML parser could probably be
     written in under 1000 lines of Perl.

The amount of code needed to parse microformats is clearly different
from the other formats.

Another difference is that the Microdata and RDFa 1.0 implementations
can be considered more-or-less complete. (The RDFa 1.1 working drafts
are still somewhat in flux, so the implementation no doubt still needs
changes.) If somebody comes up tomorrow with a new RDFa or Microdata
vocabulary for describing cows, or bread makers, or train timetables,
it will work out of the box. For microformats, that's not the case -
code needs to be written.

So you end up with a chicken-and-egg situation with nobody implementing
tools for a new draft microformat because it's not used in the wild;
nobody using it in the wild because of a lack of tool support; and the
microformat never progressing beyond draft status because of lack of
implementation experience, and uncertainty about how it might work in
the wild. That's why we haven't had any of the draft microformats on
the wiki move out of draft status in the last four years or so; or at
least it's a major contributory factor.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

From philipj at opera.com  Wed Jul 21 02:27:44 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Wed Jul 21 02:28:01 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTilntXuAgco5NBeMtXhfmJqx7ZSF8_LSOLIEZ3o6@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
	<AANLkTilntXuAgco5NBeMtXhfmJqx7ZSF8_LSOLIEZ3o6@mail.gmail.com>
Message-ID: <op.vf6skil5sr6mfa@philip-pc>

On Tue, 20 Jul 2010 21:55:38 +0200, Angelo Gladding <angelo@gladding.name>  
wrote:

> On Tue, Jul 20, 2010 at 3:25 AM, Philip Jägenstedt <philipj@opera.com>
> wrote:
>> On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding  
>> <angelo@gladding.name>
>> wrote:
>>
>>> Can an enlightened soul describe in which ways microdata is actually
>>> superior to profiled poshformats?
>>
>> Microdata should be compared to the class attributes and the various
>> patterns that microformats use, not any specific vocabulary.
>
> Of course. Let me clarify. A `microformat` is a poshformat that has
> undergone a relatively laborious process of research and brainstorming
> to capture real world user requirements to make a minimal vocabulary
> that can capture ~80% of current usage patterns. Microdata is a set of
> rules governing a syntax. Hence my comparison of microdata to
> poshformats, which are essentially microformats sans the due
> diligence.

Right, designing vocabularies is hard and requires due diligence. That's  
true no matter what the syntax is.

>> The main benefit is that parsing becomes well-defined
>
> Ain't that the truth.
>
>> and simple.
>
> Or is it? I wonder how different the two sets of supporting algorithms
> might look face to face once fully documented and implemented.
>
> The Microformats wiki makes the following comparison to microdata:
>
> 1. `itemprop` - is a more specific version of class, for field names.
> 2. `subject` - allows semantically linking within the page.
> Conceptually similar to the include-pattern.
> 3. `itemref` - allows including properties elsewhere on the page that
> are not descendants of itemscope. Takes space-separated ids (for
> example itemref="address phone" would include the elements with
> id="address" and id="phone"). Conceptually similar to the
> include-pattern.
> 4. `content` - on the meta element can be used to include invisible
> data that is not part of the content. As current browsers move meta
> inside <head>, make sure to include via `itemref`. Conceptually
> similar to the 'value-title' feature of the value-class-pattern.
> 5. `itemscope` - identifies blocks to be marked as structured data.
> Conceptually similar to the mfo brainstorming.
> 6. `itemtype` - to specify the type for an item (for example:
> itemtype="http://microformats.org/profile/hcard").

What wiki page is this from? subject has been replaced by itemid. I can't
understand what the similarity with the include-pattern could possibly be,
though.

> Distilled down:
>
> 1. @class
> 2/3. include-pattern/table-header-pattern
> 4. value-class-pattern
> 5. "mfo"
> 6. rel-profile
>
> Sounds to me like the same sort of desire for absolute normativity
> that [non-HTML5] XHTML once attempted to burden the entirety of
> humanity with. Ironically, HTML5 has deprecated such a style in favor
> of a seemingly more flexible Microformat-esque syntax.

Putting XHTML2 aside, one of the main achievements of HTML5 is having  
formalized how to parse all the sloppy, broken HTML out there (a.k.a. "tag  
soup"). While the syntax is flexible to authors, there's no flexibility  
whatsoever for an implementor how to parse it. The result will always be  
the same. In my view, microdata is to microformats what the HTML5 parser  
is to HTML4. It makes it possible to parse, without ever guessing, all the  
microdata items on a page. While it's really easy to write a microformat  
parser in JavaScript, you're not going to see that built into a browser,  
where each vocabulary needs a new parser. Microdata also hasn't been  
implemented by any browser yet, but I'm pretty sure it's going to happen  
if it takes off.

> <span itemscope itemtype="http://microformats.org/profile/hcard">

> Considering your affiliation with Opera, what might I ask are your
> feelings about Operator?

I've heard of it before, it looks like a custom Opera distribution? It has  
nothing to do with microformats or microdata as far as I can tell.

>> which really isn't really practical with microformats when the
>> data is hidden in class attributes together with everything else.
>
> As I alluded to above I see this as a complete non-issue yet you are
> most certainly not the first to bring it up. What am I missing?

If a browser is going to support some kind of embedded data vocabularies  
(like events or contacts), the code for parsing it isn't going to be  
written in JavaScript using the DOM, it's going to be in C++ or C  
operating on the internal datastructures of the browser. To support a  
specific microformat vocabulary, one would have to look through all the  
classes on all elements to find the "root" element, then speculatively  
search its children for the other structures of the microformat. Given
that all of the constructs used in microformats are also used for
completely different things, most of the data you inspect isn't
actually going to be what you're looking for. Since one has to do this for
all documents parsed (and not "on demand" like when finding a particular
class using document.getElementsByClassName) my guess is that it's going
to be slow. What's worse, you'll have to write more of this complicated,
slow code for each vocabulary you want to support.
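
A minimal sketch of the kind of scan described above, in Python rather than browser C++ and purely for illustration. It assumes a drastically simplified hCard (only the `vcard` root and the `fn` property); the class name `HCardScanner`, the `classes_seen` counter and the sample page are hypothetical, and a real hCard parser also has to handle the include pattern, the value-class pattern, `abbr` and many markup variations.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HCardScanner(HTMLParser):
    """Looks at every class attribute to find 'vcard' roots, then 'fn' inside them."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # current element nesting depth
        self.root_depth = None   # depth of the open 'vcard' root, if any
        self.fn_depth = None     # depth of the open 'fn' element, if any
        self.classes_seen = 0    # every class token inspected, relevant or not
        self.cards = []          # one dict per hCard root found

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.classes_seen += len(classes)
        if tag in VOID:
            return
        self.depth += 1
        if self.root_depth is None:
            if "vcard" in classes:
                self.root_depth = self.depth
                self.cards.append({"fn": ""})
        elif self.fn_depth is None and "fn" in classes:
            self.fn_depth = self.depth

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.fn_depth == self.depth:
            self.fn_depth = None
        if self.root_depth == self.depth:
            self.root_depth = None
        self.depth -= 1

    def handle_data(self, data):
        # Only text inside an open 'fn' element belongs to the card.
        if self.fn_depth is not None:
            self.cards[-1]["fn"] += data


PAGE = """
<div class="sidebar widget"><p class="note">Not a card</p></div>
<div class="vcard"><a class="fn url" href="http://example.com/">Jane Doe</a></div>
"""

scanner = HCardScanner()
scanner.feed(PAGE)
print(scanner.cards)          # the hCards found on the page
print(scanner.classes_seen)   # class tokens examined along the way
```

Most of the class tokens examined here have nothing to do with hCard, and the `vcard`/`fn` knowledge is baked into the scanner, so a second vocabulary would need a second scanner of its own.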

If the data is put in new attributes like itemprop, the code for parsing
it will be simpler and you won't have to write it again for every
vocabulary you support; you can just reuse your getItems(x) implementation to
find all items of type x and go from there.
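
As a rough illustration of that reuse, here is a minimal Python sketch of a vocabulary-independent extractor. `get_items` is a hypothetical stand-in for the getItems(x) mentioned above, not the real API, and the sketch makes simplifying assumptions: only top-level items are collected, nested items and `itemref` are ignored, and a property value is the `content` of `meta`, the `href` of `a`/`link`/`area`, or otherwise the element's text. The cow vocabulary URL is invented for the example.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Collects top-level microdata items without knowing any vocabulary."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.item = None        # item currently being filled, if any
        self.item_depth = None  # depth of that item's itemscope element
        self.prop = None        # (names, depth, text chunks) of an open text property
        self.items = []

    def _add(self, names, value):
        for name in names:
            self.item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag not in VOID:
            self.depth += 1
        if self.item is None:
            if "itemscope" in a and tag not in VOID:
                self.item = {"type": a.get("itemtype"), "properties": {}}
                self.item_depth = self.depth
                self.items.append(self.item)
            return
        names = (a.get("itemprop") or "").split()
        if not names:
            return
        if tag == "meta":
            self._add(names, a.get("content") or "")
        elif tag in ("a", "link", "area"):
            self._add(names, a.get("href") or "")
        elif tag not in VOID and self.prop is None:
            self.prop = (names, self.depth, [])

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.prop and self.prop[1] == self.depth:
            names, _, chunks = self.prop
            self._add(names, "".join(chunks).strip())
            self.prop = None
        if self.item is not None and self.item_depth == self.depth:
            self.item = None
        self.depth -= 1

    def handle_data(self, data):
        if self.prop:
            self.prop[2].append(data)


def get_items(html, item_type=None):
    """Return all top-level items, optionally only those of one itemtype."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return [i for i in parser.items
            if item_type is None or i["type"] == item_type]


PAGE = """
<div itemscope itemtype="http://example.org/cow">
  <span itemprop="name">Daisy</span>
  <meta itemprop="breed" content="Jersey">
</div>
<div itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop="fn">Jane Doe</span>
  <a itemprop="url" href="http://example.com/">home</a>
</div>
"""

print(get_items(PAGE, "http://microformats.org/profile/hcard"))
print(get_items(PAGE, "http://example.org/cow"))
```

The same code returns both items even though it has never heard of either vocabulary; what the properties mean is left entirely to the caller.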

Now, this is all theoretical since no browser has implemented this yet (I
tried a bit in my free time, but had too little of it). If you don't care about
browsers, then of course it doesn't matter. If microformats work for you
then keep using them. I'm just saying that there's a better way forward.

>>> Might a "humans first, machines second" CJKV internationalization of
>>> `n` optimization be to analyze the contents of the `fn`'s @lang and
>>> inner text and use either or both to better determine name order?
>>
>> The main problem with this is that due to lazy copy-pasting, lang="en"  
>> is
>> often used even when the language isn't English. Also, in the case of  
>> e.g.
>> Facebook, lang="en" would be correct for the page itself, but people's  
>> names
>> aren't in English anyway.
>
> Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743
>
> <html lang=ja>...<div class=vcard>...<a class=fn ... >???</a>...</div>
>
> ?? can log in today and, without any cooperation from Facebook, append
> a U+200B (zero-width space [1]) to his first name (regardless of the
> input taking the form of one or two boxes), and immediately reap the
> benefits of such an `n` optimization without negatively affecting UI,
> sort order, etc.
>
> [1] http://en.wikipedia.org/wiki/Zero-width_space

I don't speak Japanese, but I think ?? is the family name and ? is the  
given name. By not doing anything the 'n' optimization will incorrectly  
guess that the family name is ??? and given name unknown. By inserting  
a zero-width space, it will instead incorrectly guess that ?? is the  
given name and ? is the family name. Either way it's incorrect.
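
A small Python sketch of why the zero-width space does not help. It assumes a simplified implied 'n' rule (exactly two words means given name first, family name second) and, as proposed in the quoted message, treats U+200B as a word separator. The function name `implied_n` is hypothetical, the romanised name "Miyano Shu" stands in for the original characters, and what should happen to a single word is left out of the sketch.

```python
import re


def implied_n(fn):
    """Simplified implied 'n': two words -> given name first, family name second."""
    # Split on ordinary whitespace and on U+200B (zero-width space).
    words = [w for w in re.split(r"[\s\u200b]+", fn) if w]
    if len(words) == 2:
        return {"given-name": words[0], "family-name": words[1]}
    return None  # no guess is made for one word or for three or more


# Family name first, written without a separator: no split is possible.
print(implied_n("MiyanoShu"))

# With a zero-width space inserted, the family name is guessed to be the given name.
print(implied_n("Miyano\u200bShu"))
```

For a name written family-name-first, the separator only changes which wrong answer comes out.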

>> The only way to get it right is to ask the user both for the full name,
>> given name and family name, something I haven't ever seen.
>
> If you haven't seen it, then it isn't even a single way to get it
> right -- another
> byproduct of Microformats philosophy I believe. However, if optimizations
>  can yield 80%+ positive results when viewed in aggregate I personally  
> give
>  a little bit of magic a big thumbs up.

I guess we're not going by the population of the earth then, since China,  
Japan, Vietnam and South Korea account for 23.36% of it.  
(http://en.wikipedia.org/wiki/List_of_countries_by_population)

>> The most practical solution is to not guess at all, and I don't know
>> of any negative effects of this.
>
> I just see a tiny hint of dehumanization. ;)

Seriously though, what are the negative effects? I'm betting that the  
number of people that make good use of having the given name and family  
name separately in their address book aren't many enough to justify  
screwing it up for the population of East Asia.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From philipj at opera.com  Wed Jul 21 02:43:53 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Wed Jul 21 03:18:39 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>
Message-ID: <op.vf6tbfx3sr6mfa@philip-pc>

On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen <scott@randomchaos.com>  
wrote:

> On Jul 19, 2010, at 8:57 PM, Oli Studholme wrote:
>
>>>> Microdata doesn't go out of its way to be compatible with existing  
>>>> RDF vocabularies
>>>
>>> Maybe not specific vocabularies (that's kind of my point), but RDF  
>>> itself is clearly a major consideration.  There's a whole section on  
>>> it:
>>>
>>> http://www.w3.org/TR/microdata/#rdf
>>
>> No. There's a sub-sub-section on converting to RDF, just as there are
>> for converting to JSON and Atom. That's not a design goal, it's
>> specified interoperability.
>
> They're mentioned as "requirements" in the original announcement:
>
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html
>
> But again, the RDF syntax doesn't matter.  This is the important part  
> for me:
>
> "Distributed vocabulary development should be possible; it should not  
> require coordination through a centralised system."
>
> Distributed vocabulary development requires a general purpose solution.   
> Microformats don't have that requirement, so vocabulary-specific  
> solutions are common.

Yes, which is why general purpose parsers cannot exist, and why browser  
support is unlikely.

>>> I'd argue it is a bad idea in microdata, but not in microformats,  
>>> because of the very distinction I'm trying to draw between the two.
>>
>> As far as microdata goes it's irrelevant - that's something decided by
>> the *vocabulary* author.
>
> I don't think that's really true, though, and I think this is exactly  
> why n optimization was removed.  For every other microdata property, the  
> value is determined by following the parsing rules in the microdata spec:
>
> http://www.w3.org/TR/microdata/#values
>
> With n optimization, undeclared properties are given values via a  
> completely different parsing model.  This "magic"  may not be explicitly  
> disallowed, but it doesn't really fit with the general design of  
> microdata.

The magic was in the vCard extraction algorithm:  
<http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#conversion-to-vcard>

The DOM isn't changed, that would indeed be a very bad fit with the  
overall design.

> On Jul 19, 2010, at 10:05 PM, Angelo Gladding wrote:
>
>> Could it be said that microdata intends to do to Microformat syntax
>> what HTML5 did to HTML4 syntax rules in the sense that parsing is
>> unambiguous and easier to validate normativity?
>
> I'd say that's true as far as what they both do, but not how they do  
> it.  HTML5 makes parsing unambiguous by describing a wide variety of  
> parsing rules for invalid content.  I'd say such tolerance of human  
> error is more in line with the microformats approach.
>
> Microdata, on the other hand, simply changes the syntax to reduce the  
> risk of invalid content.  So in terms of strategy for making parsing  
> unambiguous, microdata looks more like XHTML to me.

HTML5 parsing is also unambiguous. The only reason it's so ridiculously  
complex is because it's needed to parse real markup on the web. With  
microdata there was no existing content, so it's possible to make a more  
sane definition. But of course, some parts may be too strict and I've  
previously left feedback and gotten the spec changed because of this. If
there are more things which are unnecessarily strict and make it
difficult for authors, please do send mail to the WHATWG or W3C so that they
can be fixed.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From singpolyma at singpolyma.net  Wed Jul 21 06:46:08 2010
From: singpolyma at singpolyma.net (Stephen Paul Weber)
Date: Wed Jul 21 06:46:35 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf6tbfx3sr6mfa@philip-pc>
References: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>
	<op.vf6tbfx3sr6mfa@philip-pc>
Message-ID: <20100721134608.GA1496@singpolyma-svelti>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Somebody claiming to be Philip Jägenstedt wrote:
> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen
> <scott@randomchaos.com> wrote:
> 
> >Distributed vocabulary development requires a general purpose
> >solution.  Microformats don't have that requirement, so
> >vocabulary-specific solutions are common.
> 
> Yes, which is why general purpose parsers cannot exist, and why
> browser support is unlikely.

I'm pretty sure Firefox already supports µfs...

- -- 
Stephen Paul Weber, @singpolyma
See <http://singpolyma.net> for how I prefer to be contacted
edition right joseph
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIcBAEBCAAGBQJMRvogAAoJENEcKRHOUZzekwMQALfvKvcVsCiFQbUEwIBLqMDe
qutM1KNYLrF036gumqyoBliK59qzBzuxWGLhbgEBqF5lLaqWPKolU5Dd3EzpW6HV
uYGpPrdw5L65L7NNUBNlrEfMkA1sa/EnF57at+/kcWhJSN5DG1uMJv5C9/pqdr4n
Zcw53uUb+NP9FY75zEL1jgjeQFR5s1pIkBkx1gjipcPmvDQ7TZ8VQ+li0Rpja4ON
T0jLLJ3qQVvmNmV1xrB6wI9fzopZ5LJycvfZaRONO7hPes1MIEuZWUiKFKho+h/4
Z1pY/twwCHI7VnnY7gbBh3U08ni1iYaaTbkphV153uxjRWSoBz0a8RxJ7U+StO6h
dFX0WKt7GY+9kVbQiymvxB6fwUaiEJO5sUZQ4xpesXhwqcfRnwbFipzm4veVIqAb
TfYdakiMkovKl5fAD1q671hJ82zfdI2PW2V8vPEWPc45yjasZMG59jHecCoFirFP
Ir29bk2mEJOuce+zvboRod5yINuEXTzShv86dZyi9oFFLO3TQxQezXev+SGnd7lI
LH6xbkYnfdSmTKjHK2v+edciIKt1z+B9ahe7YQxBWOlzcTpUXb6xTIspbIboc/0v
CeRdKaTlPkzsfHqbs66/LSHIekippH4m4/7sB0ZICjCDjkQgElrhewGmOjYuxXes
E3i3A4nfX9G5DxYl6asX
=JAD3
-----END PGP SIGNATURE-----
From singpolyma at singpolyma.net  Wed Jul 21 07:07:06 2010
From: singpolyma at singpolyma.net (Stephen Paul Weber)
Date: Wed Jul 21 07:07:15 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <20100721100922.1521c725@miranda.g5n.co.uk>
References: <D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
	<1279628988.17280.2.camel@singpolyma-N900>
	<20100721100922.1521c725@miranda.g5n.co.uk>
Message-ID: <20100721140706.GB1496@singpolyma-svelti>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Somebody claiming to be Toby Inkster wrote:
> On Tue, 20 Jul 2010 08:29:48 -0400
> Stephen Paul Weber <singpolyma@singpolyma.net> wrote:
> 
> > Having written significant code both in-browser and out to parse
> > microformats, I find the claim that parsing them using the DOM is
> > "not practical" shocking.  What would you prefer?
> 
> Parsing microformats via the DOM is not practical. Parsing them any
> other way is even worse though.
> 
> While writing DOM code to parse a particular site's implementation of
> say, hCard, is pretty trivial, generalising that to support all the
> variations of how hCard is marked up in the wild is a lot of work.
> 
> As a comparison, I have written Perl parsers[*] for microformats, RDFa
> and Microdata. Here are the lines-of-code counts for each, excluding
> documentation, comments and blank lines:
> 
> The amount of code needed to parse microformats is clearly different
> from the other formats.

Sure, but you're comparing apples and oranges.  RDF and microdata are more
like JSON and XML: popular but useless by themselves.  They're just generic
containers.  So, yes, you can trivially parse out the KVPs they encode, but
you have no idea what those are, what they mean, what the relationships
between them are, nothing.  So you would have to write more code to
implement each specific vocabulary you were interested in, and do useful
stuff with it.  With the microformats parsers, because they're parsing an actual
vocabulary instead of a container format, yes, there will be some more code,
because both steps are happening at once.

The data you get out is actually the data you want, that makes sense, though.
When I want profile data, I write an hCard parser and grab it.  The same
deal with microdata would normally be done with a separate "generic" parser,
then code to throw out all the vocabularies I don't want, and then code to
massage the vocabularies that I do want into the internal data format that
I want.
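
A minimal Python sketch of the two extra steps described above, assuming a generic parser has already produced a list of items. The item shape (a dict with "type" and "properties" keys), the function name `to_profiles` and the internal "name"/"urls" format are all hypothetical and chosen only for illustration.

```python
HCARD = "http://microformats.org/profile/hcard"


def to_profiles(items):
    """Keep only hCard-typed items and reshape them into an internal format."""
    profiles = []
    for item in items:
        if item.get("type") != HCARD:
            continue  # throw out the vocabularies we don't want
        props = item.get("properties", {})
        profiles.append({
            "name": (props.get("fn") or [None])[0],  # first fn value, if any
            "urls": list(props.get("url", [])),
        })
    return profiles


# Output of some generic parser: one item we care about, one we don't.
generic_items = [
    {"type": "http://example.org/cow",
     "properties": {"name": ["Daisy"]}},
    {"type": HCARD,
     "properties": {"fn": ["Jane Doe"], "url": ["http://example.com/"]}},
]

print(to_profiles(generic_items))
```

The vocabulary-specific knowledge (which type to keep, which properties matter) still has to live somewhere; here it sits in this second step rather than in the parser.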

- -- 
Stephen Paul Weber, @singpolyma
See <http://singpolyma.net> for how I prefer to be contacted
edition right joseph
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIcBAEBCAAGBQJMRv8JAAoJENEcKRHOUZze7lYP/A9AD+Vnwy2mEM+zOB7QITFc
FlrVzGksiOnIyPtKIXgMG8Sm8doPRrG8JC0RtCA7V3BhVmNR8dry+5A8PCCpLOyl
8CUym6G10RYduQQ0rdQCYMB6E37BgAq3Vl9oi9xUSZwsbJepEdIrSeifUZnbYtA0
ZMD/ADmLBYyqeHUf1/0So/m7W4vxtki7eUX0i95YgW997AFntKYZBfY2gtOTvvur
Cx53jMWGkZdNgvGg/Mc9eyR011bPec7RtDkbYJJoUaVCiezxk1wFhzR6lLgcoRyB
ZM4zEIBAOGS3UrT+pchX6OYGpL/3JGdCFdUkFPLbQlH1lOO1X1brogS3rJRDIyGk
X1DQu0Md0b03vzw/wW5tIs93TCN2uGjiwXjC4ytFY7wuk9K9vwtZQQL6O8a9dJTf
9QFdGopQvn5YIFbVK/3p+9lPJUmu4+BljEDSVtQYzT0RA3b/qXvgJmqOzYBau9Eo
2YczFkjF69y3llaX5zAoOmQHhD1uKYjZUbOj+8fHZSKccPSwZXuXnR+sSrWlm3nR
Hr81QftUoO3IztBqargQVXbDiW+f+BItb1xPm343sxiFSVfXDFtcUp2kaEvF39no
LAG/XPnLDhV9FtDTwXwbhbfBQ4dCxRxQIkwfD8Jf5uFVLyWfpyB3+90yEdPVjhnO
wb76GF2GtcZiGY/5J/AN
=ORD1
-----END PGP SIGNATURE-----
From philipj at opera.com  Wed Jul 21 07:33:08 2010
From: philipj at opera.com (=?iso-8859-15?Q?Philip_J=E4genstedt?=)
Date: Wed Jul 21 07:33:26 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <20100721134608.GA1496@singpolyma-svelti>
References: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>
	<op.vf6tbfx3sr6mfa@philip-pc> <20100721134608.GA1496@singpolyma-svelti>
Message-ID: <op.vf66pib8atwj1d@philip-pc.gothenburg.osa>

On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber  
<singpolyma@singpolyma.net> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Somebody claiming to be Philip Jägenstedt wrote:
>> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen
>> <scott@randomchaos.com> wrote:
>>
>> >Distributed vocabulary development requires a general purpose
>> >solution.  Microformats don't have that requirement, so
>> >vocabulary-specific solutions are common.
>>
>> Yes, which is why general purpose parsers cannot exist, and why
>> browser support is unlikely.
>
> I'm pretty sure Firefox already supports µfs...

Are you sure it's not a plugin? If not, I'd be very interested to see it  
in action.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From info at csarven.ca  Wed Jul 21 09:04:42 2010
From: info at csarven.ca (Sarven Capadisli)
Date: Wed Jul 21 09:04:51 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf66pib8atwj1d@philip-pc.gothenburg.osa>
References: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>
	<op.vf6tbfx3sr6mfa@philip-pc> <20100721134608.GA1496@singpolyma-svelti>
	<op.vf66pib8atwj1d@philip-pc.gothenburg.osa>
Message-ID: <1279728282.1873.167.camel@csarven-laptop>

On Wed, 2010-07-21 at 16:33 +0200, Philip Jägenstedt wrote:
> On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber  
> <singpolyma@singpolyma.net> wrote:
> 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > Somebody claiming to be Philip Jägenstedt wrote:
> >> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen
> >> <scott@randomchaos.com> wrote:
> >>
> >> >Distributed vocabulary development requires a general purpose
> >> >solution.  Microformats don't have that requirement, so
> >> >vocabulary-specific solutions are common.
> >>
> >> Yes, which is why general purpose parsers cannot exist, and why
> >> browser support is unlikely.
> >
> > I'm pretty sure Firefox already supports µfs...
> 
> Are you sure it's not a plugin? If not, I'd be very interested to see it  
> in action.
> 

It has some support. See also resource://gre/modules/Microformats.js and
https://developer.mozilla.org/en/Using_microformats

Probably the best way to see it in action is via JetPack:
https://jetpack.mozillalabs.com/

-Sarven


From microformats.org at boblet.net  Thu Jul 22 07:20:20 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Thu Jul 22 07:20:52 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf6skil5sr6mfa@philip-pc>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com> 
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com> 
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com> 
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc> 
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com> 
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com> 
	<op.vf40j1b6sr6mfa@philip-pc>
	<AANLkTilntXuAgco5NBeMtXhfmJqx7ZSF8_LSOLIEZ3o6@mail.gmail.com> 
	<op.vf6skil5sr6mfa@philip-pc>
Message-ID: <AANLkTilXL65fsNcOAdMDRDLe_MjJnb2qTWIZxTwrqAna@mail.gmail.com>

Hey All,

Wow, this has turned into a really interesting thread. Thank you all
for your input.

I just want to address a couple of points... ;)

On Wed, Jul 21, 2010 at 6:27 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>>>
>>> The main problem with this is that due to lazy copy-pasting, lang="en" is
>>> often used even when the language isn't English. Also, in the case of
>>> e.g.
>>> Facebook, lang="en" would be correct for the page itself, but people's
>>> names
>>> aren't in English anyway.
>>
>> Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743
>> <html lang=ja>...<div class=vcard>...<a class=fn ... >???</a>...</div>
>>
>> ?? can log in today and, without any cooperation from Facebook, append
>> a U+200B (zero-width space [1]) to his first name (regardless of the
>> input taking the form of one or two boxes), and immediately reap the
>> benefits of such an `n` optimization without negatively affecting UI,
>> sort order, etc.
>
> I don't speak Japanese, but I think ?? is the family name and ? is the given
> name. By not doing anything the 'n' optimization will incorrectly guess that
> the family name is ??? and given name unknown. By inserting a zero-width
> space, it will instead incorrectly guess that ?? is the given name and ? is
> the family name. Either way it's incorrect.

??? is the Japanese name Miyano (??) Shu (?) (well, probably - there
may be other readings for ?). As Philip correctly guesses, Miyano is
the family name, so inserting any form of space character would give
an incorrectly reversed name using implied 'n' optimisation.

While Tantek's suggested workaround of using the declared language
would work on the Japanese Facebook site, the @lang changes based on
location. For example:
http://www.facebook.com/people/gong-ye-zhong/100000456401743
has the same content with <html lang="en">

In addition to the points Philip made about @lang often being wrong, a
lot of the time it isn't even present (well in Japan anyhow). I did a
quick search on a popular Japanese surname (28 mil results in Google),
and only 6 of the first 10 results declared @lang:
http://microformats.org/wiki/hcard-issues-resolved#resolved_2010
As you can guess, it goes downhill from there.
(btw, thanks for your comments Tantek - let me know if you want me to
open the separate issue)

Philip, the implied "n" optimisation doesn't work on single word
names; they would get implied "nickname" optimisation instead.
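
For readers following along, here is a rough sketch of the implied "n" and implied "nickname" rules as I read them in the hCard spec. The function name `implied_name` and the returned dict shape are my own, purely for illustration; this is not taken from any existing parser.

```python
def implied_name(fn):
    """Guess n / nickname from a formatted name, hCard-style.

    Sketch only: assumes fn and org differ and no explicit n is present.
    """
    words = fn.split()
    if len(words) == 1:
        # Single word: implied "nickname" optimisation.
        return {"nickname": words[0]}
    if len(words) != 2:
        # Three or more words: the author has to mark up n explicitly.
        return {}
    first, second = words
    initial = len(second) == 1 or (len(second) == 2 and second.endswith("."))
    if first.endswith(",") or initial:
        # "Family, Given" or "Family G." form.
        return {"family-name": first.rstrip(","), "given-name": second}
    # Default: first word is the given name, second the family name.
    return {"given-name": first, "family-name": second}


print(implied_name("Shu Miyano"))   # given-name Shu, family-name Miyano
print(implied_name("Miyano Shu"))   # reversed: Miyano is treated as given-name
```

The second call shows the problem described above: for a family-name-first string the two-word rule produces a reversed result, and nothing in the string itself tells the parser so.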


On Tue, Jul 20, 2010 at 9:29 PM, Stephen Paul Weber
<singpolyma@singpolyma.net> wrote:
> Microformats data is not "hidden"

In general this is true for microdata too.

> One of the benefits of using the real semantics of the page, and not some hacked-in layer like microdata, is that it works well with existing tools and markup. CSS styling of microformats, for example, "just works" and I use it all the time. DOM access similarly works well.

"hacked-in"? It's specced on w3.org and includes an API. Also, check
out the CSS 2.1 [attr] selector.


On Wed, Jul 21, 2010 at 4:55 AM, Angelo Gladding <angelo@gladding.name> wrote:
>  However, if optimizations
> can yield 80%+ positive results when viewed in aggregate I personally give
> a little bit of magic a big thumbs up.

I'm guessing this wasn't the metric by which using datetimes in the
abbr design pattern was deprecated.


On Tue, Jul 20, 2010 at 2:41 PM, Martin McEvoy <martin@weborganics.co.uk> wrote:
> Im sorry but you cannot express *microformats* in microdata if you do, its
> cute, but It isn't a microformat because microformats *only* use class
> names, and a few choice rel-values. If you move a microformat away from
> @class its no longer a microformat and shouldn't be described as such

I'm sorry, but I don't think this is correct. You're mixing the
technology with the goal (and forgetting VoteLinks and @profile ;-)

"Designed for humans first and machines second, microformats are a set
of simple, open data formats built upon existing and widely adopted
standards" - Microformats wiki about page

"Microformats are more than simply a technology like CSS or XHTML - they
are an approach to solving the important problem of creating a rich
semantic markup" - Microformats, John Allsopp, p6


peace - oli

From philipj at opera.com  Thu Jul 22 06:53:10 2010
From: philipj at opera.com (=?iso-8859-15?Q?Philip_J=E4genstedt?=)
Date: Thu Jul 22 08:08:35 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <1279728282.1873.167.camel@csarven-laptop>
References: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>
	<op.vf6tbfx3sr6mfa@philip-pc> <20100721134608.GA1496@singpolyma-svelti>
	<op.vf66pib8atwj1d@philip-pc.gothenburg.osa>
	<1279728282.1873.167.camel@csarven-laptop>
Message-ID: <op.vf8ziwjlatwj1d@philip-pc.gothenburg.osa>

On Wed, 21 Jul 2010 18:04:42 +0200, Sarven Capadisli <info@csarven.ca>  
wrote:

> On Wed, 2010-07-21 at 16:33 +0200, Philip Jägenstedt wrote:
>> On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber
>> <singpolyma@singpolyma.net> wrote:
>>
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA256
>> >
>> > Somebody claiming to be Philip Jägenstedt wrote:
>> >> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen
>> >> <scott@randomchaos.com> wrote:
>> >>
>> >> >Distributed vocabulary development requires a general purpose
>> >> >solution.  Microformats don't have that requirement, so
>> >> >vocabulary-specific solutions are common.
>> >>
>> >> Yes, which is why general purpose parsers cannot exist, and why
>> >> browser support is unlikely.
>> >
>> > I'm pretty sure Firefox already supports µfs...
>>
>> Are you sure it's not a plugin? If not, I'd be very interested to see it
>> in action.
>>
>
> It has some support. See also resource://gre/modules/Microformats.js and
> https://developer.mozilla.org/en/Using_microformats
>
> Probably the best way to see it in action is via JetPack:
> https://jetpack.mozillalabs.com/

Thanks, that's pretty cool. However, I note that this is only loaded on  
demand. Looking for e.g. hcards on every page parsed is not quite the same  
thing, and is what you'd need to do to have a button similar to the orange  
"feed" button pop up for all pages where there's something to add to the  
address book or calendar.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From angelo at gladding.name  Thu Jul 22 11:32:33 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Thu Jul 22 11:32:53 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <op.vf8ziwjlatwj1d@philip-pc.gothenburg.osa>
References: <AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com>
	<op.vf6tbfx3sr6mfa@philip-pc>
	<20100721134608.GA1496@singpolyma-svelti>
	<op.vf66pib8atwj1d@philip-pc.gothenburg.osa>
	<1279728282.1873.167.camel@csarven-laptop>
	<op.vf8ziwjlatwj1d@philip-pc.gothenburg.osa>
Message-ID: <AANLkTinHoGWNvc4Xcea30i1tMMZbaRoQalIFg6Q2WV8S@mail.gmail.com>

On Thu, Jul 22, 2010 at 6:53 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Wed, 21 Jul 2010 18:04:42 +0200, Sarven Capadisli <info@csarven.ca>
> wrote:
>
>> On Wed, 2010-07-21 at 16:33 +0200, Philip Jägenstedt wrote:
>>>
>>> On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber
>>> <singpolyma@singpolyma.net> wrote:
>>>
>>> > -----BEGIN PGP SIGNED MESSAGE-----
>>> > Hash: SHA256
>>> >
>>> > Somebody claiming to be Philip Jägenstedt wrote:
>>> >> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen
>>> >> <scott@randomchaos.com> wrote:
>>> >>
>>> >> >Distributed vocabulary development requires a general purpose
>>> >> >solution.  Microformats don't have that requirement, so
>>> >> >vocabulary-specific solutions are common.
>>> >>
>>> >> Yes, which is why general purpose parsers cannot exist, and why
>>> >> browser support is unlikely.
>>> >
>>> > I'm pretty sure Firefox already supports µfs...
>>>
>>> Are you sure it's not a plugin? If not, I'd be very interested to see it
>>> in action.
>>>
>>
>> It has some support. See also resource://gre/modules/Microformats.js and
>> https://developer.mozilla.org/en/Using_microformats
>>
>> Probably the best way to see it in action is via JetPack:
>> https://jetpack.mozillalabs.com/
>
> Thanks, that's pretty cool. However, I note that this is only loaded on
> demand. Looking for e.g. hcards on every page parsed is not quite the same
> thing, and is what you'd need to do to have a button similar to the orange
> "feed" button pop up for all pages where there's something to add to the
> address book or calendar.
>

Firefox's Operator Plugin [1] has sniffed the microformats of each and every
document that I have opened on multiple computers (ranging from slow to fast)
for several years now. Make sure to install appropriate user scripts [2].

[1]: https://addons.mozilla.org/en-US/firefox/addon/4106/
[2]: http://kaply.com/weblog/operator-user-scripts/

-- 
Angelo Gladding
angelo@gladding.name

From angelo at gladding.name  Thu Jul 22 12:15:48 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Thu Jul 22 12:15:56 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <20100721140706.GB1496@singpolyma-svelti>
References: <D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
	<1279628988.17280.2.camel@singpolyma-N900>
	<20100721100922.1521c725@miranda.g5n.co.uk>
	<20100721140706.GB1496@singpolyma-svelti>
Message-ID: <AANLkTinbGkDolF7MkTkq4nT1sxoal1xfzjGqMqlI4PZN@mail.gmail.com>

On Wed, Jul 21, 2010 at 7:07 AM, Stephen Paul Weber
<singpolyma@singpolyma.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Somebody claiming to be Toby Inkster wrote:
>> On Tue, 20 Jul 2010 08:29:48 -0400
>> Stephen Paul Weber <singpolyma@singpolyma.net> wrote:
>>
>> > Having written significant code both in-browser and out to parse
>> > microformats, I find the claim that parsing them using the DOM is
>> > "not practical" shocking.  What would you prefer?
>>
>> Parsing microformats via the DOM is not practical. Parsing them any
>> other way is even worse though.
>>
>> While writing DOM code to parse a particular site's implementation of
>> say, hCard, is pretty trivial, generalising that to support all the
>> variations of how hCard is marked up in the wild is a lot of work.
>>
>> As a comparison, I have written Perl parsers[*] for microformats, RDFa
>> and Microdata. Here are the lines-of-code counts for each, excluding
>> documentation, comments and blank lines:
>>
>> The amount of code needed to parse microformats is clearly different
>> from the other formats.
>
> Sure, but you're comparing apples and oranges.  RDF and microdata are more
> like JSON and XML: popular but useless by themselves.  They're just generic
> containers.  So, yes, you can trivially parse out the KVPs they encode, but
> you have no idea what those are, what they mean, what the relationships
> between them are, nothing.  So you would have to write more code to
> implement each specific vocabulary you were interested in, and do useful
> stuff with it.  The microformats parsers, because they're parsing an actual
> vocabulary instead of a container format, yes there will be some more code,
> because both steps are happening at once.
>
> The data you get out is actually the data you want, that makes sense, though.
> When I want profile data, I write an hCard parser and grab it.  The same
> deal with microdata would normally be done with a separate "generic" parser
> and then the code to throw out all vocabularies I don't want, and then the
> one to massage into an internal data format that I want the vocabularies
> that I do.

On Wed, Jul 21, 2010 at 2:09 AM, Toby Inkster <mail@tobyinkster.co.uk> wrote:
> Microdata      :  945
> RDFa 1.0       : 1265
> RDFa 1.1 [**]  : 2611
> microformats   : 9455

It's tough to argue with an order of magnitude difference with
the most complete, public universal implementation to date.

So what is the fundamental difference between the two approaches?

It appears that Microdata takes us through lexical analysis and leaves us
with a parse tree (?) while Microformats take us through the secondary stage
of syntactic/semantic analysis and leaves us with a semantic graph (?).

Does Microdata check syntax as well? If so, how does it know what syntax
to look for without sniffing the vocabulary specification? e.g. How does the
parser know to store http://microformats.org/wiki/hcard#bday as a datetime?
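
To make the two-stage split concrete, here is a heavily simplified sketch. The first pass collects `itemprop` name/text pairs with no knowledge of any vocabulary; the second applies a vocabulary-specific type table. The class `ItempropCollector`, the `HCARD_TYPES` table and the sample markup are all hypothetical, and real microdata takes some values from attributes (for example on `<time>` or `<a>`) rather than from text, which this ignores.

```python
from datetime import date
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Generic pass: itemprop name -> text, with no vocabulary knowledge."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._stack = []  # itemprop name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        # Sketch limitation: void elements such as <br> are not handled.
        self._stack.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing itemprop, if any.
        for name in reversed(self._stack):
            if name:
                self.props[name] = self.props.get(name, "") + data
                break


# Vocabulary-specific pass: only this table knows that bday is a date.
HCARD_TYPES = {"bday": date.fromisoformat}


def apply_vocabulary(props, types):
    return {key: types.get(key, str)(value.strip()) for key, value in props.items()}


collector = ItempropCollector()
collector.feed('<div itemscope><span itemprop="fn">Jane Doe</span> '
               '<span itemprop="bday">1990-05-17</span></div>')
collector.close()
typed = apply_vocabulary(collector.props, HCARD_TYPES)
print(collector.props)  # plain strings only
print(typed)            # bday is now a datetime.date
```

A microformats parser effectively fuses both passes, which is one plausible reason for the larger line counts above.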

- - -

On a related note, how many of our issues does MF2 [1] stand to resolve?
Reading these notes has green-lighted a couple of features I was tentatively
considering for my universal parser. Future proofing my implementation (and
participating in this conversation!) has helped me to better understand the
two approaches' design goals. MF2 looks to be the logical middle-ground
and may very well render much of this conversation moot.

[1]: http://microformats.org/wiki/events/2010-05-02-microformats-2-0

-- 
Angelo Gladding
angelo@gladding.name

From angelo at gladding.name  Thu Jul 22 12:51:45 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Thu Jul 22 12:57:20 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTilXL65fsNcOAdMDRDLe_MjJnb2qTWIZxTwrqAna@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
	<AANLkTilntXuAgco5NBeMtXhfmJqx7ZSF8_LSOLIEZ3o6@mail.gmail.com>
	<op.vf6skil5sr6mfa@philip-pc>
	<AANLkTilXL65fsNcOAdMDRDLe_MjJnb2qTWIZxTwrqAna@mail.gmail.com>
Message-ID: <AANLkTikeZDEcEETkZDN68_ufjSLusllJbwAhcWMNDuLS@mail.gmail.com>

On Thu, Jul 22, 2010 at 7:20 AM, Oli Studholme
<microformats.org@boblet.net> wrote:
> ??? is the Japanese name Miyano (??) Shu (?) (well, probably - there
> may be other readings for ?). As Philip correctly guesses, Miyano is
> the family name, so inserting any form of space character would give
> an incorrectly reversed name using implied "n" optimisation.

My original intentions were to fall back on @lang in case sniffing
Unicode ranges couldn't handle all of the cases. However, if that were
the case, would it too be sufficiently magic?

As I mentioned to Philip above, I'll draft the algorithm and post it
back to be more clear.

-- 
Angelo Gladding
angelo@gladding.name

From microformats.org at boblet.net  Thu Jul 22 18:21:06 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Thu Jul 22 18:21:32 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTikeZDEcEETkZDN68_ufjSLusllJbwAhcWMNDuLS@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com> 
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com> 
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com> 
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com> 
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc> 
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com> 
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com> 
	<op.vf40j1b6sr6mfa@philip-pc>
	<AANLkTilntXuAgco5NBeMtXhfmJqx7ZSF8_LSOLIEZ3o6@mail.gmail.com> 
	<op.vf6skil5sr6mfa@philip-pc>
	<AANLkTilXL65fsNcOAdMDRDLe_MjJnb2qTWIZxTwrqAna@mail.gmail.com> 
	<AANLkTikeZDEcEETkZDN68_ufjSLusllJbwAhcWMNDuLS@mail.gmail.com>
Message-ID: <AANLkTikILYn2I9xDqEVeB2TE_mvvnhzOqQa4kFcFAGs6@mail.gmail.com>

Hey Angelo,

On Fri, Jul 23, 2010 at 4:51 AM, Angelo Gladding <angelo@gladding.name> wrote:
> On Thu, Jul 22, 2010 at 7:20 AM, Oli Studholme
> <microformats.org@boblet.net> wrote:
>> ??? is the Japanese name Miyano (??) Shu (?) (well, probably - there
>> may be other readings for ?). As Philip correctly guesses, Miyano is
>> the family name, so inserting any form of space character would give
>> an incorrectly reversed name using implied "n" optimisation.
>
> My original intentions were to fall back on @lang in case sniffing
> Unicode ranges couldn't
> handle all of the cases. However, if that were the case, would it too
> be sufficiently magic?
>
> As I mentioned to Philip above, I'll draft the algorithm and post it
> back to be more clear.

I think the magic part is less of a problem than the magic sometimes
not working part. You'll also need to convert to Unicode for pages in
other encodings (three others used in Japan), while keeping in mind
encodings are sometimes not declared.
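
A minimal sketch of that conversion step, assuming the three other encodings are Shift_JIS, EUC-JP and ISO-2022-JP (my assumption, using Python's codec names). The function `decode_page` and the candidate order are illustrative only; a production parser would want a proper detector.

```python
# Candidate order is an assumption of this sketch, not a standard algorithm.
# ISO-2022-JP goes first because its bytes are all 7-bit and would otherwise
# be accepted as (garbled) UTF-8.
CANDIDATES = ("iso2022_jp", "utf-8", "euc_jp", "shift_jis")


def decode_page(raw, declared=None):
    """Return (text, encoding): try the declared encoding, then candidates."""
    order = ([declared] if declared else []) + list(CANDIDATES)
    for encoding in order:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a declared encoding Python does not know.
            continue
    raise ValueError("none of the candidate encodings fit")


sample = "日本語"
for codec in ("utf-8", "euc_jp", "shift_jis", "iso2022_jp"):
    text, _ = decode_page(sample.encode(codec))
    print(codec, text == sample)
```

Trial decoding like this can still pick the wrong encoding for short or ambiguous byte strings, which is the "magic sometimes not working" part.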

If you need any help for Japanese let me know

peace - oli

PS speaking of encodings I recently saw a Japanese page using two
different encodings (second via iframe), neither of which were
declared. Mojibake disaster! :O
From scott at randomchaos.com  Thu Jul 22 21:41:41 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Thu Jul 22 21:41:48 2010
Subject: [uf-discuss] n optimization internationalization (Was: HTML5
	support)
In-Reply-To: <AANLkTikeZDEcEETkZDN68_ufjSLusllJbwAhcWMNDuLS@mail.gmail.com>
References: <AANLkTinOxJu5JIgIAsUGi81F9juAYWBzQPid8EhGQx7K@mail.gmail.com>
	<1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
	<AANLkTikpstw7Uq_qimMTpGGWvn9YkvM89UIlB9B_c6uL@mail.gmail.com>
	<AANLkTinES5z3rhy-IEzpx1-w1BWVMgAQUopneU1H5Qxw@mail.gmail.com>
	<D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
	<AANLkTilntXuAgco5NBeMtXhfmJqx7ZSF8_LSOLIEZ3o6@mail.gmail.com>
	<op.vf6skil5sr6mfa@philip-pc>
	<AANLkTilXL65fsNcOAdMDRDLe_MjJnb2qTWIZxTwrqAna@mail.gmail.com>
	<AANLkTikeZDEcEETkZDN68_ufjSLusllJbwAhcWMNDuLS@mail.gmail.com>
Message-ID: <E9CE9713-0894-4582-BA7C-D62D85B3FFC2@randomchaos.com>

On Jul 22, 2010, at 1:51 PM, Angelo Gladding wrote:

> On Thu, Jul 22, 2010 at 7:20 AM, Oli Studholme
> <microformats.org@boblet.net> wrote:
>> ??? is the Japanese name Miyano (??) Shu (?) (well, probably - there
>> may be other readings for ?). As Philip correctly guesses, Miyano is
>> the family name, so inserting any form of space character would give
>> an incorrectly reversed name using implied "n" optimisation.
> 
> My original intentions were to fall back on @lang in case sniffing
> Unicode ranges couldn't
> handle all of the cases. However, if that were the case, would it too
> be sufficiently magic?
> 
> As I mentioned to Philip above, I'll draft the algorithm and post it
> back to be more clear.

I don't believe any algorithm can reliably predict how n optimization should be applied, so it should be used sparingly (only when name order is known) even with increased consideration of non-English names.

I know plenty of Japanese people who, at least when they're interacting primarily with English speakers, write their name given name first (e.g. Shu Miyano), just as most English speakers do.  Sometimes they even do this when writing their names in Japanese.  A couple examples:

http://en.wikipedia.org/wiki/Yoko_Ono
http://en.wikipedia.org/wiki/Joi_Ito

Note that both names are printed both ways, given name first and family name first.  Although they can be useful for making better guesses, neither language nor Unicode ranges can reliably tell us which name is given and which is family.
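
To illustrate what range sniffing can and cannot do, here is a rough sketch. The function `guess_order`, the three code point ranges and the sample names are my own choices for illustration, not a proposed algorithm.

```python
# Code point ranges treated as "Japanese script" in this sketch (an assumption):
# hiragana, katakana and the main CJK unified ideographs block.
JAPANESE_RANGES = ((0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF))


def in_japanese_range(char):
    return any(low <= ord(char) <= high for low, high in JAPANESE_RANGES)


def guess_order(fn):
    """Guess name order from script alone. This is a guess, never a fact."""
    letters = [char for char in fn if not char.isspace()]
    if letters and all(in_japanese_range(char) for char in letters):
        return "family-first"
    return "given-first"


print(guess_order("山田 太郎"))    # family-first
print(guess_order("Taro Yamada"))  # given-first
print(guess_order("Yamada Taro"))  # given-first, which is the wrong guess
```

The last call is the case described above: the script says nothing about the order the person actually chose.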

Peace,
Scott


From philipj at opera.com  Fri Jul 23 07:34:04 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Fri Jul 23 07:34:28 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <AANLkTinbGkDolF7MkTkq4nT1sxoal1xfzjGqMqlI4PZN@mail.gmail.com>
References: <D2281DAD-9139-4313-8438-199E88C215E6@randomchaos.com>
	<AANLkTinD13jF-tpgfyuyd_V_h8zgTJrrvHAmuWte7MUT@mail.gmail.com>
	<3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
	<op.vf20mun3sr6mfa@philip-pc>
	<5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
	<AANLkTinI_nAP7x0suraed0uC_YsIFKqnAQVxBB4ikx_I@mail.gmail.com>
	<AANLkTik1zqRB5suyS91V9ik_rvaEFi0pKiBSQHpFkw-i@mail.gmail.com>
	<op.vf40j1b6sr6mfa@philip-pc>
	<1279628988.17280.2.camel@singpolyma-N900>
	<20100721100922.1521c725@miranda.g5n.co.uk>
	<20100721140706.GB1496@singpolyma-svelti>
	<AANLkTinbGkDolF7MkTkq4nT1sxoal1xfzjGqMqlI4PZN@mail.gmail.com>
Message-ID: <op.vgav22b9sr6mfa@philip-pc.gothenburg.osa>

On Thu, 22 Jul 2010 21:15:48 +0200, Angelo Gladding <angelo@gladding.name>  
wrote:

> Does Microdata check syntax as well? If so, how does it know what syntax
> to look for without sniffing the vocabulary specification? e.g. How does  
> the
> parser know to store http://microformats.org/wiki/hcard#bday as a  
> datetime?

No, there's no checking of the vocabulary-specific rules. When it comes to  
dates, those are expressed using <time>, so you can validate those by  
simply using an HTML validator. The general constraints of microdata can  
also be checked by an HTML validator, but I don't know how far along  
validator.nu is in its support yet.

-- 
Philip Jägenstedt
Core Developer
Opera Software

From send.missive at coreymwamba.co.uk  Fri Jul 23 21:04:50 2010
From: send.missive at coreymwamba.co.uk (Corey Mwamba)
Date: Fri Jul 23 21:05:00 2010
Subject: [uf-discuss] Marking up radio stations
Message-ID: <201007240504.50934.send.missive@coreymwamba.co.uk>

Hello,

I was wondering if anyone had any suggestions as to how to mark up a radio 
station using microformats, especially in relation to the frequencies - which 
I see as a type of address! Any thoughts?

Thanks,

C.
----
http://www.coreymwamba.co.uk
http://trio.coreymwamba.co.uk/

music = science + magic
From mail at tobyinkster.co.uk  Sat Jul 24 03:23:30 2010
From: mail at tobyinkster.co.uk (Toby Inkster)
Date: Sat Jul 24 03:24:01 2010
Subject: [uf-discuss] Marking up radio stations
In-Reply-To: <201007240504.50934.send.missive@coreymwamba.co.uk>
References: <201007240504.50934.send.missive@coreymwamba.co.uk>
Message-ID: <20100724112330.4f63ef1b@miranda.g5n.co.uk>

On Sat, 24 Jul 2010 05:04:50 +0100
Corey Mwamba <send.missive@coreymwamba.co.uk> wrote:

> I was wondering if anyone had any suggestions as to how to mark up a
> radio station using microformats, especially in relation to the
> frequencies - which I see as a type of address! Any thoughts?

Interesting question. hCard is probably a good start:

	<div class="vcard">
		<b class="fn org">Heart FM (Sussex)</b>
		<i>102.4 MHz</i>
	</div>

Now, how to encode the frequency? It is an address of sorts, or at
least a locator. Not the kind of address that is suitable for marking
up with class="adr" though. If there were a URI scheme for radio wave
frequencies this would be a little easier:

	<div class="vcard">
		<b class="fn org">Heart FM (Sussex)</b>
		<a href="radio:fm:102400000"
	           class="url" >102.4 MHz</a>
	</div>

Radio stations are very geography-specific. 50 miles away a completely
different organisation could be broadcasting on the same frequency. So
our hypothetical "radio:" URI scheme would probably need a geographic
signifier to be attached:

	<div class="vcard">
		<b class="fn org">Heart FM (Sussex)</b>
		<a href="radio:fm:102400000;context=geo:50.9761,0.2293"
		   class="url">102.4 MHz</a>
	</div>
	
However, such a URI scheme does not exist. It could be registered with
IANA, or you could bypass that requirement by using a specialised HTTP
prefix instead, a la <http://dbooth.org/2006/urn2http/>.
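
If such a scheme did exist, consuming it would be straightforward. The sketch below parses the hypothetical "radio:" form used in the examples above; the function `parse_radio_uri` and the returned dict shape are invented for illustration, since no such scheme is registered.

```python
def parse_radio_uri(uri):
    """Parse the hypothetical radio:<band>:<hertz>[;context=geo:<lat>,<lon>] form."""
    scheme, band, rest = uri.split(":", 2)
    if scheme != "radio":
        raise ValueError("not a radio: URI")
    hertz_part, _, params = rest.partition(";")
    result = {"band": band, "hertz": int(hertz_part)}
    prefix = "context=geo:"
    if params.startswith(prefix):
        # Optional geographic context, written as latitude,longitude.
        latitude, longitude = params[len(prefix):].split(",")
        result["geo"] = (float(latitude), float(longitude))
    return result


print(parse_radio_uri("radio:fm:102400000"))
print(parse_radio_uri("radio:fm:102400000;context=geo:50.9761,0.2293"))
```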

Short of specialised URIs to identify radio signals, the most
appropriate construct in hCard would probably be class="note". e.g.:

	<div class="vcard">
		<b class="fn org">Heart FM (Sussex)</b>
		<i class="note">
			102.4 MHz
			<abbr title="50.9761;0.2293"
			      class="geo">(Eastbourne)</abbr>
		</i>
	</div>

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From send.missive at coreymwamba.co.uk  Sat Jul 24 22:42:40 2010
From: send.missive at coreymwamba.co.uk (Corey Mwamba)
Date: Sat Jul 24 23:04:47 2010
Subject: [uf-discuss] Marking up radio stations
In-Reply-To: <20100724112330.4f63ef1b@miranda.g5n.co.uk>
References: <201007240504.50934.send.missive@coreymwamba.co.uk>
	<20100724112330.4f63ef1b@miranda.g5n.co.uk>
Message-ID: <201007250642.41449.send.missive@coreymwamba.co.uk>

> Interesting question. hCard is probably a good start:
>
> 	<div class="vcard">
> 		<b class="fn org">Heart FM (Sussex)</b>
> 		<i>102.4 MHz</i>
> 	</div>
>

That's what I thought too. That's what I'm doing at the moment. The frequency 
I see as a form of location. Except on the electromagnetic spectrum.


> Now, how to encode the frequency? It is an address of sorts, or at
> least a locator. Not the kind of address that is suitable for marking
> up with class="adr" though. If there were a URI scheme for radio wave
> frequencies this would be a little easier:
>
> 	<div class="vcard">
> 		<b class="fn org">Heart FM (Sussex)</b>
> 		<a href="radio:fm:102400000"
> 	           class="url" >102.4 MHz</a>
> 	</div>
>
> Radio stations are very geography-specific. 50 miles away a completely
> different organisation could be broadcasting on the same frequency. So
> our hypothetical "radio:" URI scheme would probably need a geographic
> signifier to be attached:
>
> 	<div class="vcard">
> 		<b class="fn org">Heart FM (Sussex)</b>
> 		<a href="radio:fm:102400000;context=geo:50.9761,0.2293"
> 		   class="url">102.4 MHz</a>
> 	</div>
>
> However, such a URI scheme does not exist. It could be registered with
> IANA, or you could bypass that requirement by using a specialised HTTP
> prefix instead, a la <http://dbooth.org/2006/urn2http/>.
>
> Short of specialised URIs to identify radio signals, the most
> appropriate construct in hCard would probably be class="note". e.g.:
>
> 	<div class="vcard">
> 		<b class="fn org">Heart FM (Sussex)</b>
> 		<i class="note">
> 			102.4 MHz
> 			<abbr title="50.9761;0.2293"
> 			      class="geo">(Eastbourne)</abbr>
> 		</i>
> 	</div>

I read Andy's post and went looking for the hMeasure draft which looked 
promising - but then ran across 

http://microformats.org/wiki/broadcast-examples

which would define exactly what I need [and has the class name "frequency"]. 
However it is worth noting that some radio stations are not placed at one 
frequency, but a range: however the idea doesn't deal with that as it stands.

So, bearing in mind my bias towards frequency being a location [albeit a fuzzy 
one], I'm thinking on the lines of

<div class="vcard">
<strong class="fn org">BBC Radio 3</strong>
<span class="role">radio station</span>
<em class="frequency">
<span class="low">90</span>  - <span class="high">92</span>
<span class="band">FM</span>
</em>
</div>

I'm ignoring the Hertz units because:
1. As far as I can recall, no one has ever mentioned them while speaking; and
2. the band [FM/UKV, AM, SW, LW, MW] is more important for physically finding 
the station on a radio.

If you DID want to use the units [which from a scientific point of view would 
be correct] then this could be like this:

<div class="vcard">
<b class="fn org">Heart FM (Sussex)</b>
<i class="frequency">102.4
<abbr class="unit" title="Megahertz">MHz</abbr>
</i>
(<abbr title="50.9761;0.2293" class="geo">Eastbourne</abbr>)
</div>


I do agree that stations are very geography specific, though. But to my 
thinking, the geo block does not need to be inside the frequency class if the 
information is contained in a hCard [although it could be].

So for the Sussex Heart FM example, it'd look like this:

<div class="vcard">
<b class="fn org">Heart FM (Sussex)</b>
<i class="frequency">102.4<span class="band">FM</span></i>
<!-- and then you could move the latitude/longitude out into its own section 
if you like -->
(<abbr title="50.9761;0.2293" class="geo">Eastbourne</abbr>)
</div>

Or more formally:

<div class="vcard">
<b class="fn org">Heart Radio in Sussex</b>
<i class="frequency">
<span class="low">102.4</span>  and 
<span class="high">103.5</span>
<span class="band">FM</span>
(<abbr title="50.9761;0.2293" class="geo">Eastbourne</abbr>)
</i>
</div>

What do you think?

C. 


----
http://www.coreymwamba.co.uk
http://trio.coreymwamba.co.uk/

music = science + magic


From send.missive at coreymwamba.co.uk  Sun Jul 25 23:08:17 2010
From: send.missive at coreymwamba.co.uk (Corey Mwamba)
Date: Sun Jul 25 23:08:28 2010
Subject: [uf-discuss] Marking up radio stations
In-Reply-To: <20100724112330.4f63ef1b@miranda.g5n.co.uk>
References: <201007240504.50934.send.missive@coreymwamba.co.uk>
	<20100724112330.4f63ef1b@miranda.g5n.co.uk>
Message-ID: <201007260708.17900.send.missive@coreymwamba.co.uk>

Hello again,

I had not realised that Andy Mabbett didn't post a reply here - so here's a 
reference to his blog post on this:

http://pigsonthewing.wordpress.com/2010/07/24/measurement-microformat-for-radio-station-frequencies

There's discussion in the comments. Having had some sleep, I'm feeling more 
that "frequency" should be able to act as either

1. a property, as proposed in the broadcast examples page:
http://microformats.org/wiki/broadcast-examples

OR

2. a sub-property of "adr".

Am I correct in thinking that this would be similar to how "tel" or "geo" 
work?

If this were to happen I think "frequency" should have additional 
sub-properties:
* band [I think this should be required] - FM, SW, LW, MW, et al.

It should be enough to just write 

<div class="vcard">
<strong class="fn org">Rádio Nacional do Alto Solimões</strong>
<div class="adr">
<!-- you could of course use geo instead of adr -->
<span class="locality">Amazonas</span>
<span class="country-name">Brazil</span>
</div>
<div class="frequency">
<span class="band">FM</span>: 96.1
</div>
<div class="frequency">
<span class="band">AM</span>: 670 
</div>
</div>

But as quite a few radio stations are located over a range of frequencies, 
you'd need 

* low - the lowest frequency at which the station can be heard
* high - the highest frequency at which the station can be heard

This is from my gigs page for Radio 3 and shows the range:

<div class="location vcard">
<strong class="fn org">
<a href="http://www.bbc.co.uk/radio3/jazzon3" class="url">BBC Radio 3</a>
</strong>
<span class="frequency" style="display:block;">
<span class="low">90.2</span> - <span class="high">92.6</span>
<abbr class="band" title="Frequency Modulation broadcast">FM</abbr>: 
</span>
<span class="adr">
<span class="country-name">UK</span>
</span>
</div>

and you could use value for discrete frequencies.

<div class="vcard">
<b class="fn org">Heart Radio in Sussex</b>
<i class="frequency">
<span class="value">102.4</span>  and 
<span class="value">103.5</span>
<span class="band">FM</span>
</i>
(<abbr title="50.9761;0.2293" class="geo">Eastbourne</abbr>)
</div>
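
As a rough check that the proposed class names are easy to consume, here is a sketch of a reader for "frequency" blocks with "band", "low", "high" and "value" children. The class `FrequencyParser`, the helper `parse_frequencies` and the output shape are hypothetical; the class names are only the ones proposed in this thread, not an agreed format.

```python
from html.parser import HTMLParser


class FrequencyParser(HTMLParser):
    """Collect band/low/high/value from elements with class "frequency"."""

    def __init__(self):
        super().__init__()
        self.frequencies = []
        self._stack = []      # class sets of the currently open elements
        self._current = None  # frequency block being filled in, if any
        self._depth = 0       # stack depth at which that block opened

    def handle_starttag(self, tag, attrs):
        # Sketch limitation: void elements such as <br> are not handled.
        classes = set((dict(attrs).get("class") or "").split())
        self._stack.append(classes)
        if "frequency" in classes and self._current is None:
            self._current = {"band": None, "low": None, "high": None, "values": []}
            self._depth = len(self._stack)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        if self._current is not None and len(self._stack) == self._depth:
            self.frequencies.append(self._current)
            self._current = None
        self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or not text or not self._stack:
            return
        classes = self._stack[-1]  # innermost open element
        if "band" in classes:
            self._current["band"] = text
        elif "low" in classes:
            self._current["low"] = float(text)
        elif "high" in classes:
            self._current["high"] = float(text)
        elif "value" in classes:
            self._current["values"].append(float(text))


def parse_frequencies(html):
    parser = FrequencyParser()
    parser.feed(html)
    parser.close()
    return parser.frequencies


print(parse_frequencies(
    '<i class="frequency"><span class="value">102.4</span> and '
    '<span class="value">103.5</span> <span class="band">FM</span></i>'))
```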

If you wanted to or needed to refer to units [note that when using a radio, 
most people will not need the units as the reception range for each band is 
already set] then you could use hMeasure;

<div class="vcard">
<strong class="fn org">Rádio Nacional do Alto Solimões</strong>
<div class="adr">
<span class="locality">Amazonas</span>
<span class="country-name">Brazil</span>
</div>
<div class="frequency">
<span class="band">FM</span> 
<span class="measure">
<span class="num">96.1</span>
<abbr class="unit" title="megaHertz">MHz</abbr>
</span>
</div>
<div class="frequency">
<span class="band">AM</span> 
<span class="measure">
<span class="num">670</span>
<abbr class="unit" title="kiloHertz">kHz</abbr> 
</span>
</div>
</div>

But I feel this is easier to read and type:

<div class="vcard">
<strong class="fn org">Rádio Nacional do Alto Solimões</strong>
<div class="adr">
<span class="locality">Amazonas</span>
<span class="country-name">Brazil</span>
</div>
<div class="frequency">
<span class="band">FM</span>: 96.1MHz
</div>
<div class="frequency">
<span class="band">AM</span>: 670kHz
</div>
</div>
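
If the unit stays in the text like this, a consumer can still normalise it. A small sketch, assuming values of the form "96.1MHz" or "670kHz"; the function `to_hertz` and the multiplier table are illustrative, not part of hMeasure or any draft.

```python
import re
from decimal import Decimal

# Unit spellings this sketch accepts (an assumption): Hz, kHz, MHz, GHz.
MULTIPLIERS = {"hz": 1, "khz": 1_000, "mhz": 1_000_000, "ghz": 1_000_000_000}


def to_hertz(text):
    """Convert a string such as "96.1MHz" to an integer number of hertz."""
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([kMG]?Hz)\s*", text)
    if match is None:
        raise ValueError("unrecognised frequency: %r" % text)
    number, unit = match.groups()
    # Decimal avoids binary floating point error in the multiplication.
    return int(Decimal(number) * MULTIPLIERS[unit.lower()])


print(to_hertz("96.1MHz"))
print(to_hertz("670kHz"))
```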

Shortwave radio reception can be dependent on the time of day: should/can this 
be handled with hCalendar?

For example, looking at 

http://www.radioaustralia.net.au/waystolisten/australia.htm

The morning frequencies might be marked up as: 

<div class="vevent">
<!-- here, I'm assuming that the person lives in Western Australia. This is a 
country that can have as many as six timezones. -->
<strong class="summary">Morning</strong>
<span class="dtstart">9:00 
<span class="value">+08</span>
</span>
<span class="dtend">12:00 
<span class="value">+08</span>
</span>
<div class="frequency">
<span class="band">SW</span>: 
<span class="value">9660</span>, 
<span class="value">15230</span>, 
<span class="value">15240</span>, 
<span class="value">21725</span>
</div>
</div>

Any thoughts?

C.

----
http://www.coreymwamba.co.uk
http://trio.coreymwamba.co.uk/

music = science + magic

From send.missive at coreymwamba.co.uk  Sun Jul 25 23:20:31 2010
From: send.missive at coreymwamba.co.uk (Corey Mwamba)
Date: Sun Jul 25 23:20:43 2010
Subject: [uf-discuss] Marking up radio stations
In-Reply-To: <20100724112330.4f63ef1b@miranda.g5n.co.uk>
References: <201007240504.50934.send.missive@coreymwamba.co.uk>
	<20100724112330.4f63ef1b@miranda.g5n.co.uk>
Message-ID: <201007260720.31864.send.missive@coreymwamba.co.uk>

Hello,

I suffered from span-itis in my last mock-up. And it might be easier to read 
with some spacing. Here it is corrected.

<div class="vevent">
   <strong class="summary">Morning</strong>
   <span class="dtstart">9:00<span class="value">+08</span></span>
   <span class="dtend">12:00<span class="value">+08</span></span>
   <div class="frequency">
       <span class="band">SW</span>: 
       <span class="value">9660</span>, 
       <span class="value">15230</span>, 
       <span class="value">15240</span>, 
       <span class="value">21725</span>
   </div>
</div>
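
For what it's worth, a consumer would probably want to turn the text of dtstart/dtend ("9:00+08", "12:00+08") into a time with a UTC offset. A minimal Python sketch of that step follows; the function name is hypothetical, and it only handles the "H:MM" plus "+HH" or "+HH:MM" shape used in this mock-up, not the full set of hCalendar date-time forms.

```python
import re
from datetime import time, timedelta, timezone

# Matches e.g. "9:00+08", "12:00 +08", "07:30-03:30"
_TIME_RE = re.compile(r"^(\d{1,2}):(\d{2})\s*([+-])(\d{2})(?::?(\d{2}))?$")


def parse_time_with_offset(text):
    """Hypothetical helper: parse "9:00+08" into a timezone-aware time."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised time: %r" % text)
    hour, minute, sign, off_h, off_m = match.groups()
    offset = timedelta(hours=int(off_h), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    return time(int(hour), int(minute), tzinfo=timezone(offset))


start = parse_time_with_offset("9:00+08")
end = parse_time_with_offset("12:00+08")
print(start.isoformat(), end.isoformat())
```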

----
http://www.coreymwamba.co.uk
http://trio.coreymwamba.co.uk/

From glenn.jones at madgex.com  Mon Jul 26 02:53:31 2010
From: glenn.jones at madgex.com (Glenn Jones)
Date: Mon Jul 26 02:54:22 2010
Subject: [uf-discuss] UfXtract .Net microformats parser open-sourced
Message-ID: <36A319113CF910438942741C4727ADFF04A486AF@MOBY.Clarence.local>

Hi All

I have just open-sourced the UfXtract .Net microformats parser.  With a few
lines of code you can load and parse microformats from URLs or HTML
strings.  You can then extract the data directly in .Net or convert it
into JSON, JSON-P or XML. 

UfXtract currently supports the following microformats: hCard, hCalendar,
hReview, hResume, hAtom, XFN, rel-tag, geo, adr, rel-nofollow,
rel-license, rel-directory, rel-home, rel-enclosure, rel-payment and
votelinks.

It also supports a handful of POSH patterns: hCard-XFN, rel-me,
rel-next/previous, test-suite and test-fixture. The support of rel-me
and rel-next/previous was added to help people build social graph
spiders.

UfXtract can typically parse a page in 10-50ms. I have gone to some
pains to build a test suite to make sure it conforms as closely as
possible to the microformats specs. 

You can also easily create new microformats and POSH definitions using
some simple .Net objects.

API - http://ufxtract.com/
Documentation - http://ufxtract.com/documentation/
Source code - http://github.com/glennjones/ufxtract/
Test suite - http://www.ufxtract.com/testsuite/

Hopefully people will find it useful...

Glenn Jones 


From axelm at nona.net  Mon Jul 26 04:44:45 2010
From: axelm at nona.net (Alex Mayrhofer)
Date: Mon Jul 26 04:45:09 2010
Subject: [uf-discuss] Marking up radio stations
In-Reply-To: <201007250642.41449.send.missive@coreymwamba.co.uk>
References: <201007240504.50934.send.missive@coreymwamba.co.uk>	<20100724112330.4f63ef1b@miranda.g5n.co.uk>
	<201007250642.41449.send.missive@coreymwamba.co.uk>
Message-ID: <4C4D752D.8050804@nona.net>


On 25.07.2010 07:42, Corey Mwamba wrote:
>> 	<div class="vcard">
>> 		<b class="fn org">Heart FM (Sussex)</b>
>> 		<a href="radio:fm:102400000;context=geo:50.9761,0.2293"
>> 		   class="url">102.4 MHz</a>
>> 	</div>
>>
>> However, such a URI scheme does not exist. It could be registered with
>> IANA, or you could bypass that requirement by using a specialised HTTP
>> prefix instead, a la <http://dbooth.org/2006/urn2http/>.

side note:

A URI scheme for "geo" does now exist - it is standardized in RFC 5870, 
so the following example:

>> Short of specialised URIs to identify radio signals, the most
>> appropriate construct in hCard would probably be class="note". e.g.:
>>
>> 	<div class="vcard">
>> 		<b class="fn org">Heart FM (Sussex)</b>
>> 		<i class="note">
>> 			102.4 MHz
>> 			<abbr title="50.9761;0.2293"
>> 			      class="geo">(Eastbourne)</abbr>
>> 		</i>
>> 	</div>

... could also include a

      <i class='note'> 102.4 MHz
          <a href='geo:50.9761,0.2293'>(Eastbourne)</a>
      </i>
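
As a rough illustration of how a consumer might read such a link, here is a small Python sketch that splits a "geo" URI into its parts. It assumes the RFC 5870 layout, where latitude, longitude and an optional altitude are separated by commas and any parameters (such as crs or u) follow after semicolons. The function name is made up for this example, and it does no range checking or percent-decoding.

```python
def parse_geo_uri(uri):
    """Hypothetical helper: return (lat, lon, alt_or_None, params)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "geo":
        raise ValueError("not a geo URI: %r" % uri)

    # Coordinates come first; parameters follow, separated by ";".
    coords, *param_parts = rest.split(";")
    numbers = [float(part) for part in coords.split(",")]
    if len(numbers) not in (2, 3):
        raise ValueError("expected 2 or 3 coordinates in %r" % uri)

    lat, lon = numbers[0], numbers[1]
    alt = numbers[2] if len(numbers) == 3 else None

    params = {}
    for part in param_parts:
        name, _, value = part.partition("=")
        params[name.lower()] = value
    return lat, lon, alt, params


print(parse_geo_uri("geo:50.9761,0.2293"))
print(parse_geo_uri("geo:48.198634,16.371648;crs=wgs84;u=40"))
```

Note that the semicolon-separated form used in the abbr title above ("50.9761;0.2293") is the geo microformat convention; passed to this sketch as a URI it is rejected, because the part after the semicolon is read as a parameter.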

(Disclaimer: I'm one of the authors of the "geo" URI scheme, so my view 
might be biased ;)

Alex
From lists at ben-ward.co.uk  Wed Jul 28 14:19:53 2010
From: lists at ben-ward.co.uk (lists@ben-ward.co.uk)
Date: Wed Jul 28 14:20:11 2010
Subject: [uf-discuss] UfXtract .Net microformats parser open-sourced
In-Reply-To: <36A319113CF910438942741C4727ADFF04A486AF@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF04A486AF@MOBY.Clarence.local>
Message-ID: <1280351993.23241.1387235699@webmail.messagingengine.com>

This is fantastic news. Thank you, Glenn, for this and for the years of
work that have gone into UfXtract and its test suite so far.

Ben

From andreluis.pt at gmail.com  Wed Jul 28 16:20:55 2010
From: andreluis.pt at gmail.com (=?ISO-8859-1?Q?Andr=E9_Lu=EDs?=)
Date: Wed Jul 28 16:21:00 2010
Subject: [uf-discuss] UfXtract .Net microformats parser open-sourced
In-Reply-To: <1280351993.23241.1387235699@webmail.messagingengine.com>
References: <36A319113CF910438942741C4727ADFF04A486AF@MOBY.Clarence.local>
	<1280351993.23241.1387235699@webmail.messagingengine.com>
Message-ID: <AANLkTi=6Mt-x9352jn+xLiuOYwMONWtEdmNEnTiMGeaK@mail.gmail.com>

Damn, forgot to reply to this! Thanks Ben. ;)

On 28 July 2010 22:19,  <lists@ben-ward.co.uk> wrote:
> This is fantastic news. Thank you, Glenn, for this and for the years of
> work that have gone into UfXtract and its test suite so far.

+1! Indeed. Thank you, Glenn. Even though .NET is not my thing, it's
valuable to have this sort of tool in any language.

From the few tests I ran, it works pretty well.

One minor gripe, though... can't we ask for transformation for more
than one format at once? Optimus does this. And it's kinda useful to
avoid more than one request per URI... :)

Anyway, I've passed it along to the guys at the company. I'll pass on any
comments that come up.

Cheers!
André Luís
http://id.andr3.net



From glenn.jones at madgex.com  Fri Jul 30 08:49:56 2010
From: glenn.jones at madgex.com (Glenn Jones)
Date: Fri Jul 30 08:50:11 2010
Subject: [uf-discuss] UfXtract - more than one format at once
Message-ID: <36A319113CF910438942741C4727ADFF04A4935B@MOBY.Clarence.local>

Hi All

André Luís wrote
> One minor gripe, though... can't we ask for transformation for more than one format at once? Optimus does this. And it's kinda useful to avoid more than one request per URI... :)


OK, I have added parsing of more than one microformat at once to the UfXtract API. You have always been able to do this with the .Net library; I just had not built it into the API. The API format parameter can now take either a single value or a comma-delimited list, as in the example below. 

http://ufxtract.com/api/default.aspx?url=http%3A%2F%2Fwww.glennjones.net%2Fabout%2F&htmlfragment=&orginurl=http%3A%2F%2F&format=hcard%2Cxfn%2Chreview%2Chcalendar%2Chatom%2Chresume%2Cgeo%2Cadr%2Ctag%2Cnofollow%2Clicense%2Cdirectory%2Chome%2Cenclosure%2Cvotelinks&output=json&report=on
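
For anyone calling the API from a script, a URL like the one above could be assembled along these lines. This is only a sketch in Python: the parameter names and values are copied from the example URL, the function name is made up, and it builds the string without making any request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://ufxtract.com/api/default.aspx"


def build_ufxtract_url(page_url, formats, output="json", report="on"):
    """Hypothetical helper: build an API URL for one or more formats."""
    query = urlencode({
        "url": page_url,
        "htmlfragment": "",             # left empty, as in the example URL
        "orginurl": "http://",          # spelling and value as in the example URL
        "format": ",".join(formats),    # single value or comma-delimited list
        "output": output,
        "report": report,
    })
    return API_ENDPOINT + "?" + query


print(build_ufxtract_url("http://www.glennjones.net/about/", ["hcard", "xfn"]))
```

urlencode percent-encodes the commas as %2C, which matches the encoding in the example URL.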


The .Net library code looks like this:


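using System.Collections;   // ArrayList
using UfXtract;

// The page to load and parse
string url = "http://www.glennjones.net/about/";

UfWebRequest webRequest = new UfWebRequest();

// List every microformat definition you want parsed
ArrayList formatArray = new ArrayList();
formatArray.Add(UfFormats.HCard());
formatArray.Add(UfFormats.Xfn());
formatArray.Add(UfFormats.Adr());
formatArray.Add(UfFormats.License());
// ...etc

// Load the page and parse it for all the listed formats
webRequest.Load(url, formatArray);

// Only write a response if something was found
// (Response here is the ASP.NET page's response object)
if (webRequest.Data.Nodes.Count > 0)
{
   UfDataToJson dataConvertor = new UfDataToJson();
   Response.ContentType = "application/json";

   Response.Write(dataConvertor.Convert(webRequest.Data, formatArray));
}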


Running all the microformats at once is about twice as slow as running just one. That said, I am only talking about an extra 50ms. 


Glenn 


From andreluis.pt at gmail.com  Fri Jul 30 12:42:36 2010
From: andreluis.pt at gmail.com (=?ISO-8859-1?Q?Andr=E9_Lu=EDs?=)
Date: Fri Jul 30 12:42:40 2010
Subject: [uf-discuss] UfXtract - more than one format at once
In-Reply-To: <36A319113CF910438942741C4727ADFF04A4935B@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF04A4935B@MOBY.Clarence.local>
Message-ID: <AANLkTinVXb_9HOBjGNBwu+O-Ted7+VNOJydC9vArmPFP@mail.gmail.com>

hey Glenn,

On 30 July 2010 16:49, Glenn Jones <glenn.jones@madgex.com> wrote:
> Hi All
>
> André Luís wrote
>> One minor gripe, though... can't we ask for transformation for more than one format at once? Optimus does this. And it's kinda useful to avoid more than one request per URI... :)
>
>
> OK, I have added parsing of more than one microformat at once to the UfXtract API. You have always been able to do this with the .Net library; I just had not built it into the API. The API format parameter can now take either a single value or a comma-delimited list, as in the example below.
>

Awesome, Glenn. Thanks! As usual, we ask and you deliver (I still
remember the xfolk -> bookmarks.html story).

Even though I'm not a .net programmer myself, this little fix will help
me spread the word among fellow .net devs.

50ms beats the overhead of making one request per format! ;)

Cheers,
--
André Luís
http://id.andr3.net

