From tantek at cs.stanford.edu Sat Jul 3 19:18:43 2010 From: tantek at cs.stanford.edu (=?UTF-8?Q?Tantek_=C3=87elik?=) Date: Sat Jul 3 19:19:06 2010 Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post Message-ID: According to Yahoo! Search Monkey, there are now over 2 billion hCards on the web: http://search.yahoo.com/search?p=searchmonkey%3Acom.yahoo.page.uf.hcard This is perhaps due to a few fairly large recent deployments: * BrightKite.com - all venues and user profiles have hCard (millions) * Gravatar - all profiles now have hCards (millions) - used on WordPress.com etc. Some additional recent news: * microformats has 94% marketshare compared to alternatives (e.g. RDFa) according to Google (announced at the Semantic Technology conference) - http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php - http://www.readwriteweb.com/images/richsnippets_june10b.jpg I'm collecting these into material for "microformats.org turns 5" blog post - additional suggestions welcome! http://microformats.org/wiki/microformats-turns-5 -- http://tantek.com/ From jeremy at adactio.com Mon Jul 5 07:32:54 2010 From: jeremy at adactio.com (Jeremy Keith) Date: Mon Jul 5 07:33:01 2010 Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post In-Reply-To: References: Message-ID: <4599FB06-4A53-4295-B62A-6FB165B41E5A@adactio.com> Tantek asked: > I'm collecting these into material for "microformats.org turns 5" blog > post - additional suggestions welcome! 
Well, this isn't huge in terms of numbers but it's something that makes my day to day work a whole lot smoother:

37 Signals have added hCards to Basecamp:
http://answers.37signals.com/basecamp/556-any-chance-of-adding-hcards

Jeremy

--
Jeremy Keith

a d a c t i o

http://adactio.com/

From andreluis.pt at gmail.com Mon Jul 5 10:04:37 2010
From: andreluis.pt at gmail.com (=?ISO-8859-1?Q?Andr=E9_Lu=EDs?=)
Date: Mon Jul 5 10:04:41 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post
In-Reply-To: <4599FB06-4A53-4295-B62A-6FB165B41E5A@adactio.com>
References: <4599FB06-4A53-4295-B62A-6FB165B41E5A@adactio.com>
Message-ID:

On 5 July 2010 15:32, Jeremy Keith wrote:
> Tantek asked:
>> I'm collecting these into material for "microformats.org turns 5" blog
>> post - additional suggestions welcome!

Tantek, one minor detail that might be worth correcting: what Yahoo!'s SearchMonkey says is that there are almost 2 billion pages with hCards. Each of those pages has at least one hCard, so we can assume the total number of hCards is far greater. ;)

One point I'd like to see addressed in such a post, if possible, is the near future. Should we start pushing for all microformats tools to be adapted to support microdata from HTML5 as well? Should we encourage authors to write one *or* the other (microformats vs microdata)?

Cheers,
--
André Luís
http://id.andr3.net

>
> Well, this isn't huge in terms of numbers but it's something that makes my day to day work a whole lot smoother:
>
> 37 Signals have added hCards to Basecamp:
> http://answers.37signals.com/basecamp/556-any-chance-of-adding-hcards
>
> Jeremy
>
> --
> Jeremy Keith
>
> a d a c t i o
>
> http://adactio.com/
>
>
>
> _______________________________________________
> microformats-discuss mailing list
> microformats-discuss@microformats.org
> http://microformats.org/mailman/listinfo/microformats-discuss
>

From ehs at pobox.com Mon Jul 5 11:45:13 2010
From: ehs at pobox.com (Ed Summers)
Date: Mon Jul 5 11:45:20 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post
In-Reply-To:
References:
Message-ID:

On Sat, Jul 3, 2010 at 10:18 PM, Tantek Çelik wrote:
> Some additional recent news:
> * microformats has 94% marketshare compared to alternatives (e.g.
> RDFa) according to Google (announced at the Semantic Technology
> conference)
>  - http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php
>  - http://www.readwriteweb.com/images/richsnippets_june10b.jpg

Was it clear if Google's stats were comparing all microformat usage with usage of only their particular rich snippet vocabulary [1]? I'd be surprised if it was *all* RDFa vocabulary use, since that would mean that Google are indexing all RDFa on the web. John Breslin asked a similar question in the comments on that RWW post [2].

If it isn't clear, I'd probably refrain from citing the 94% market share statistic in the microformats-turns-5 post. Although I guess this sort of posturing is to be expected, and most people take it as a given that "there are three kinds of lies: lies, damned lies, and statistics.", especially in religious debates [3]

The 2 Billion statistic is astounding, considering there are an estimated 1.8 Billion people online [3].
It makes me appreciate how important efforts are to give people the ability to identify, link, and unlink their online identities [4].

//Ed

[1] http://rdf.data-vocabulary.org/rdf.xml
[2] http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php#comment-219873
[3] There are three kinds of lies: lies, damned lies, and statistics."
[4] http://code.google.com/apis/opensocial/

From ehs at pobox.com Mon Jul 5 11:46:47 2010
From: ehs at pobox.com (Ed Summers)
Date: Mon Jul 5 11:46:52 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post
In-Reply-To:
References:
Message-ID:

On Mon, Jul 5, 2010 at 2:45 PM, Ed Summers wrote:
> [3] There are three kinds of lies: lies, damned lies, and statistics."

I meant:

[3] http://www.internetworldstats.com/stats.htm

//Ed

From pmika at yahoo-inc.com Tue Jul 6 01:27:03 2010
From: pmika at yahoo-inc.com (Peter Mika)
Date: Tue Jul 6 01:27:51 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post
In-Reply-To:
References:
Message-ID: <4C32E8D7.7080705@yahoo-inc.com>

Hi Ed,

The comparison to the number of people online is misleading, because the microformat stats quoted (both the Google and Yahoo figures) include duplicate counting. One of my illustrative examples is news.stanford.edu, where the microformat annotation is in the template, and thus every single page has exactly the same microformat markup, i.e. the address of Stanford University. To verify, try the query searchmonkey:com.yahoo.page.uf.hcard site:stanford.edu in Yahoo Search.

The second point to make is that RDFa usage is underreported by [1]. Compare

searchmonkey:com.yahoo.page.rdf.rdfa

with

searchmonkey:com.yahoo.page.uf.hcard

These indicate that there are 2.7B pages with RDFa compared to 2B pages with hCard. There are many caveats to these numbers, but they are more or less on equal footing.
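
The duplicate-counting caveat above (one template hCard repeated on every page of a site) can be illustrated with a small sketch. Everything here is an assumption for illustration: the crawl data is made up, the URLs are placeholders, and the normalization is deliberately crude; it is not how Yahoo or Google compute their figures.

```python
# Hypothetical crawl output: (page URL, hCard texts found on that page).
# All URLs and names are placeholders, not real data.
pages = [
    ("http://news.example.edu/a", ["Example University, 1 Main St"]),
    ("http://news.example.edu/b", ["Example  University, 1 main st"]),
    ("http://directory.example.org/staff", ["Alice Example", "Bob Example"]),
    ("http://plain.example.net/", []),
]


def normalize(card: str) -> str:
    """Crude normalization (an assumption): lowercase, collapse whitespace."""
    return " ".join(card.lower().split())


def summarize(pages):
    """Return (pages with an hCard, hCard instances, distinct hCards)."""
    pages_with_hcard = sum(1 for _, cards in pages if cards)
    total_instances = sum(len(cards) for _, cards in pages)
    distinct = {normalize(card) for _, cards in pages for card in cards}
    return pages_with_hcard, total_instances, len(distinct)


print(summarize(pages))
```

The three numbers differ even on this tiny sample, which is why a "pages with hCards" figure cannot be read directly as a count of people or organizations.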
Cheers, Peter [1] http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php Ed Summers wrote: > On Sat, Jul 3, 2010 at 10:18 PM, Tantek ?elik wrote: > >> Some additional recent news: >> * microformats has 94% marketshare compared to alternatives (e.g. >> RDFa) according to Google (announced at the Semantic Technology >> conference) >> - http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php >> - http://www.readwriteweb.com/images/richsnippets_june10b.jpg >> > > Was it clear if Google's stats were comparing all microformat usage > with usage of only their particular rich snippet vocabulary [1]? I'd > be surprised if it was *all* RDFa vocabulary use, since that would > mean that Google are indexing all RDFa on the web. John Breslin asked > a similar question in the comments on that RWW post [2]. > > If it isn't clear, I'd probably refrain from citing the 94% market > share statistic in the microformats-turns-5 post. Although I guess > this sort of posturing is to be expected, and most people take it as a > given that "there are three kinds of lies: lies, damned lies, and > statistics.", especially in religious debates [3] > > The 2 Billion statistic is astounding, considering there are an > estimated 1.8 Billion people online [3]. It makes me appreciate how > important efforts are to give people the ability identify, link, and > unlink their online identities [4]. > > //Ed > > [1] http://rdf.data-vocabulary.org/rdf.xml > [2] http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php#comment-219873 > [3] There are three kinds of lies: lies, damned lies, and statistics." 
> [4] http://code.google.com/apis/opensocial/ > > _______________________________________________ > microformats-discuss mailing list > microformats-discuss@microformats.org > http://microformats.org/mailman/listinfo/microformats-discuss > From tantek at cs.stanford.edu Wed Jul 7 02:25:38 2010 From: tantek at cs.stanford.edu (=?UTF-8?Q?Tantek_=C3=87elik?=) Date: Wed Jul 7 02:26:02 2010 Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post In-Reply-To: <4C32E8D7.7080705@yahoo-inc.com> References: <4C32E8D7.7080705@yahoo-inc.com> Message-ID: Jeremy, > Well, this isn't huge in terms of numbers but it's something that makes my day to day work a whole lot smoother: > > 37 Signals have added hCards to Basecamp: > http://answers.37signals.com/basecamp/556-any-chance-of-adding-hcards This is great news! In the few times I've used Basecamp I remember being quite frustrated by the lack of hCard support and simple person info portability. Great to see that 37 Signals has added hCards. Peter, On Tue, Jul 6, 2010 at 1:27 AM, Peter Mika wrote: > Hi Ed, > > The comparison to the number of people online is misleading, because the > microformat stats quoted (both the Google and Yahoo figures) include > duplicate counting. One of my illustrative examples is news.stanford.edu, > where the microformat annotation is in the template, and thus every single > page has exactly the same microformat markup, i.e. the address of Stanford > University. On the other hand, there are also numerous pages with multiple hCards per page. Directory listings, friends lists, about pages for companies listing their executives etc. The wiki has many such examples already: http://microformats.org/wiki/hcard-examples-in-wild There are certainly: * multiple pages with the same hCard. * pages with multiple hCards. This was my experience with the microformats indexer we built at Technorati back in the day. It's hard to know how these average out. 
You have to write a bunch more code (e.g. really good deduping etc.) to figure it out. Lacking that we should cite *pages* with hCards rather than total hCards for the Search Monkey stat to be more accurate. > The second point to make is that RDFa usage is underreported by [1]. Compare > > searchmonkey:com.yahoo.page.rdf.rdfa > > with > > searchmonkey:com.yahoo.page.uf.hcard > > These indicate that there are 2.7B pages with RDFa I think this may be an errant number based on the way that Search Monkey normalizes things internally to RDFa (because of an unfortunate premature architectural decision that they then became stuck with - as it was related to me by Paul Tarjan). OR (and this deserves a little analysis) Those pages don't actually all (if any?) contain RDFa. Look at the first page of results. E.g. Wordpress.org results don't have any RDFa. View source and the only thing even remotely resembling you see is: - which is simply use of an invalid "property" attribute (in XHTML 1.0). The qname "fb:" is not defined anywhere. This is not RDFa, this is simply a tag using a new (invalid) syntax. That is, using "property" instead of the standard HTML 4.01 "name" attribute: Similarly with CNN.com, download.cnet.com, online.wsj.com. OTOH, www.vistaprint.ca, digg.com, www.joomlart.com, www.webmd.com don't even have "property" attributes. Who knows why they're listed in that result page. No evidence of any RDFa on those pages. www.metacafe.com does appear to define an "og" qname and use it in a "property" attribute. And that's it for the first page of results for that query "searchmonkey:com.yahoo.page.rdf.rdfa" - Only 1 out of 10 of at least the first page of results actually had any RDFa - and that one was invisible data at that. It does not appear that that query actually returns pages with rdfa, for the most part not in any valid sense, nor in any sense of the intent of marking up existing visible content with additional attributes. 
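
The manual check described above (does a page that uses a prefixed "property" value also declare that prefix?) could be roughly automated. This is a minimal sketch under stated assumptions: the two sample documents are invented stand-ins, not the actual markup of any site named in this thread; the namespace URI is a placeholder; and the check ignores XML namespace scoping and the rest of RDFa processing, so it is only a first-pass filter.

```python
from html.parser import HTMLParser


class PrefixAudit(HTMLParser):
    """Collect declared xmlns prefixes and prefixes used in property attributes."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:"):
                # e.g. xmlns:og="..." declares the prefix "og"
                self.declared.add(name[len("xmlns:"):])
            elif name == "property" and value:
                # property may hold several space-separated prefixed names
                for token in value.split():
                    if ":" in token:
                        self.used.add(token.split(":", 1)[0])


def undeclared_prefixes(html: str) -> set:
    """Prefixes used in property attributes with no xmlns declaration anywhere."""
    audit = PrefixAudit()
    audit.feed(html)
    audit.close()
    return audit.used - audit.declared


# Hypothetical samples, not copied from any real page.
SAMPLE_UNDECLARED = (
    '<html><head><meta property="fb:page_id" content="1"/></head></html>'
)
SAMPLE_DECLARED = (
    '<html xmlns:og="http://example.org/og#"><head>'
    '<meta property="og:title" content="x"/></head></html>'
)

print(sorted(undeclared_prefixes(SAMPLE_UNDECLARED)))
print(sorted(undeclared_prefixes(SAMPLE_DECLARED)))
```

A page that passes this check is not thereby shown to mark up visible content; it only separates undeclared-prefix cases from the rest.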
Perhaps a challenge could be posed - how many results of that query do you have to look through before you find a legitimate "marking up visible data" instance of RDFa?

In 4 pages of results (40) I only found 2 - and both were on the Creative Commons site - not a big surprise given that Ben Adida is both co-chair of RDFa WG and works for Creative Commons.

But no others.

It seems that RDFa usage is grossly exaggerated (by at least a factor of 20) by the Yahoo Search Monkey "searchmonkey:com.yahoo.page.rdf.rdfa" query.

> compared to 2B pages with
> hCard. There are many caveats to these numbers, but they are more or less on
> equal footing.

They're not even close (at least an order of magnitude difference), as the above debunking of the RDFa results demonstrates.

Ed,

> Ed Summers wrote:
>>
>> On Sat, Jul 3, 2010 at 10:18 PM, Tantek Çelik
>> wrote:
>>
>>>
>>> Some additional recent news:
>>> * microformats has 94% marketshare compared to alternatives (e.g.
>>> RDFa) according to Google (announced at the Semantic Technology
>>> conference)
>>>  -
>>> http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php
>>>  - http://www.readwriteweb.com/images/richsnippets_june10b.jpg
>>>
>>
>> Was it clear if Google's stats were comparing all microformat usage
>> with usage of only their particular rich snippet vocabulary [1]? I'd
>> be surprised if it was *all* RDFa vocabulary use, since that would
>> mean that Google are indexing all RDFa on the web. John Breslin asked
>> a similar question in the comments on that RWW post [2].

This is an excellent question. In particular the context (and numbers) of that slide appear to be rich snippet specific - both for microformats and RDFa. That is, comparing particular microformats for rich snippets, and particular RDFa for rich snippets - 94% of the instances of markup for rich snippets they found were done with microformats.

Good catch Ed, that's an important detail to call out.
Thanks everyone for the corrections and additions. I've updated the wiki accordingly: http://microformats.org/wiki/microformats-turns-5 Please let me know if I've missed anything else - I'm going to go ahead and write this up tomorrow morning. Thanks, Tantek -- http://tantek.com/ From mail at tobyinkster.co.uk Wed Jul 7 04:43:52 2010 From: mail at tobyinkster.co.uk (Toby Inkster) Date: Wed Jul 7 04:53:20 2010 Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post In-Reply-To: References: <4C32E8D7.7080705@yahoo-inc.com> Message-ID: <20100707124352.3f2a215f@miranda.g5n.co.uk> On Wed, 7 Jul 2010 02:25:38 -0700 Tantek ?elik wrote: > E.g. Wordpress.org results don't have any RDFa. > > View source and the only thing even remotely resembling you see is: > > > > - which is simply use of an invalid "property" attribute (in XHTML > 1.0). The qname "fb:" is not defined anywhere. In the current RDFa 1.1 drafts, this is allowed, though its meaning is not likely what the authors of this page intended. In 1.1, prefixes which are not bound to anything are assumed to be absolute URIs. The page at http://wordpress.org/ does actually contain 3 triples if evaluated as RDFa 1.0, though they're each the result of RDFa grandfathering in certain HTML 4/XHTML 1 semantics. The question "how many pages contain RDFa?" is only meaningful if certain qualifications are added... Does broken RDFa count? Do grandfathered rel/rev values count? &c. In fact, "how many pages" questions about the Web are not especially meaningful. Say Google added an hCard to its search result pages, replacing its current logo with something like this: Are the search results for "foo" and "bar" different pages? What about the search results for "100000000001" and "100000000002"? Because if they are, that's over a hundred billion hCards online. 
--
Toby A Inkster

From tantek at cs.stanford.edu Wed Jul 7 08:24:52 2010
From: tantek at cs.stanford.edu (=?UTF-8?Q?Tantek_=C3=87elik?=)
Date: Wed Jul 7 08:33:01 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post
In-Reply-To: <20100707124352.3f2a215f@miranda.g5n.co.uk>
References: <4C32E8D7.7080705@yahoo-inc.com> <20100707124352.3f2a215f@miranda.g5n.co.uk>
Message-ID:

On Wed, Jul 7, 2010 at 4:43 AM, Toby Inkster wrote:
> On Wed, 7 Jul 2010 02:25:38 -0700
> Tantek Çelik wrote:
>
>> E.g. Wordpress.org results don't have any RDFa.
>>
>> View source and the only thing even remotely resembling you see is:
>>
>>
>>
>> - which is simply use of an invalid "property" attribute (in XHTML
>> 1.0). The qname "fb:" is not defined anywhere.
>
> In the current RDFa 1.1 drafts, this is allowed, though its meaning is
> not likely what the authors of this page intended. In 1.1, prefixes
> which are not bound to anything are assumed to be absolute URIs.

So it's another form of invalid syntax then, since "fb:" is not a defined protocol.

> The page at http://wordpress.org/ does actually contain 3 triples if
> evaluated as RDFa 1.0, though they're each the result of RDFa
> grandfathering in certain HTML 4/XHTML 1 semantics.

No, it might contain 3 RDF triples - but they're not RDF*a*.

Just because a page can be parsed/converted into another format does not mean it "contains" that format.

Saying so is deceptively mis-using the word "contains" at best, and playing semantic games at worst.

Just because a page has hAtom does not mean it "contains" Atom.

Just because a page has microdata does not mean it "contains" JSON (though an exceptionally precise direct conversion is defined). etc.

Similarly to microdata, as we define more precise parsing rules for microformats, we'll have direct conversions to JSON and RDF triples as well. This does not mean that all pages with microformats "contain" JSON or RDF.
The question of comparison is deliberately chosen to illuminate: what are developers actually coding? What syntax? Not what can you "infer", "parse as", or "convert to".

Because as you know with the parsers you've written, you can convert syntaxes to nearly any implied format - it tells you nothing about usage.

> The question "how many pages contain RDFa?" is only meaningful if
> certain qualifications are added... Does broken RDFa count?

Broken RDFa counts, but only to demonstrate the difficulty of coding RDFa, not as instances of RDF(a). One of the reasons that Google found so little RDFa may be that much of it was broken. This is one of the common problems with namespaces in data.

> Do
> grandfathered rel/rev values count? &c.

rel/rev syntax and values work without RDFa - they're not RDFa, despite RDFa's attempt to subsume them (and even errantly claim/imply credit in the spec, e.g. rel-license).

> In fact, "how many pages" questions about the Web are not especially
> meaningful. Say Google added an hCard to its search result pages,
> replacing its current logo with something like this:
>
>
>                         alt="Google" src="..." />
>
>
> Are the search results for "foo" and "bar" different pages? What about
> the search results for "100000000001" and "100000000002"? Because if
> they are, that's over a hundred billion hCards online.

1. theoretical strawman[1]
2. google.com/robots.txt prevents this from counting in any "search"

Tantek

[1] http://en.wikipedia.org/wiki/Straw_man

--
http://tantek.com/

From thomas at stray.net Wed Jul 7 10:10:07 2010
From: thomas at stray.net (=?iso-8859-1?Q?thomas_l=F6rtsch?=)
Date: Wed Jul 7 10:10:19 2010
Subject: [uf-discuss] 2 billion hCards!
gathering material for a "microformats.org turns 5" blog post In-Reply-To: References: <4C32E8D7.7080705@yahoo-inc.com> <20100707124352.3f2a215f@miranda.g5n.co.uk> Message-ID: On Jul 7, 2010, at 6:24 PM, Tantek ?elik wrote: > On Wed, Jul 7, 2010 at 4:43 AM, Toby Inkster wrote: >> On Wed, 7 Jul 2010 02:25:38 -0700 >> Tantek ?elik wrote: >> >>> E.g. Wordpress.org results don't have any RDFa. >>> >>> View source and the only thing even remotely resembling you see is: >>> >>> >>> >>> - which is simply use of an invalid "property" attribute (in XHTML >>> 1.0). The qname "fb:" is not defined anywhere. >> >> In the current RDFa 1.1 drafts, this is allowed, though its meaning is >> not likely what the authors of this page intended. In 1.1, prefixes >> which are not bound to anything are assumed to be absolute URIs. > > So it's another form of invalid syntax then, since "fb:" is not a > defined protocol. > > >> The page at http://wordpress.org/ does actually contain 3 triples if >> evaluated as RDFa 1.0, though they're each the result of RDFa >> grandfathering in certain HTML 4/XHTML 1 semantics. > > No, it might contain 3 RDF triples - but they're not RDF*a*. > > Just because a page can be parsed/converted into another format does > not mean it "contains" that format. > > Saying so is deceptively mis-using the word "contains" at best, and > playing semantic games at worst. > > Just because a page has hAtom does not mean it "contains" Atom. > > Just because a page has microdata does not mean it "contains" JSON > (though an exceptionally precise direct conversion is defined). etc. > > Similarly to microdata, as we define more precise parsing rules for > microformats, we'll have direct conversions to JSON and RDF triples as > well. This does not mean that all pages with microformats "contain" > JSON or RDF. > > The question of comparison is deliberately chosen to illuminate what > are developers actually coding? What syntax? 
Not what can you "infer",
> "parse as", or "convert to".
>
> Because as you know with the parsers you've written, you can convert
> syntaxes to nearly any implied format - it tells you nothing about
> usage.
>
>
>> The question "how many pages contain RDFa?" is only meaningful if
>> certain qualifications are added... Does broken RDFa count?
>
> broken RDFa counts, but only to demonstrate the difficulty of coding
> RDFa, not instances of RDF(a). one of the reasons that Google found so
> little RDFa may be because much of it was broken. this is one of
> the common problems with namespaces in data.

does broken tantek count?

this "my format is longer than your format" strikes me as rather silly. 50 million elvis fans can't be wrong (most of them use neither).

regards
thomas lörtsch

>
>> Do
>> grandfathered rel/rev values count? &c.
>
> rel/rev syntax and values work without RDFa - they're not RDFa,
> despite RDFa's attempt to subsume them (and even errantly claim/imply
> credit in the spec, e.g. rel-license).
>
>
>> In fact, "how many pages" questions about the Web are not especially
>> meaningful. Say Google added an hCard to its search result pages,
>> replacing its current logo with something like this:
>>
>>
>>                         alt="Google" src="..." />
>>
>>
>>
>> Are the search results for "foo" and "bar" different pages? What about
>> the search results for "100000000001" and "100000000002"? Because if
>> they are, that's over a hundred billion hCards online.
>
> 1. theoretical strawman[1]
> 2.
google.com/robots.txt prevents this from counting in any "search" > > > Tantek > > [1] http://en.wikipedia.org/wiki/Straw_man > > -- > http://tantek.com/ > > _______________________________________________ > microformats-discuss mailing list > microformats-discuss@microformats.org > http://microformats.org/mailman/listinfo/microformats-discuss > From hober0 at gmail.com Wed Jul 7 15:12:42 2010 From: hober0 at gmail.com (Edward O'Connor) Date: Wed Jul 7 15:13:11 2010 Subject: [uf-discuss] patches (speaking of "microformats.org turns 5") Message-ID: Tantek wrote: > I'm collecting these into material for "microformats.org turns 5" blog > post - additional suggestions welcome! I'm in the process of ordering a bunch of sew-on Microformats patches (about 2.5" square). I'll let the list know when they're ready! Ted From info at csarven.ca Wed Jul 7 15:53:08 2010 From: info at csarven.ca (Sarven Capadisli) Date: Wed Jul 7 15:53:25 2010 Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post Message-ID: <1278543188.1603.42.camel@csarven-netbook> On Sat, 2010-07-03 at 19:18 -0700, Tantek ?elik wrote: > According to Yahoo! Search Monkey, there are now over 2 billion hCards > on the web: > > http://search.yahoo.com/search?p=searchmonkey% > 3Acom.yahoo.page.uf.hcard > > This is perhaps due to a few fairly large recent deployments: > * BrightKite.com - all venues and user profiles have hCard (millions) > * Gravatar - all profiles now have hCards (millions) - used on > WordPress.com etc. > > Some additional recent news: > * microformats has 94% marketshare compared to alternatives (e.g. 
> RDFa) according to Google (announced at the Semantic Technology
> conference)
> -
> http://www.readwriteweb.com/archives/google_semantic_web_push_rich_snippets_usage_grow.php
> - http://www.readwriteweb.com/images/richsnippets_june10b.jpg
>
> I'm collecting these into material for "microformats.org turns 5" blog
> post - additional suggestions welcome!
>
> http://microformats.org/wiki/microformats-turns-5
>
> --
> http://tantek.com/

I'm not sure about exact numbers, but a StatusNet instance (e.g., http://identi.ca/ ) has hCards for all users and groups. It includes representative hCards.

Updated wiki.

-Sarven

From tantek at cs.stanford.edu Thu Jul 8 01:25:03 2010
From: tantek at cs.stanford.edu (=?UTF-8?Q?Tantek_=C3=87elik?=)
Date: Thu Jul 8 01:25:27 2010
Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post
In-Reply-To: <20100708002838.1b370e8b@miranda.g5n.co.uk>
References: <4C32E8D7.7080705@yahoo-inc.com> <20100707124352.3f2a215f@miranda.g5n.co.uk> <20100708002838.1b370e8b@miranda.g5n.co.uk>
Message-ID:

Toby,

On Wed, Jul 7, 2010 at 4:28 PM, Toby Inkster wrote:
> On Wed, 7 Jul 2010 08:24:52 -0700
> Tantek Çelik wrote:
>
>> > The page at http://wordpress.org/ does actually contain 3 triples if
>> > evaluated as RDFa 1.0, though they're each the result of RDFa
>> > grandfathering in certain HTML 4/XHTML 1 semantics.
>>
>> No, it might contain 3 RDF triples - but they're not RDF*a*.
>
> It contains three attributes which are described by the XHTML+RDFa spec,
> and which, when processed according to the RDFa spec, each produce a
> triple.
>
>> Just because a page can be parsed/converted into another format does
>> not mean it "contains" that format.
>
> The page at http://wordpress.org/ doesn't need to be converted to RDFa.
> It is RDFa. (It doesn't use an RDFa DTD, though many seem to believe
> that judging an XML document's type by its DTD is a layering violation.)
>
> It would need to be converted if you wanted RDF/XML, Turtle or JSON.
> But it doesn't need to be converted to RDFa; it is RDFa.

These assertions of "is RDFa" on grandfathered formats/syntaxes are deceptive because they essentially claim implied credit/branding for something that had nothing to do with RDFa.

E.g. if some future version of the XHTML+RDFa spec describes how to process microformats (given the trend in the RDFa specs to grandfather in more and more syntax, it's reasonable to predict that this will happen), then you can make the same claim there, that all uses of microformats are RDFa, which then dilutes the phrase "is RDFa" to the point of meaninglessness.

Such a conflation of reclassifying previously non-RDFa markup as RDFa is, as I said, clouding a definition at best, and deceptive/dishonest at worst. It is still just conversion of a *previous* syntax, defined *outside* and *predating* RDFa.

Another analogy: you could make a new spec called BrandXSemantics (BXS) that defined processing of all syntaxes like meta tags, microformats, RDFa, microdata etc. and that claimed that all such syntaxes were BXS, but such a claim is of little utility and would merely serve to artificially inflate claims about BXS being more popular than microformats or RDFa or microdata - this is essentially what this kind of "grandfathering" in RDFa is doing.

Claiming "It is RDFa" is also deceptive from the point of view of developer behavior, which is illustrated by your next point.

>> Saying so is deceptively mis-using the word "contains" at best, and
>> playing semantic games at worst.
>>
>> Just because a page has hAtom does not mean it "contains" Atom.
>
> No, it "contains" hAtom and can possibly be converted to Atom (atom:id
> concerns notwithstanding).
>
> The page at http://wordpress.org/ contains RDFa and can be converted to
> RDF/XML.
>
>> The question of comparison is deliberately chosen to illuminate what
>> are developers actually coding? What syntax?
Not what can you "infer",
>> "parse as", or "convert to".
>
> In the case of http://wordpress.org/, they have coded RDFa. Thanks to
> the fact that RDFa grandfathered in some semantics from earlier
> versions of (X)HTML, they may not have been *knowingly* doing so.

Claiming some code is RDFa that clearly was not *knowingly* written/intended as such points out the key flaw - if you're talking about what developers are adopting, then their intent, and what they are explicitly choosing to do, is what matters.

Thus comparisons like Google's Rich Snippets adoption table make sense to contrast developer adoption of different format approaches.

>> > The question "how many pages contain RDFa?" is only meaningful if
>> > certain qualifications are added... Does broken RDFa count?
>>
>> broken RDFa counts, but only to demonstrate the difficulty of coding
>> RDFa, not instances of RDF(a). one of the reasons that Google found
>> so little RDFa may be because much of it was broken. this is one of
>> the common problems with namespaces in data.
>
> Do twitter's 100 million plus broken hCards demonstrate the difficulty
> of coding microformats?

If there are problems with Twitter's hCards, please document the specific problems on the respective issues page; that way we can better verify the problem report(s), investigate possible causes, and suggest fixes to Twitter as well. I've added a placeholder section for this:

http://microformats.org/wiki/hcard-supporting-user-profiles-issues#Twitter

> I imagine that the reason Google found so little RDFa is because they
> were only counting RDFa that used their own RDFa vocabulary, and
> neglecting to count *all* RDFa. Without more information on their
> testing process I can't verify that though.
My understanding from RDF(a) advocates is that one of the design principles of RDF(a) is its infinite extensibility and a philosophy of encouraging everyone to make up their own vocabulary (which is often contrasted with the opposite microformats design principle of deliberate re-use of shared vocabularies for better interoperability and communication).

Google using their own RDFa vocabulary is a direct consequence of this principle/philosophy of RDF(a)/namespaces etc., and thus if there's a problem with that approach, it merely calls into question that principle/philosophy of RDF(a)/namespaces.

> This would be analogous to Wikipedia surveying usage levels of rel-tag
> by searching for rel-tag links to http://en.wikipedia.org/wiki/* only.

It's not analogous because rel-tag neither states nor encourages that sites only use their own rel-tags, whereas RDF(a) does encourage making up and using your own vocabularies.

>>> Do grandfathered rel/rev values count? &c.
>>
>> rel/rev syntax and values work without RDFa - they're not RDFa,
>> despite RDFa's attempt to subsume them (and even errantly claim/imply
>> credit in the spec, e.g. rel-license).
>
> I don't think the RDFa spec claims credit for anything in particular.
> It reuses a lot of (X)HTML attributes and rel/rev values, but is rather
> silent on their origins.

Right - it's that "silent on their origins" which is sloppy at best and plagiaristic (implying first invention/credit by absence of citation of prior art) at worst.

I'll follow up with a more detailed description of where/when RDFa claims/implies credit for work that predates RDFa.

E.g.
the introduction of rel='license' in an example following a section that states "examples to illustrate how Alice can use RDFa" [1] is one such errant/deceptive implication that rel="license" is RDFa, that fails to provide citations to the invention/introduction of rel="license" [2] which IMHO borders on plagiarism, writing something implying claiming/taking credit for something that was invented by another beforehand, and omitting the reference to prior art. [1] http://www.w3.org/TR/2008/NOTE-xhtml-rdfa-primer-20081014/#id84491 [2] http://microformats.org/wiki/history 2004-02-11 http://tantek.com/presentations/2004etech/realworldsemanticspres.html The counter-argument is that perhaps it is/was a case of simultaneous invention, which I would prefer to give more weight to, except that the microformats introduction of rel-license was explicitly discussed/mentioned afterwards on the Creative Commons mailing list[3] where many related subsequent RDF discussions were had: [3] http://lists.ibiblio.org/pipermail/cc-metadata/2004-February/000290.html >> 1. theoretical strawman[1] >> 2. google.com/robots.txt prevents this from counting in any "search" > > I think you're neglecting the serious point that page counts on the Web > are not especially significant - it's easy to generate many millions of > pages from a single template. If it's a "serious point" - please provide data to substantiate that criticism rather than merely asserting that Yahoo Search Monkey returns numbers that "are not especially significant" - I think the Yahoo Search Monkey developers deserve more benefit of the doubt. > There are probably much more interesting measures than page counts. To > evaluate the health of a format, it's just as important -- perhaps more > important -- to look at how many active consumers there are. By all means, propose alternative concrete "more interesting measures" and how you would measure them. 
Until then, the concrete Yahoo Search Monkey measures are the most interesting measures of web-wide microformats adoption to date. Sarven, On Wed, Jul 7, 2010 at 3:53 PM, Sarven Capadisli wrote: > > I'm not sure about exact numbers, but a StatusNet instance (e.g., > http://identi.ca/ ), has hCards for all users and groups. It includes > representative hCards. > > Updated wiki. Thanks much Sarven! Do you know *when* Identica added hCard support? (I'd really prefer to keep this blog post to recognizing specific deployments in the past year) Also, do you know how many Identica/status.net profiles there are today? Please feel free to add answers to those directly to Identica's entry on the hCard supporting user profiles page: http://microformats.org/wiki/hcard-supporting-user-profiles Thanks, Tantek -- http://tantek.com/ From mail at tobyinkster.co.uk Thu Jul 8 02:47:02 2010 From: mail at tobyinkster.co.uk (Toby Inkster) Date: Thu Jul 8 02:47:57 2010 Subject: [uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post In-Reply-To: References: <4C32E8D7.7080705@yahoo-inc.com> <20100707124352.3f2a215f@miranda.g5n.co.uk> <20100708002838.1b370e8b@miranda.g5n.co.uk> Message-ID: <20100708104702.537d1fc4@miranda.g5n.co.uk> On Thu, 8 Jul 2010 01:25:03 -0700 Tantek ?elik wrote: > If there are problems with Twitter's hCards, please document the > specific problems on the respective issues page that way we can better > verify the problem report(s), investigate possible causes, and suggest > fixes to Twitter as well. It's been documented on the Wiki since 2007. 
http://microformats.org/wiki/implementations?diff=23858

> My understanding of RDF(a) advocates is that one of the design
> principles of RDF(a) is its infinite extensibility and philosophy of
> encouraging everyone to make up their own vocabulary (which is often
> contrasted with microformats opposite design principle of deliberate
> re-use of shared vocabularies for better interoperability and
> communication).

I wouldn't say that RDF encourages everyone to make up their own vocabulary, but that it makes it feasible.

> Google using their own RDFa vocabulary is a direct consequence of this
> principle/philosophy of RDF(a)/namespaces etc., and thus if there's a
> problem with that approach, it merely calls into question that
> principle/philosophy of RDF(a)/namespaces.

There's no problem with Google making up their own RDF vocabulary. The problem is counting the number of uses of their own vocabulary on the Web, taking that number and claiming it as representative of RDFa deployment as a whole.

> The counter-argument is that perhaps it is/was a case of simultaneous
> invention, which I would prefer to give more weight to, except that
> the microformats introduction of rel-license was explicitly
> discussed/mentioned afterwards on the Creative Commons mailing list[3]
> where many related subsequent RDF discussions were had:
>
> http://lists.ibiblio.org/pipermail/cc-metadata/2004-February/000290.html

If you go back a further two months you'll see this thread:

http://lists.ibiblio.org/pipermail/cc-metadata/2003-December/000237.html

Cory Nelson wrote:

| I propose sites under a CC license include a meta tag in their header
| saying so. Though this won't help people recognize the content as
| being under a CC license, it could help search engines greatly.
|
| Here is an example:
|
|

And Lucas Gonze followed up with:

| It would also work to have a "link rel=" element

So the seed of the idea had been around since before the microformat proposal.
Certainly the microformat proposal solidified the idea, but it's not inconceivable that when rel=license was proposed to be added to XHTML2 (the metadata parts of which evolved into RDFa), Ben Adida was drawing from earlier ideas, and possibly unaware of the microformat.

http://lists.w3.org/Archives/Public/www-html-editor/2005AprJun/0178.html

It's worth noting that before "license" was added to the XHTML2 link relations vocabulary, the term "license" was already defined in both Creative Commons' and Dublin Core's vocabularies, in the former case since 2002. Ben's proposal seems not so much inspired by the microformats use, but rather to move the term "license" out of Creative Commons' namespace to help clarify that it may be used to point to non-CC licenses too.

--
Toby A Inkster

From microformats.org at boblet.net Mon Jul 12 08:27:52 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Mon Jul 12 08:28:20 2010
Subject: [uf-discuss] re: HTML5 support
Message-ID:

Hey all,

I've got a few questions about using microformats in HTML5:

Back on 14 October 2009, Tantek made the following additions to http://microformats.org/wiki/html5

===
microdata vocabularies

microdata vCard - use hCard instead, taking into account the hCard FAQ and resolved+closed issues. hCard 1.0.1 (under development) is incorporating these errata. Avoid the "microdata vCard" vocabulary as it is an out-of-date fork/snapshot of hCard.

microdata vEvent - use hCalendar instead, taking into account the hCalendar FAQ and resolved+closed issues. hCalendar 1.0.1 is incorporating these errata. Avoid the "microdata vEvent" vocabulary, as it is an out-of-date fork/snapshot of hCalendar's vevent root class name and applicable properties.
===

I'm assuming this was when Microdata vcard and vevent specs were based on hCard and hCalendar. They're now based on the original RFCs, so I guess these warnings are no longer relevant, and have updated the page. If they are still relevant (Tantek?) please let me know the situation and I'll update as required or roll back.

Ref:
* Microdata vcard: http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
* Microdata vevent: http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vevent

I'm also wondering if someone can explain the "encourage HTML5 to drop their vocabulary and use µF vocabulary instead" comments on the brainstorming pages linked to from: http://microformats.org/wiki/html5#Requests
Again if these are no longer accurate I'm happy to update them.

Under Current microformat compatibility http://microformats.org/wiki/html5#Current_microformat_compatibility only hCard and XFN are listed as compatible. I'm wondering if I should also add these specifications too:
* XOXO
* rel-nofollow (defined in HTML5 spec)
* rel-license (defined in HTML5 spec)
* rel-tag (defined in HTML5 spec)
(the rel values are defined on http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html )

There's a bunch more draft specifications that look to be compatible, and there's also a way to add extra rel values to the HTML5 spec: http://wiki.whatwg.org/wiki/RelExtensions

Finally, what was the upshot of this email about the "magic" in fn?
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-January/024881.html

Thanks for your time

peace - oli
@boblet

From tantek at cs.stanford.edu Mon Jul 12 09:31:00 2010
From: tantek at cs.stanford.edu (Tantek Celik)
Date: Mon Jul 12 10:02:42 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References:
Message-ID: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>

Hi Oli,

In short, the warnings are still relevant. Please don't update the pages just based on a "guess" that the warnings are no longer relevant, and revert accordingly.
If you can verify the changes made in microdata are consistent with hCard etc (rather than a fork), and cite the specific changes, then it makes sense to make updates.

Regarding the rel-values - the latest correct definitions are still on the microformats wiki. For example the HTML5 definition of rel-tag mistakenly always applies it to the whole page which is incorrect. The microformats.org/wiki/rel-tag spec and implementations commonly apply it to parts of a page like blog posts (hAtom, Technorati, IceRocket), or contacts/events/items (hCard, hCalendar, hReview, hListing, hProduct).

In general, the latest, most accurate work on microformats (both class vocabularies and rel values), is on the microformats wiki, not the HTML5 spec, and thus you should refer to the microformats wiki spec pages as canonical.

Thanks,

Tantek

From martin at weborganics.co.uk Mon Jul 12 13:13:25 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Mon Jul 12 13:40:02 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID: <4C3B7765.1080603@weborganics.co.uk>

On 12/07/2010 17:31, Tantek Celik wrote:
> If you can verify the changes made in microdata are consistent with hCard etc (rather than a fork), and cite the specific changes, then it makes sense to make updates.

It may be relevant to note that microdata is no longer part of the HTML5 core [1] . microdata does however exist as a separate specification [2] but is just "attributes" and as far as I know, microdata vCard and vEvent no longer exists as part of the microdata specification do they?.

I wouldnt really be surprised to see microdata disappear all together(but that's just my thought)

Best wishes

[1] http://www.w3.org/TR/html5/
[2] http://www.w3.org/TR/microdata/

--
Martin McEvoy

From philipj at opera.com Tue Jul 13 04:24:42 2010
From: philipj at opera.com (Philip Jägenstedt)
Date: Tue Jul 13 06:18:40 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References:
Message-ID:

On Mon, 12 Jul 2010 17:27:52 +0200, Oli Studholme wrote:

> Finally, what was the upshoot of this email about the "magic" in fn?
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-January/024881.html

It was dropped in with the commit message "Remove the magic from the vCard vocabulary, since the magic doesn't really work."

It should be removed from the upstream vocabulary too, but I have little hope of that happening.

--
Philip Jägenstedt
Core Developer
Opera Software

From microformats.org at boblet.net Tue Jul 13 09:59:42 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Tue Jul 13 10:18:21 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID:

Hey all,

Thanks for your replies

On Tue, Jul 13, 2010 at 1:31 AM, Tantek Celik wrote:
> Please don't update the pages just based on a "guess" that the warnings are no longer relevant, and revert accordingly.

I'd revert the warnings, but it appears you've moved the content to the wiki/microdata page, so I'm assuming the current text is as desired. I asked @hixie about the warning and was told that the vCard vocabulary had been based on hCard (I guess this is the fork your comment referred to), but was now based directly on vCard. I also asked @phae and @adactio about the warning, and was encouraged to make changes. I'm not able to find a corroborating svn log entry - I'll ask @hixie for more info.

> In general, the latest, most accurate work on microformats (both class vocabularies and rel values), is on the microformats wiki, not the HTML5 spec, and thus you should refer to the microformats wiki spec pages as canonical.

I understand. I'd assumed the page was out of date due to the other errors I fixed, and the lack of reply to my comment about timezone validation from February. I'll email the list in future. Also thank you for the much clearer guidance on wiki/microdata.

On Tue, Jul 13, 2010 at 5:13 AM, Martin McEvoy wrote:
> microdata does however exist as a separate specification [2] but is just
> "attributes" and as far as I know, microdata vCard and vEvent no longer
> exists as part of the microdata specification do they?.

They've been removed due to "politics". They're available via the WHATWG spec as referenced in my email, and now in the wiki/microdata page (thanks Tantek).

> I wouldnt really be surprised to see microdata disappear all together(but
> that's just my thought)

But how could microdata possibly disappear now that Google supports it? ;)

Finally thanks for the clarification Philip

peace - oli

From martin at weborganics.co.uk Tue Jul 13 18:45:59 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Tue Jul 13 18:53:44 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID: <4C3D16D7.5010408@weborganics.co.uk>

Hello Oli ...

On 13/07/2010 17:59, Oli Studholme wrote:
> On Tue, Jul 13, 2010 at 5:13 AM, Martin McEvoy wrote
>> I wouldnt really be surprised to see microdata disappear all together(but
>> that's just my thought)
> But how could microdata possibly disappear now that Google supports it? ;)

Because Microdata is far too obtrusive to be practical in the "real world" for example....
Microdata vcard example from
http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard

George Washington

8 lines of code which would parse as:

BEGIN:VCARD
PROFILE:VCARD
VERSION:3.0
SOURCE:document's address
FN:George Washington
N:Washington;George;;;
END:VCARD

great you would think, now try that using microformats, example from http://yiid.cc/3GI2

George Washington

3 lines of code which parses as:

BEGIN:VCARD
SOURCE:document's address
NAME:document's title
VERSION:3.0
N;CHARSET=UTF-8:Washington;George;;;
FN;CHARSET=UTF-8:George Washington
END:VCARD

from a commercial and practical point of view, microdata is definitely not intended to be for "humans first" .

Anyway believe what you like, microdata needs a *lot* of work before it can ever be considered as "micro" as far as I can see, at the moment It just confuses people into using an unnecessary semantic.

Best wishes

--
Martin McEvoy

From microformats.org at boblet.net Tue Jul 13 19:14:37 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Tue Jul 13 19:15:11 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID:

Hey Tantek,

From IRC:
# [20:52] hey Hixie, can you give me more details about the microdata vcard vocab being based on vcard not "an out-of-date fork of hcard"?
# [20:54] i just went down the vcard spec and mapped it directly to microdata
# [20:54] i had originally made some minor changes to match hcard in places, but i've since removed those
# [20:56] Hixie: was that the fn magic? any other hcard -> vcard reversions?
# [20:56] i think the only bit was the stuff with FN, yeah
# [20:57] everything else is just a straight mapping of the vcard spec
# [20:57] i did use the hcard names for the bits of vcard that needed splitting into multiple fields, but just to make sure the terminology was consistent, it's not "forked from hcard" or anything
# [20:58] the whole point of microdata is that people can use whatever vocabularies they like; the vcard one is basically a proof of concept to show that it is possible to design a vocabulary in very little time and to show how to write a spec for one

http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
"The following are the type's defined property names. They are based on the vocabulary defined in the vCard specification and its extensions, where more information on how to interpret the values can be found. [RFC2426] [RFC4770]"

Suggested edit on http://microformats.org/wiki/microdata#microdata_vCard_vocabulary

Avoid the "microdata vCard vocabulary" as in many ways it is an out-of-date fork/snapshot of hCard, even though portions of it appear to based directly on the vCard RFC. as well.
→
Avoid the "microdata vCard vocabulary" as it is based directly on the vCard RFC.

(Plus the same for vEvent)

Regarding rel-* microformats:

# rel-nofollow
Microformats: "By adding rel="nofollow" to a hyperlink, a page indicates that the destination of that hyperlink should not be afforded any additional weight or ranking by user agents which perform link analysis upon web pages (e.g. search engines). Typical use cases include links created by 3rd party commenters on blogs, or links the author wishes to point to, but avoid endorsing."
HTML5: "The nofollow keyword indicates that the link is not endorsed by the original author or publisher of the page, or that the link to the referenced document was included primarily because of a commercial relationship between people affiliated with the two pages."

# rel-license
Microformats: "By adding rel="license" to a hyperlink, a page indicates that the destination of that hyperlink is a license for the current page."
HTML5: "The license keyword indicates that the referenced document provides the copyright license terms under which the main content of the current document is provided."
Out of curiosity what are the perceived incompatibilities in these two examples that prevent them from being listed under http://microformats.org/wiki/html5#Current_microformat_compatibility ?

peace - oli

From martin at weborganics.co.uk Tue Jul 13 19:44:07 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Tue Jul 13 19:50:31 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <4C3D16D7.5010408@weborganics.co.uk>
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <4C3D16D7.5010408@weborganics.co.uk>
Message-ID: <4C3D2477.5020106@weborganics.co.uk>

Oli

Please don't get me wrong, microdata does offer some interesting potential as far as microformats are concerned. It just needs looking at with "new eyes" and in a way that can help microformats *and* be 100% compatible with the way microformats exist now.

There are a couple of attributes that could really be useful to microformats: the itemscope attribute, because it's opaque, and itemref, which is very similar to the include pattern but better because it would allow an author to reference whole blocks of data, not just a single property.

For example, you could have the following markup somewhere in a page:

Alfred Hitchcock

and add different parts of a page say in the footer....
1600 Amphitheatre Parkway
Building 43, Second Floor
Mountain View, CA 94043
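
The itemref idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the markup in SAMPLE is hypothetical (the markup of the original example did not survive in the archive), the property names "fn" and "adr" are placeholders, and the resolution logic is a simplification of what the microdata draft describes, not a conforming parser.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Minimal element node: tag name, attributes, parent, children."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)  # valueless attributes (itemscope) map to None
        self.parent = parent
        self.children = []  # mix of Node and str

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a naive element tree; assumes reasonably well-formed markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def walk(node):
    """Yield every element below node in document order."""
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from walk(child)


def visit(node, props):
    """Record node's itemprop (if any), then descend unless it starts a new item."""
    if node.attrs.get("itemprop"):
        value = " ".join(node.text().split())
        for name in node.attrs["itemprop"].split():
            props.setdefault(name, []).append(value)
    if "itemscope" not in node.attrs:
        for child in node.children:
            if isinstance(child, Node):
                visit(child, props)


def extract_items(html):
    """Return one property dict per top-level item, following itemref ids."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    by_id = {n.attrs["id"]: n for n in walk(builder.root) if n.attrs.get("id")}
    items = []
    for node in walk(builder.root):
        if "itemscope" in node.attrs and "itemprop" not in node.attrs:
            props = {}
            for child in node.children:
                if isinstance(child, Node):
                    visit(child, props)
            # itemref pulls in whole blocks that live elsewhere on the page.
            for ref in (node.attrs.get("itemref") or "").split():
                if ref in by_id:
                    visit(by_id[ref], props)
            items.append(props)
    return items


# Hypothetical markup: a name in the page body, an address block in the footer.
SAMPLE = """
<p itemscope itemref="office">
  <span itemprop="fn">Alfred Hitchcock</span>
</p>
<footer>
  <div id="office" itemprop="adr">
    1600 Amphitheatre Parkway
    Building 43, Second Floor
    Mountain View, CA 94043
  </div>
</footer>
"""

if __name__ == "__main__":
    print(extract_items(SAMPLE))
```

The point of the sketch is the last loop in extract_items: the item in the body picks up the footer block by id, which is the "reference whole blocks of data" behaviour described above, whereas the include pattern needs an extra element per inclusion.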
I don't see any problem in microformats adopting only the parts of microdata that are useful to microformats, there are probably others who will disagree with that though ;-)

Best wishes.

Martin

From microformats.org at boblet.net Tue Jul 13 20:06:07 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Tue Jul 13 20:18:21 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <4C3D16D7.5010408@weborganics.co.uk>
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <4C3D16D7.5010408@weborganics.co.uk>
Message-ID:

Hey Martin,

On Wed, Jul 14, 2010 at 10:45 AM, Martin McEvoy wrote:
> On 13/07/2010 17:59, Oli Studholme wrote:
>> But how could microdata possibly disappear now that Google supports it? ;)
>
> Because Microdata is far to obtrusive to be practical in the "real world"
> for example....
>
> Microdata vcard example from
> http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
>
> George
> Washington
>

This is equivalent to

George Washington

> from a commercial and practical point of view, microdata is definitely not
> intended to be for "humans first" .

I think it would be more accurate to say RFC2426 is not intended to be "humans first" ;-) for better or worse vCard doesn't contain implied "n" optimisation.

> Anyway believe what you like, microdata needs a *lot* of work before it can
> ever be considered as "micro" as far as I can see, at the moment It just
> confuses people into using an unnecessary semantic.

Well, to use a non-English example:

???????? ??

???????? ??

These seem pretty equivalent to me, with the main difference in length being the itemtype URL. However there are advantages to using URLs for specifying a vocabulary.

Keep in mind the implied "n" optimisation is arguably potentially dangerous e.g. for a social app that only collects the user's name, rather than two separate fields for given and family names, and then displays this as an hCard.
While some languages that have family-name given-name order don't use a space separator (CJK), a quick look at http://twitter.com/boblet shows one incorrect optimisation for my friend Channy: ????(Channy Yun)?. As you can imagine this doesn't optimise well. I'd look for more but it seems Twitter's profile page vcards are completely borked :)

I agree that for marking up a person with their name and URL - if you can use implied "n" optimisation - microformats is superfast. However I find I often use hCard for more data than just that, to the extent that writing them without snippets becomes tiring. And if you're making snippets, there's little difference.

HTH

peace - oli

PS just saw your reply (I can't keep up! :) Yeah I've definitely wanted an equivalent to itemref for microformats, and hadn't come across the include pattern before. thanks!

> in a way that can help microformats *and* be 100% compatible with the way microformats exist now

I don't think compatibility is so important. Microformats, microdata and RDFa all target the same basic problem space but each has its strengths and weaknesses. Different ideas help each technology improve (RDFa 1.1 moving towards microformats' simplicity for example).

Finally (I perceive) µF as an elegant hack to graft new semantics onto HTML using the tools available; class, rel, rev, profile and coding patterns. With the changed toolset in HTML5 (including no rev or profile attributes) it makes sense to reassess methods, and I'm looking at microdata and RDFa for that reason.
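
The implied "n" optimisation discussed in this message can be sketched roughly as follows. This is a simplified reading of the hCard rule as I understand it (exactly two whitespace-separated words, with an exception for "Family, Given" and "Family G." forms), not a reference implementation; the function names are made up for illustration, and "Yamada Taro" is a hypothetical family-name-first input chosen to show the failure mode described above.

```python
def implied_n(fn, org=None):
    """Infer (family_name, given_name) from a formatted name, or return None.

    Applies only when fn is exactly two whitespace-separated words and the
    card is not an organisation card (fn equal to org).
    """
    if org is not None and fn.strip() == org.strip():
        return None
    words = fn.split()
    if len(words) != 2:
        return None  # one word, or three and more: N must be marked up explicitly
    first, second = words
    # Exception: "Family, Given" or "Family G." style names.
    second_is_initial = len(second.rstrip(".")) == 1 and len(second) <= 2
    if first.endswith(",") or second_is_initial:
        return first.rstrip(","), second
    # Default assumption: given name first, family name second.
    return second, first


def vcard_n(fn, org=None):
    """Format the inferred name as a vCard N line, or return None."""
    inferred = implied_n(fn, org)
    if inferred is None:
        return None
    family, given = inferred
    return f"N:{family};{given};;;"


if __name__ == "__main__":
    print(vcard_n("George Washington"))   # matches the N value quoted in the thread
    print(implied_n("Washington, George"))
    # Hypothetical family-name-first input: the default rule swaps the parts,
    # so the family name "Yamada" ends up in the given-name slot.
    print(implied_n("Yamada Taro"))
```

The default branch is where the risk lies: an application that stores a single name field has no way to tell the parser which word is the family name, so the inference is only as good as the "given name first" assumption.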
From scott at randomchaos.com Tue Jul 13 21:09:36 2010 From: scott at randomchaos.com (Scott Reynen) Date: Tue Jul 13 21:09:45 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> Message-ID: On Jul 13, 2010, at 8:14 PM, Oli Studholme wrote: > Suggested edit on > http://microformats.org/wiki/microdata#microdata_vCard_vocabulary > Avoid the "microdata vCard vocabulary" as in many ways it is an > out-of-date fork/snapshot of hCard, even though portions of it appear > to based directly on the vCard RFC. as well. > ? > Avoid the "microdata vCard vocabulary" as it is based directly on the vCard RFC. I'd suggest removing the entire vocabulary-specific section altogether. As mentioned in the same page, microdata is aiming to solve a different problem than microformats, so it's misleading to suggest specific vocabularies are actually alternatives to specific microformats by talking about them vis-a-vis microformats. Put another way, that section violates DRY. Because microdata is aiming to solve a different problem, *no* microdata vocabulary could possibly be recommended in place of a specific microformat, so it's redundant to go into the ways in which a specific microdata vocabulary goes against microformat principles, principles it's not even attempting to follow. Peace, Scott From angelo at gladding.name Wed Jul 14 19:30:44 2010 From: angelo at gladding.name (Angelo Gladding) Date: Wed Jul 14 19:30:49 2010 Subject: [uf-discuss] `microformats` and a universal test suite Message-ID: Hello all, I am currently writing a universal parser [1]. It goes by the name `microformats` because I intend it to be as close as possible to a canonical codification of all things Microformats. This will be accomplished by codifying each specification in a Python module using what can best be described as a domain-specific language. 
See the `adr` definition [2] and accompanying tests [3]. Each definition file will contain as much spec-related information as possible. Each test suite will provide a series of HTML/ufJSON equivalents. A web interface (currently functional, but unreliable as I develop) acts as a web service for transformation and validation but also to summarize the current state of Microformats down to author tables and overall analysis of the lexicon. The code currently in the repository is a reduction of the current state of the project. I have defined 33 formats, ranging from proposal to spec, in the definition format. Additionally I have a buggy analysis module that renders DOT graphs of the entire lexicon [4] and subsections thereof. The scope is a bit wide but most is already written and output is finally beginning to look robust -- which leads me to my main point: - - - Is anyone interested in helping with the compilation of a universal test suite? I'd like to bring this up sooner than later as it is the one aspect of my project that requires community participation for it to be truly effective. In particular, I'd like to grab the ear of Toby Inkster and Mike Kaply and collaborate to standardize the results of Swignition, Operator, and `microformats`. The ultimate goal of the test suite is multi-part: - to have a concrete set of tests that will allow future implementors to be able to implement with confidence; - to have a common format for specification authors to be able to codify their designs; - and to provide a plethora of examples for content creators including *all* possible edge cases of all formats and patterns. I have had little luck pursuing my ventures via the wiki due to its rather ironic incapacity to implement microformats. The reasoning is understandable, though, so I suggest that we just keep this simple and rally around good old DVCS. 
I am aware of http://hg.microformats.org/ and am not opposed to forming a shared subrepo for the suite, releasing all tests under CC0 in the process.

- - -

There are other aspects of the project that I'd like to involve the community in as well, such as automated XMDP inferencing, semantic graphing (graphing the semantic web as opposed to graphing the Microformat lexicon), and consolidation of properties (which becomes more apparent once you stare at a webpage presenting a spec's profile, its graph, and property/subproperty derivation/relatives). These, however, are considerably less important than testing and conformity at the moment.

Looking forward to hearing from anyone interested.

[1]: https://bitbucket.org/angelo/microformats/
[2]: https://bitbucket.org/angelo/microformats/src/5f8dbe75b683/microformats/lexicon/adr.py
[3]: https://bitbucket.org/angelo/microformats/src/5f8dbe75b683/tests/adr/
[4]: http://imgur.com/5dpq7.jpg

--
Angelo Gladding
angelo@gladding.name

From scott at randomchaos.com Wed Jul 14 20:37:44 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Wed Jul 14 20:37:51 2010
Subject: [uf-discuss] `microformats` and a universal test suite
In-Reply-To:
References:
Message-ID: <1E3B98EC-FB68-48B8-8511-75F63831962A@randomchaos.com>

On Jul 14, 2010, at 8:30 PM, Angelo Gladding wrote:

> I am currently writing a universal parser [1].
Hi Angelo,

Sounds like an ambitious project and I'd like to have more to contribute, but all I have now is a suggestion to move this discussion to the microformats-dev list, which is focused on exactly this kind of topic:

http://microformats.org/mailman/listinfo/microformats-dev/

Peace,
Scott

From angelo at gladding.name Wed Jul 14 21:08:37 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Wed Jul 14 21:16:11 2010
Subject: [uf-discuss] `microformats` and a universal test suite
In-Reply-To: <1E3B98EC-FB68-48B8-8511-75F63831962A@randomchaos.com>
References: <1E3B98EC-FB68-48B8-8511-75F63831962A@randomchaos.com>
Message-ID:

On Wed, Jul 14, 2010 at 8:37 PM, Scott Reynen wrote:

> On Jul 14, 2010, at 8:30 PM, Angelo Gladding wrote:
>
>> I am currently writing a universal parser [1].
>
> Hi Angelo,
>
> Sounds like an ambitious project and I'd like to have more to contribute, but all I have now is a suggestion to move this discussion to the microformats-dev list, which is focused on exactly this kind of topic:
>
> http://microformats.org/mailman/listinfo/microformats-dev/
>
> Peace,
> Scott
>
> _______________________________________________
> microformats-discuss mailing list
> microformats-discuss@microformats.org
> http://microformats.org/mailman/listinfo/microformats-discuss

I wasn't sure -- thought that might be more for development of specifications. Will cross-post, thanks.

--
Angelo Gladding
angelo@gladding.name

From microformats.org at boblet.net Sun Jul 18 05:38:31 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Sun Jul 18 05:39:11 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID:

Hey All,

re: Martin's earlier email

On Wed, Jul 14, 2010 at 10:45 AM, Martin McEvoy wrote:
>
> George Washington
>

I think the issue you had with the microdata equivalent was brevity/simplicity, correct? While the "n"
class optimisation isn't in the microdata vocabulary, and I've already covered how for non-Western style names this doesn't apply (and is potentially harmful), I forgot about the profile attribute:

http://microformats.org/wiki/hcard#Profile

The difference is that in microdata a profile (vocabulary) link is required via @itemtype, whereas it's a "_should_" in microformats. Adding a profile to my previous non-English example results in a draw for me in the simplicity stakes:

? ???????? ?? ???????? ??

Of course if you can use implied "n" optimisation microformats are definitely simpler, but the difference is less pronounced when using @profile:

? Oli Studholme Oli Studholme

Of course, no one actually uses @profile with microformats, so it's probably a moot point :D

Finally, thank you for pointing out the nested fn and n itemprops in the spec example, which should be in the same itemprop. I filed a bug:

http://www.w3.org/Bugs/Public/show_bug.cgi?id=10159

On Wed, Jul 14, 2010 at 1:09 PM, Scott Reynen wrote:
> I'd suggest removing the entire vocabulary-specific section altogether. As mentioned in the same page, microdata is aiming to solve a different problem than microformats, so it's misleading to suggest specific vocabularies are actually alternatives to specific microformats by talking about them vis-a-vis microformats.

I'm sorry, but what text are you referring to? What I see is:

"microdata is an extension to HTML5 that provides another way to embed microformats and poshformats vocabularies"

> Put another way, that section violates DRY. Because microdata is aiming to solve a different problem, *no* microdata vocabulary could possibly be recommended in place of a specific microformat, so it's redundant to go into the ways in which a specific microdata vocabulary goes against microformat principles, principles it's not even attempting to follow.

Out of curiosity what do you perceive are the different problems that microformats and microdata are trying to solve?
I personally see microformats as a grass-roots movement that uses the tools available to extend HTML with extra semantics. Currently this is accomplished using @class, @rel etc. I see microdata as a new tool in HTML5 that would also be suitable for using with microformats, so I'm wondering what's up with all the negativity directed toward microdata in these replies.

@Tantek: It seems the current inclusion of vcard and vevent vocabularies in the HTML5 spec is something of a problem (at least based on the IMO incorrect comments in the wiki I've pointed out above), so I wonder how is progress going on the 1.0.1 versions that Hixie said he'd be happy to link to as normative versions?
Ref: http://krijnhoetmer.nl/irc-logs/whatwg/20090717#l-335

According to Hixie the vcard/vevent vocabularies are in the spec as examples of how to write a microdata vocabulary, so could presumably be changed with something else ("the vcard one is basically a proof of concept to show that it is possible to design a vocabulary in very little time and to show how to write a spec for one")
ref: http://krijnhoetmer.nl/irc-logs/whatwg/20100713#l-884

Finally, I wonder how I can assist in the documentation of how to use any microformat via microdata?
ref: http://krijnhoetmer.nl/irc-logs/whatwg/20090717#l-437

# [10:36] tantek: will current Microformats be released in Microdata format at some stage?
# [10:37] boblet - doubtful. but will likely happen is that microformats.org will document how to use *any* microformat generically using microdata syntax.
watch this page for updates: http://microformats.org/wiki/html5

peace - oli
@boblet

From scott at randomchaos.com Sun Jul 18 09:10:37 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Sun Jul 18 09:10:43 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry>
Message-ID: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>

On Jul 18, 2010, at 6:38 AM, Oli Studholme wrote:

>> I'd suggest removing the entire vocabulary-specific section altogether. As mentioned in the same page, microdata is aiming to solve a different problem than microformats, so it's misleading to suggest specific vocabularies are actually alternatives to specific microformats by talking about them vis-a-vis microformats.
>
> I'm sorry, but what text are you referring to?

This is what I'm referring to as the "vocabulary-specific section":

http://microformats.org/wiki/microdata#microdata_vocabularies

This is what I'm referring to as "mentioned in the same page, microdata is aiming to solve a different problem":

http://microformats.org/wiki/microdata#potential

> Out of curiosity what do you perceive are the different problems that microformats and microdata are trying to solve?

Microformats aim to "solve a specific problem." Microdata aims to be compatible with RDF, which demands more generic semantics. Because of this, I doubt you'll ever see something like n optimization in microdata. You've suggested that's a good thing because n optimization doesn't make sense in all cases, but that's the crux of it: microformats aren't trying to make sense in all cases, while microdata is. n optimization isn't a good thing or a bad thing; it's simply a reflection of different goals.

> I personally see microformats as a grass-roots movement that uses the tools available to extend HTML with extra semantics. Currently this is accomplished using @class, @rel etc.
> I see microdata as a new tool in HTML5 that would also be suitable for using with microformats, so I'm wondering what's up with all the negativity directed toward microdata in these replies.

Maybe you could clarify what specifically you see as negativity toward microdata? I don't see microdata and microformats having different goals as a bad thing for either. Different goals are good.

Peace,
Scott

From microformats.org at boblet.net Sun Jul 18 21:30:45 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Sun Jul 18 21:31:17 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <4C3D16D7.5010408@weborganics.co.uk>
Message-ID:

Hey Martin,

On Wed, Jul 14, 2010 at 12:06 PM, Oli Studholme wrote:
>> Microdata vcard example from
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#vcard
>>
>>         George
>>         Washington
>
> This is equivalent to
>
>       George
>       Washington

I'm sorry but I misunderstood/misread the microdata vcard spec (I didn't realise that n was a nested item), and my example is wrong. It should be longer not shorter :)

        George
        Washington

So for a non-Western name one extra wrapper element, but for a name with n optimisation three extra wrapper elements.
peace - oli

From microformats.org at boblet.net Sun Jul 18 22:03:40 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Sun Jul 18 22:04:08 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
Message-ID:

Hey Scott, thanks for your reply.

On Mon, Jul 19, 2010 at 1:10 AM, Scott Reynen wrote:
> Microformats aim to "solve a specific problem." Microdata aims to be compatible with RDF, which demands more generic semantics. Because of this, I doubt you'll ever see something like n optimization in microdata. You've suggested that's a good thing because n optimization doesn't make sense in all cases, but that's the crux of it: microformats aren't trying to make sense in all cases, while microdata is. n optimization isn't a good thing or a bad thing; it's simply a reflection of different goals.

I disagree. The purpose of microdata is to "annotate content with specific machine-readable labels, e.g. to allow generic scripts to provide services that are customised to the page". This is also a pretty good description of how @class is used in microformats, and I think that's a good metaphor.

I think you should be comparing microformats with microdata *vocabularies*, which also aim to solve a specific problem. Microdata is just a method by which to do this. While it's possible to convert microdata into RDFa (along with JSON and Atom), compatibility with RDF is not the aim of microdata - if anything it seems to be "provide a simple mechanism to semantically extend HTML5 to keep ppl who think this is important happy" :)

The n optimisation was actually in the microdata vcard spec, but Hixie removed it after deciding it was "magic".
While I can understand the reasons, I think it'd be less confusing/easier if the vcard vocabulary either removed all reference to hcard (e.g. used a non-microformats.org itemtype URL), or mapped hCard exactly. I'm hoping that once hCard 1.0.1 is finished one or both of these things might happen.

As for using microdata, if you're using simple microformats (just fn+url hcards for example) maybe it is too wordy a method. But personally I generally can't use that optimisation (for example: http://www.cie.mie-u.ac.jp/en/tri-u/2006/committee.html ), so I'm interested in microdata vocabularies for microformats, or the generic way of representing microformats in microdata that Tantek mentioned a year ago.

> Maybe you could clarify what specifically you see as negativity toward microdata?

maybe I'm just reading too much into it after talking about microformats and microdata with RDF ppl :D

peace - oli

From philipj at opera.com Mon Jul 19 01:31:32 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Mon Jul 19 01:31:44 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
Message-ID:

On Sun, 18 Jul 2010 18:10:37 +0200, Scott Reynen wrote:

> On Jul 18, 2010, at 6:38 AM, Oli Studholme wrote:
>
>> Out of curiosity what do you perceive are the different problems that microformats and microdata are trying to solve?
>
> Microformats aim to "solve a specific problem." Microdata aims to be compatible with RDF, which demands more generic semantics.

Microdata doesn't go out of its way to be compatible with existing RDF vocabularies, in fact I'd argue that the RDF extraction algorithm creates some pretty ugly URIs that anyone who actually likes RDF would frown upon and not want to use.
In any event there's very little "RDFness" over the syntax itself, the model is key-values, not triples.

> Because of this, I doubt you'll ever see something like n optimization in microdata. You've suggested that's a good thing because n optimization doesn't make sense in all cases, but that's the crux of it: microformats aren't trying to make sense in all cases, while microdata is. n optimization isn't a good thing or a bad thing; it's simply a reflection of different goals.

This isn't a difference between microformats and microdata. The microdata vocabulary *had* the 'n' optimization, but it was removed after I showed that it didn't work for e.g. Chinese or Vietnamese. I tried to learn from this community why it isn't a bad idea, but there wasn't much useful feedback. It really should be removed from microformats too, but that's probably too late.

--
Philip Jägenstedt
Core Developer
Opera Software

From scott at randomchaos.com Mon Jul 19 17:34:05 2010
From: scott at randomchaos.com (Scott Reynen)
Date: Mon Jul 19 17:34:17 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com>
Message-ID: <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>

On Jul 19, 2010, at 2:31 AM, Philip Jägenstedt wrote:

>>> Out of curiosity what do you perceive are the different problems that microformats and microdata are trying to solve?
>>
>> Microformats aim to "solve a specific problem." Microdata aims to be compatible with RDF, which demands more generic semantics.
>
> Microdata doesn't go out of its way to be compatible with existing RDF vocabularies

Maybe not specific vocabularies (that's kind of my point), but RDF itself is clearly a major consideration.
There's a whole section on it:

http://www.w3.org/TR/microdata/#rdf

> In any event there's very little "RDFness" over the syntax itself, the model is key-values, not triples.

It may not translate *well* to RDF, but I disagree that such translation isn't a goal. The syntax isn't particularly important, though. RDF is simply my sloppy shorthand for general purpose semantics. Microformats, unlike both RDF and microdata, are explicitly not intended to be general purpose. The microdata spec itself doesn't even mention specific vocabularies, whereas microformats are nothing *but* specific vocabularies. It's no surprise that general purpose formats like microdata don't express specific vocabularies as succinctly as microformats. It's also no surprise that microformats don't cover as much variety of data as general purpose formats.

>> Because of this, I doubt you'll ever see something like n optimization in microdata.
>
> This isn't a difference between microformats and microdata. The microdata vocabulary *had* the 'n' optimization, but it was removed after I showed that it didn't work for e.g. Chinese or Vietnamese.

Well, so much for that prediction. Still, the removal suggests to me that it *is* a significant difference:

> I tried to learn from this community why it isn't a bad idea, but there wasn't much useful feedback.

I'd argue it is a bad idea in microdata, but not in microformats, because of the very distinction I'm trying to draw between the two. n optimization isn't required. It's a handy shorthand in some specific cases, but shouldn't be used universally, as it doesn't make sense everywhere. hCard can handle Chinese names just fine with explicit given-name and family-name properties. Nothing about n optimization makes this more difficult; n optimization only makes specific cases easier. Making specific cases easier is the whole point of microformats, but it's not at all the point of microdata.
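For readers following the "n" optimization argument, here is a minimal Python sketch of the implied-n rule as it is generally described on the hCard wiki: a two-word fn is split into given-name and family-name, with the order reversed when the first word ends in a comma or the second word is a single initial. The function name `implied_n` and the returned dictionary shape are assumptions made for illustration; this is not any parser's actual code.

```python
import re


def implied_n(fn, org=None):
    """Guess n sub-properties from a two-word fn (hypothetical helper).

    Returns a dict with "given-name" and "family-name", or None when the
    shortcut does not apply and n would have to be marked up explicitly.
    """
    # An fn that equals org denotes an organisation, so no personal name.
    if org is not None and fn == org:
        return None
    words = fn.split()  # splits on any Unicode whitespace, incl. U+3000
    if len(words) != 2:
        return None
    first, second = words
    # Exception: "Family, Given" or "Family G." means reversed order.
    if first.endswith(",") or re.fullmatch(r"\w\.?", second):
        return {"family-name": first.rstrip(","), "given-name": second}
    # Default: Western order, given name first.
    return {"given-name": first, "family-name": second}


print(implied_n("George Washington"))
print(implied_n("Washington, George"))
# A family-name-first name in two words is silently misread:
print(implied_n("Mao Zedong"))
```

The last call is the case raised in this thread: the rule has no way to know that "Mao" is the family name, so it labels it as the given name. Explicit given-name and family-name markup avoids the guess entirely.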
Peace,
Scott

From microformats.org at boblet.net Mon Jul 19 19:57:34 2010
From: microformats.org at boblet.net (Oli Studholme)
Date: Mon Jul 19 20:03:26 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To: <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
Message-ID:

Hey Scott,

On Tue, Jul 20, 2010 at 9:34 AM, Scott Reynen wrote:
>> Microdata doesn't go out of its way to be compatible with existing RDF vocabularies
>
> Maybe not specific vocabularies (that's kind of my point), but RDF itself is clearly a major consideration. There's a whole section on it:
>
> http://www.w3.org/TR/microdata/#rdf

No. There's a sub-sub-section on converting to RDF, just as there are for converting to JSON and Atom. That's not a design goal, it's specified interoperability. There are also sub-sub-sections on vcard, vevent and licensing vocabularies, so by the same logic these are also major considerations (again no, they're sample vocabularies).

> It's no surprise that general purpose formats like microdata don't express specific vocabularies as succinctly as microformats.

You're not doing a lot of hCalendar formats I take it? ;-)

> I'd argue it is a bad idea in microdata, but not in microformats, because of the very distinction I'm trying to draw between the two.

As far as microdata goes it's irrelevant - that's something decided by the *vocabulary* author. Adding it isn't a bad idea if the vocabulary author thinks the shortcut has more good than bad points.

> Making specific cases easier is the whole point of microformats, but it's not at all the point of microdata.

"Making specific cases easier is the whole point of the class attribute, but it's not at all the point of microdata"

Microdata -
and semantic class names plus posh coding patterns for current microformats - are the method; a means to an end. Microdata vocabularies use microdata to express semantics, just as microformats use the class attribute etc to express semantics. Microformats are a little more concise in general (cough, datetimes ;-) compared to the same vocabulary in microdata (@class is shorter than @itemprop by 4 characters, @profile is optional whereas @itemtype is required etc), but the differences are not so great, and any class-based microformat can be written using microdata.

peace - oli

PS @Philip the reasons for n optimisation are as in the wiki; a combination of putting authors first (shortcut for western-style "given-name family-name" names), and accommodating mistakes in the original RFC. I guess there was the expectation that hCard would mainly be used with western-style names, a lack of knowledge of Vietnamese, Chinese and other names that would be incorrectly classified by this optimisation, and/or this shortcut was valued above i18n issues (it was made back in 2005 after all).

I'd originally thought of it as just an edge case in Japanese, but reading about Vietnamese, Chinese and Korean names I'm starting to feel this is a serious i18n issue. I wonder what Tantek's view, and the view of whoever else is working on hCard 1.0.1, is. I wonder if it will be perceived to be as serious as the a11y issues the abbr time pattern had?

Aah just found http://microformats.org/wiki/hcard-issues-resolved#fn-opt-i18n and it seems not.
I guess there's the assumption that east asian pages specify their language, which seems somewhat disconnected from reality :/

From martin at weborganics.co.uk Mon Jul 19 22:41:59 2010
From: martin at weborganics.co.uk (Martin McEvoy)
Date: Mon Jul 19 22:42:11 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
Message-ID: <4C453727.1060704@weborganics.co.uk>

On 20/07/2010 03:57, Oli Studholme wrote:
> Hey Scott,
>
> On Tue, Jul 20, 2010 at 9:34 AM, Scott Reynen wrote:
>
>> Making specific cases easier is the whole point of microformats, but it's not at all the point of microdata.
>
> "Making specific cases easier is the whole point of the class attribute, but it's not at all the point of microdata"
>
> Microdata - and semantic class names plus posh coding patterns for current microformats - are the method; a means to an end. Microdata vocabularies use microdata to express semantics, just as microformats use the class attribute etc to express semantics. Microformats are a little more concise in general (cough, datetimes ;-) compared to the same vocabulary in microdata (@class is shorter than @itemprop by 4 characters, @profile is optional whereas @itemtype is required etc), but the differences are not so great, and any class-based microformat can be written using microdata.

I'm sorry, but you cannot express *microformats* in microdata. If you do, it's cute, but it isn't a microformat, because microformats *only* use class names and a few choice rel-values. If you move a microformat away from @class it's no longer a microformat and shouldn't be described as such (we are a bit fussy about that :P).
This is why when someone starts talking about a "new microformats" or "microformats done better" the first thing I ask myself is "does it use semantic class names?" ... no? Well, it's not a new microformat or microformats done better.

Well the *good* news is HTML5 already supports microformats without adding any attributes at all (Yay!) .... that is until someone marks @class as obsolete!! ... joke.

Best wishes.

--
Martin McEvoy

From angelo at gladding.name Mon Jul 19 21:05:06 2010
From: angelo at gladding.name (Angelo Gladding)
Date: Mon Jul 19 22:51:54 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
Message-ID:

Could it be said that microdata intends to do to Microformat syntax what HTML5 did to HTML4 syntax rules in the sense that parsing is unambiguous and easier to validate normativity?

Can an enlightened soul describe in which ways microdata is actually superior to profiled poshformats?

- - -

Might a "humans first, machines second" CJKV internationalization of `n` optimization be to analyze the contents of the `fn`'s @lang and inner text and use either or both to better determine name order?

e.g.

Angelo Gladding

{
  "hCard": [
    {
      "hcard": {
        "fn": "Angelo Gladding",
        "n": {
          "n": {
            "family-name": [ "Gladding" ],
            "given-name": [ "Angelo" ]
          }
        }
      }
    }
  ]
}

where
アンジェロ == anjero (Angelo)
グラッディング == guraddingu (Gladding)

グラッディング　アンジェロ
グラッディング　アンジェロ
グラッディング　アンジェロ

{
  "hCard": [
    {
      "hcard": {
        "fn": "\u30b0\u30e9\u30c3\u30c7\u30a3\u30f3\u30b0\u3000\u30a2\u30f3\u30b8\u30a7\u30ed",
        "n": {
          "n": {
            "family-name": [ "\u30b0\u30e9\u30c3\u30c7\u30a3\u30f3\u30b0" ],
            "given-name": [ "\u30a2\u30f3\u30b8\u30a7\u30ed" ]
          }
        }
      }
    }
  ]
}

i.e.
Splitting on \u3000 (CJKV space), perform `n` optimization in reverse when the `fn` element/ancestor matches @lang(zh|ja|ko|vi) or the first character of the text content lies in one of the following Unicode character ranges:

U+4E00-U+9FBF (Kanji)
U+3040-U+309F (Hiragana)
U+30A0-U+30FF (Katakana)
http://en.wikipedia.org/wiki/Japanese_writing_system

... Chinese
... Korean
... Vietnamese

*i18n expert needed*

While this requires what I believe to be an uncommon usage of a space delimiter among CJK names it could be an easy hack for a user of Site X, assuming Site X does not explicitly define `n` properties, to implement upon failed validation without necessitating code modification on Site X's end.

--
Angelo Gladding
angelo@gladding.name

From philipj at opera.com Tue Jul 20 03:25:03 2010
From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=)
Date: Tue Jul 20 03:38:36 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
Message-ID:

On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding wrote:

> Could it be said that microdata intends to do to Microformat syntax what HTML5 did to HTML4 syntax rules in the sense that parsing is unambiguous and easier to validate normativity?

Yes, more or less. Of course vocabulary-specific rules can only be checked by a specialized validator, but checking the actual structure (key-value pairs) is something you get "for free". Also, I expect automatic validation of date-formats would be appreciated.

> Can an enlightened soul describe in which ways microdata is actually superior to profiled poshformats?

Microdata should be compared to the class attributes and the various patterns that microformats use, not any specific vocabulary.
The main benefit is that parsing becomes well-defined and simple. That's why it's possible to define a JavaScript API for accessing microdata items on a page, which makes the data useful to the page itself, not only external scrapers. It also makes it feasible to make browser features like "add to address book" or "add to calendar", which isn't really practical with microformats when the data is hidden in class attributes together with everything else.

> Might a "humans first, machines second" CJKV internationalization of `n` optimization be to analyze the contents of the `fn`'s @lang and inner text and use either or both to better determine name order?

The main problem with this is that due to lazy copy-pasting, lang="en" is often used even when the language isn't English. Also, in the case of e.g. Facebook, lang="en" would be correct for the page itself, but people's names aren't in English anyway. The only way to get it right is to ask the user for the full name, given name and family name, something I haven't ever seen. The most practical solution is to not guess at all, and I don't know of any negative effects of this.

--
Philip Jägenstedt
Core Developer
Opera Software

From mail at ciaranmcnulty.com Tue Jul 20 04:05:56 2010
From: mail at ciaranmcnulty.com (Ciaran McNulty)
Date: Tue Jul 20 04:06:03 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
Message-ID:

On Tue, Jul 20, 2010 at 5:05 AM, Angelo Gladding wrote:
> Can an enlightened soul describe in which ways microdata is actually superior to profiled poshformats?

To me it's not a question of Microdata vs POSH, it's more like Microdata vs class attributes where both are methods that can be used in POSH style data embedding.
The main arguments (and I present these without necessarily agreeing!) seem to be:

1. Class is ingrained as a CSS hook mechanism. Most people on this list are fine with class being used for other purposes, but despite that the argument comes up incredibly often that using class is somehow a 'hack'. Microdata overcomes that, so right or wrong, it may be worth ditching class for embedded data just to help uptake.

2. The class space is already populated with lots of ill-thought-out CSS identifiers. This means POSH formats have to attempt crude forms of namespacing (e.g. picking a uniquely-named root element) to try and not collide with existing markup. That works for @class="fn" say, but it's easy to collide with @class="email". Microdata separates out the important stuff.

3. Related to 2, microdata extraction is possible without having to be profile-aware, so for instance microdata can be converted to JSON without knowledge of the vocabulary used.

4. Microdata features some structures like @itemref that help combine disparate data across a document into one Microdata element, which in Microformats would need the slightly hacky rel-include structures that frankly I don't think anyone has been completely happy with.

5. Microdata allows locally-scoped typing using the @itemtype property and a URL, while a POSH format can only do something similar with a document-level @profile.

6. Microdata defines an API for DOM access to Microdata that allows scripts to deal with Microdata-embedded data when doing the same with Microformats involves some fairly heavy DOM parsing.

The arguments against Microdata are basically that it's complex, huge, obviously isn't based on any existent markup in the wild, and really doesn't look like an obvious core element of HTML5 so it's weird that it's included in the same spec.
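Point 3 above (extraction without knowing the vocabulary) can be illustrated with a rough Python sketch. It is not the algorithm from the microdata spec: it reads only itemscope, itemtype and itemprop, ignores @itemref, takes text content as the value (href for `a`, content/src/href for void elements), and the output shape is only loosely modelled on the spec's JSON conversion. The names `MicrodataExtractor` and `extract_microdata` and the example.com vocabulary URL are made up for illustration.

```python
from html.parser import HTMLParser
import json

# Elements that never have an end tag, so they are handled on open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataExtractor(HTMLParser):
    """Collect items using only itemscope/itemtype/itemprop (simplified)."""

    def __init__(self):
        super().__init__()
        self.items = []   # finished top-level items
        self.stack = []   # one frame per currently open element

    def _attach(self, names, value):
        # Add the value to the nearest enclosing item, if there is one.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                for name in names:
                    props = frame["item"]["properties"]
                    props.setdefault(name, []).append(value)
                return

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        names = (a.get("itemprop") or "").split()
        if tag in VOID:
            if names:
                value = a.get("content") or a.get("src") or a.get("href") or ""
                self._attach(names, value)
            return
        item = None
        if "itemscope" in a:
            item = {"properties": {}}
            if a.get("itemtype"):
                item["type"] = a["itemtype"]
        url = a.get("href") if tag == "a" else None
        self.stack.append({"tag": tag, "names": names, "item": item,
                           "url": url, "text": []})

    def handle_data(self, data):
        # Text counts towards every open element, like textContent.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or all(f["tag"] != tag for f in self.stack):
            return  # void element or stray end tag: nothing to close
        while True:
            frame = self.stack.pop()
            self._close(frame)
            if frame["tag"] == tag:
                break

    def _close(self, frame):
        if frame["item"] is not None:
            value = frame["item"]
        elif frame["url"] is not None:
            value = frame["url"]
        else:
            value = "".join(frame["text"]).strip()
        if frame["names"]:
            self._attach(frame["names"], value)
        elif frame["item"] is not None:
            self.items.append(value)


def extract_microdata(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return {"items": parser.items}


SAMPLE = """
<div itemscope itemtype="http://example.com/vocab#person">
  <span itemprop="name">George Washington</span>
  <a itemprop="url" href="http://example.com/gw">home</a>
  <div itemprop="address" itemscope>
    <span itemprop="locality">Mount Vernon</span>
  </div>
</div>
"""

print(json.dumps(extract_microdata(SAMPLE), indent=2))
```

Nothing in the extractor mentions "name", "url" or "address"; the structure comes entirely from the three attributes. A class-based parser, by contrast, has to know in advance which class names belong to the format it is looking for.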
-Ciaran

From philipj at opera.com Tue Jul 20 05:07:49 2010
From: philipj at opera.com (=?iso-8859-15?Q?Philip_J=E4genstedt?=)
Date: Tue Jul 20 05:07:57 2010
Subject: [uf-discuss] re: HTML5 support
In-Reply-To:
References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com>
Message-ID:

On Tue, 20 Jul 2010 13:05:56 +0200, Ciaran McNulty wrote:

> On Tue, Jul 20, 2010 at 5:05 AM, Angelo Gladding wrote:
>> Can an enlightened soul describe in which ways microdata is actually superior to profiled poshformats?
>
> To me it's not a question of Microdata vs POSH, it's more like Microdata vs class attributes where both are methods that can be used in POSH style data embedding.
>
> The main arguments (and I present these without necessarily agreeing!) seem to be:
>
> 1. Class is ingrained as a CSS hook mechanism. Most people on this list are fine with class being used for other purposes, but despite that the argument comes up incredibly often that using class is somehow a 'hack'. Microdata overcomes that, so right or wrong, it may be worth ditching class for embedded data just to help uptake.
>
> 2. The class space is already populated with lots of ill-thought-out CSS identifiers. This means POSH formats have to attempt crude forms of namespacing (e.g. picking a uniquely-named root element) to try and not collide with existing markup. That works for @class="fn" say, but it's easy to collide with @class="email". Microdata separates out the important stuff.
>
> 3. Related to 2, microdata extraction is possible without having to be profile-aware, so for instance microdata can be converted to JSON without knowledge of the vocabulary used.
>
> 4.
Microdata features some structures like @itemref that help combine > disparate data across a document into one Microdata element, which in > Microformats would need the slightly hacky rel-include structures that > frankly I don't think anyone has been completely happy with. > > 5. Microdata allows locally-scoped typing using the @itemtype property > and a URL, while a POSH format can only do something similar with a > document-level @profile. > > 6. Microdata defines an API for DOM access to Microdata that allows > scripts to deal with Microdata-embedded data when doing the same with > Microformats involves some fairly heavy DOM parsing. Well written. Unlike yourself, I agree with all of the above :) > The arguments against Microdata are basically that it's complex, huge, > obviously isn't based on any existent markup in the wild, and really > doesn't look like an obvious core element of HTML5 so it's weird that > it's included in the same spec. Well, it's not in W3C's version of HTML5, they published it as a separate spec (which is strange, IMO). Regardless of what spec it is in, it still works just the same, so that's OK. -- Philip J?genstedt Core Developer Opera Software From mail at ciaranmcnulty.com Tue Jul 20 05:57:09 2010 From: mail at ciaranmcnulty.com (Ciaran McNulty) Date: Tue Jul 20 11:35:49 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: On Tue, Jul 20, 2010 at 1:07 PM, Philip J?genstedt wrote: > Well, it's not in W3C's version of HTML5, they published it as a separate > spec (which is strange, IMO). Regardless of what spec it is in, it still > works just the same, so that's OK. Oh, really? Sorry, I'm out of date in that case. I think it's bundled together with 'HTML5' in the public consciousness anyhow. 
-Ciaran From angelo at gladding.name Tue Jul 20 12:55:38 2010 From: angelo at gladding.name (Angelo Gladding) Date: Tue Jul 20 12:55:51 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: On Tue, Jul 20, 2010 at 3:25 AM, Philip J?genstedt wrote: > On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding > wrote: > >> Can an enlightened soul describe in which ways microdata is actually >> superior to profiled poshformats? > > Microdata should be compared to the class attributes and the various > patterns that microformats use, not any specific vocabulary. Of course. Let me clarify. A `microformat` is a poshformat that has undergone a relatively laborious process of research and brainstorming to capture real world user requirements to make a minimal vocabulary that can capture ~80% of current usage patterns. Microdata is a set of rules governing a syntax. Hence my comparison of microdata to poshformats, which are essentially microformats sans the due diligence. > The main benefit is that parsing becomes well-defined Ain't that the truth. > and simple. Or is it? I wonder how different the two sets of supporting algorithms might look face to face once fully documented and implemented. The Microformats wiki makes the following comparison to microdata: 1. `itemprop` - is a more specific version of class, for field names. 2. `subject` - allows semantically linking within the page. Conceptually similar to the include-pattern. 3. `itemref` - allows including properties elsewhere on the page that are not descendants of itemscope. Takes space-separated ids (for example itemref="address phone" would include the elements with id="address" and id="phone"). Conceptually similar to the include-pattern. 4. 
`content` - on the meta element can be used to include invisible data that is not part of the content. As current browsers move meta inside , make sure to include via `itemref`. Conceptually similar to the 'value-title' feature of the value-class-pattern. 5. `itemscope` - identifies blocks to be marked as structured data. Conceptually similar to the mfo brainstorming. 6. `itemtype` - to specify the type for an item (for example: itemtype="http://microformats.org/profile/hcard"). Distilled down: 1. @class 2/3. include-pattern/table-header-pattern 4. value-class-pattern 5. "mfo" 6. rel-profile Sounds to me like the same sort of desire for absolute normativity that [non-HTML5] XHTML once attempted to burden the entirety of humanity with. Ironically, HTML5 has deprecated such a style in favor of a seemingly more flexible Microformat-esque syntax. - - - George Washington vs George Washington - - - example

example

vs example

example > That's why it's possible to define a JavaScript API for accessing microdata > items on a page, which makes the data useful to the page itself, not only > external scrapers. It also makes it feasible to make browser features like "add to > address book" or "add to calendar", Considering your affiliation with Opera, what might I ask are your feelings about Operator? > which really isn't really practical with microformats when the > data is hidden in class attributes together with everything else. As I alluded to above I see this as a complete non-issue yet you are most certainly not the first to bring it up. What am I missing? >> Might a "humans first, machines second" CJKV internationalization of >> `n` optimization be to analyze the contents of the `fn`'s @lang and >> inner text and use either or both to better determine name order? > > The main problem with this is that due to lazy copy-pasting, lang="en" is > often used even when the language isn't English. Also, in the case of e.g. > Facebook, lang="en" would be correct for the page itself, but people's names > aren't in English anyway. Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743 ...

...???...
?? can log in today and, without any cooperation from Facebook, append a U+200B (zero-width space [1]) to his first name (regardless of the input taking the form of one or two boxes), and immediately reap the benefits of such an `n` optimization without negatively affecting UI, sort order, etc. [1] http://en.wikipedia.org/wiki/Zero-width_space > The only way to get it right is to ask the user both for the full name, > given name and family name, something I haven't ever seen. If you haven't seen it, then it isn't even a single way to get it right -- another byproduct of Microformats philosophy I believe. However, if optimizations can yield 80%+ positive results when viewed in aggregate I personally give a little bit of magic a big thumbs up. > The most practical solution is to not guess at all, and I don't know > of any negative effects of this. I just see a tiny hint of dehumanization. ;) -- Angelo Gladding angelo@gladding.name From scott at randomchaos.com Tue Jul 20 06:47:19 2010 From: scott at randomchaos.com (Scott Reynen) Date: Tue Jul 20 13:33:13 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> On Jul 19, 2010, at 8:57 PM, Oli Studholme wrote: >>> Microdata doesn't go out of its way to be compatible with existing RDF vocabularies >> >> Maybe not specific vocabularies (that's kind of my point), but RDF itself is clearly a major consideration. There's a whole section on it: >> >> http://www.w3.org/TR/microdata/#rdf > > No. There?s a sub-sub-section on converting to RDF, just as there are > for converting to JSON and Atom. That?s not a design goal, it?s > specified interoperability. 
They're mentioned as "requirements" in the original announcement: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html But again, the RDF syntax doesn't matter. This is the important part for me: "Distributed vocabulary development should be possible; it should not require coordination through a centralised system." Distributed vocabulary development requires a general purpose solution. Microformats don't have that requirement, so vocabulary-specific solutions are common. >> I'd argue it is a bad idea in microdata, but not in microformats, because of the very distinction I'm trying to draw between the two. > > As far as microdata goes it's irrelevant - that's something decided by > the *vocabulary* author. I don't think that's really true, though, and I think this is exactly why n optimization was removed. For every other microdata property, the value is determined by following the parsing rules in the microdata spec: http://www.w3.org/TR/microdata/#values With n optimization, undeclared properties are given values via a completely different parsing model. This "magic" may not be explicitly disallowed, but it doesn't really fit with the general design of microdata. On Jul 19, 2010, at 10:05 PM, Angelo Gladding wrote: > Could it be said that microdata intends to do to Microformat syntax > what HTML5 did to HTML4 syntax rules in the sense that parsing is > unambiguous and easier to validate normativity? I'd say that's true as far as what they both do, but not how they do it. HTML5 makes parsing unambiguous by describing a wide variety of parsing rules for invalid content. I'd say such tolerance of human error is more in line with the microformats approach. Microdata, on the other hand, simply changes the syntax to reduce the risk of invalid content. So in terms of strategy for making parsing unambiguous, microdata looks more like XHTML to me.
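For readers who have not met the n optimization discussed above, it can be sketched in a few lines of Python. This is a simplified reading of the implied-n rule in the hCard spec, not a full implementation: it ignores the fn/org comparison and any explicit n markup. Treating U+200B as a word separator is an assumption taken from the zero-width-space suggestion earlier in the thread (Python's own whitespace handling does not split on it), and the function name `implied_n` and the Japanese placeholder name (written family name first) are made up for illustration.

```python
import re

# Word separators: ordinary whitespace plus U+200B (zero-width space).
# Splitting on U+200B is an assumption for this sketch, not spec behaviour.
_SEPARATORS = re.compile(r"[\s\u200b]+")


def implied_n(fn):
    """Guess given-name/family-name from a formatted name (fn).

    Returns None when fn is not exactly two words, in which case the
    author would have to mark up n explicitly.
    """
    words = [w for w in _SEPARATORS.split(fn) if w]
    if len(words) != 2:
        return None
    first, second = words
    # Exception: "Washington, George" or "Washington G." style names
    # are read as family name first.
    if first.endswith(",") or re.fullmatch(r".\.?", second):
        return {"family-name": first.rstrip(","), "given-name": second}
    # Default: first word is the given name, second the family name.
    return {"given-name": first, "family-name": second}


if __name__ == "__main__":
    for name in ("George Washington", "Washington, George",
                 "\u5c71\u7530\u592a\u90ce", "\u5c71\u7530\u200b\u592a\u90ce"):
        print(repr(name), "->", implied_n(name))
```

Note that the value comes from splitting text rather than from an element's own markup, which is the different parsing model referred to above. For the placeholder name, no guess is made without a separator, and with a zero-width space inserted the family name (the first two characters) is reported as the given name.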
On Jul 20, 2010, at 4:25 AM, Philip J?genstedt wrote: > Microdata should be compared to the class attributes and the various patterns that microformats use, not any specific vocabulary. Agreed! > The main benefit is that parsing becomes well-defined and simple. Right, a lot of it comes down to optimizing for parsers vs. optimizing for publishers. HTML itself is familiar to publishers, but difficult to parse for data. Microformats are limited to HTML to make things simpler for publishers at a cost to parsers. Microdata extends HTML to make things simpler for parsers at a cost to publishers. Of course, publishers and parsers need to work together, so these approaches only diverge so far. Peace, Scott From singpolyma at singpolyma.net Tue Jul 20 05:29:48 2010 From: singpolyma at singpolyma.net (Stephen Paul Weber) Date: Tue Jul 20 14:11:10 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: <1279628988.17280.2.camel@singpolyma-N900> > On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding > ? ? wrote: > > > Can an enlightened soul describe in which ways microdata is actually > > superior to profiled poshformats? > > Microdata should be compared to the class attributes and the various? > patterns that microformats use, not any specific vocabulary. The main? > benefit is that parsing becomes well-defined and simple. That's why it's >? ? possible to define a JavaScript API for accessing microdata items on a >? ? page, which makes the data useful to the page itself, not only > external? ? scrapers. It also makes it feasible to make browser features > like "add to? ? address book" or "add to calendar", which really isn't > really practical? ? with microformats when the data is hidden in class > attributes together? ? 
with everything else. Microformats data is not "hidden". Microformats are just well-done vocabulary specifications using the semantics of HTML. Is one of these semantics @class? Absolutely. It is by no means a primary or most important one. One of the benefits of using the real semantics of the page, and not some hacked-in layer like microdata, is that it works well with existing tools and markup. CSS styling of microformats, for example, "just works" and I use it all the time. DOM access similarly works well. Having written significant code both in-browser and out to parse microformats, I find the claim that parsing them using the DOM is "not practical" shocking. What would you prefer? Microformats parsers are usually very easy to write precisely because they use the page's existing semantics, and thus are easily exposed to the tools used for all DOMscripting (including, but not limited to, selecting elements by class). Then again, I'm very biased. Microdata, like other superfluous parts of HTML5 (up there with audio and video tags) just makes me sad. Too much NIH From mail at tobyinkster.co.uk Wed Jul 21 02:09:22 2010 From: mail at tobyinkster.co.uk (Toby Inkster) Date: Wed Jul 21 02:09:59 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: <1279628988.17280.2.camel@singpolyma-N900> References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <1279628988.17280.2.camel@singpolyma-N900> Message-ID: <20100721100922.1521c725@miranda.g5n.co.uk> On Tue, 20 Jul 2010 08:29:48 -0400 Stephen Paul Weber wrote: > Having written significant code both in-browser and out to parse > microformats, I find the claim that parsing them using the DOM is > "not practical" shocking. What would you prefer? Parsing microformats via the DOM is not practical. Parsing them any other way is even worse though.
While writing DOM code to parse a particular site's implementation of, say, hCard is pretty trivial, generalising that to support all the variations of how hCard is marked up in the wild is a lot of work.

As a comparison, I have written Perl parsers[*] for microformats, RDFa and Microdata. Here are the lines-of-code counts for each, excluding documentation, comments and blank lines:

Microdata      :  945
RDFa 1.0       : 1265
RDFa 1.1 [**]  : 2611
microformats   : 9455

* = See .
** = this code actually handles both RDFa 1.0 and 1.1. What's more, it can handle them embedded in Atom, SVG and OpenDocument Format; not just (X)HTML. A pure RDFa-1.1-in-(X)HTML parser could probably be written in under 1000 lines of Perl.

The amount of code needed to parse microformats is clearly different from the other formats. Another difference is that the Microdata and RDFa 1.0 implementations can be considered more-or-less complete. (The RDFa 1.1 working drafts are still somewhat in flux, so the implementation no doubt still needs changes.) If somebody comes up tomorrow with a new RDFa or Microdata vocabulary for describing cows, or bread makers, or train timetables, it will work out of the box. For microformats, that's not the case - code needs to be written.

So you end up with a chicken-and-egg situation with nobody implementing tools for a new draft microformat because it's not used in the wild; nobody using it in the wild because of a lack of tool support; and the microformat never progressing beyond draft status because of lack of implementation experience, and uncertainty about how it might work in the wild.

That's why we haven't had any of the draft microformats on the wiki move out of draft status in the last four years or so; or at least it's a major contributory factor.
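As a rough illustration of the "works out of the box" point, here is a minimal Python sketch of vocabulary-agnostic microdata extraction. It is not the Perl code counted above, and it covers only a simplified subset of the microdata rules: `itemscope`, `itemtype` and `itemprop`, with no `itemref` or `itemid` handling. The cow vocabulary URL, the sample markup and the name `extract_microdata` are all invented for the example.

```python
import json
from html.parser import HTMLParser

# Elements with no end tag; they are never pushed on the open-element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose property value comes from a URL attribute, not their text.
URL_ATTR = {"a": "href", "area": "href", "link": "href", "img": "src",
            "audio": "src", "video": "src", "embed": "src", "iframe": "src",
            "source": "src", "track": "src", "object": "data"}


class MicrodataExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []   # top-level items found so far
        self.stack = []   # one frame per currently open element

    def _enclosing_item(self):
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._enclosing_item()
        names = (attrs.get("itemprop") or "").split()
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if not names or parent is None:
                self.items.append(item)
        value = None
        if tag == "meta":
            value = attrs.get("content") or ""
        elif tag in URL_ATTR:
            value = attrs.get(URL_ATTR[tag]) or ""
        frame = {"tag": tag, "item": item, "names": names,
                 "parent": parent, "value": value, "text": []}
        if tag in VOID:
            self._close(frame)
        else:
            self.stack.append(frame)

    def handle_data(self, data):
        # Text counts towards every open property element that contains it.
        for frame in self.stack:
            if frame["names"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Close the innermost matching element and anything left open inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                while len(self.stack) > i:
                    self._close(self.stack.pop())
                return

    def _close(self, frame):
        if not frame["names"] or frame["parent"] is None:
            return
        if frame["item"] is not None:
            value = frame["item"]          # nested item
        elif frame["value"] is not None:
            value = frame["value"]         # URL or meta content
        else:
            value = "".join(frame["text"]).strip()
        for name in frame["names"]:
            frame["parent"]["properties"].setdefault(name, []).append(value)


def extract_microdata(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    while parser.stack:                    # tolerate unclosed elements
        parser._close(parser.stack.pop())
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://example.org/vocab/cow">
  <span itemprop="name">Daisy</span>
  <a itemprop="owner" href="http://example.org/farmers/1">her farmer</a>
  <div itemprop="home" itemscope>
    <span itemprop="field">North pasture</span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(json.dumps(extract_microdata(SAMPLE), indent=2))
```

Nothing in the extractor knows about cows: the property names and the type URL come straight from the attributes. An equivalent class-based format would need code that knows which class names are meaningful and where the root element is.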
-- Toby A Inkster From philipj at opera.com Wed Jul 21 02:27:44 2010 From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=) Date: Wed Jul 21 02:28:01 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: On Tue, 20 Jul 2010 21:55:38 +0200, Angelo Gladding wrote: > On Tue, Jul 20, 2010 at 3:25 AM, Philip J?genstedt > wrote: >> On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding >> >> wrote: >> >>> Can an enlightened soul describe in which ways microdata is actually >>> superior to profiled poshformats? >> >> Microdata should be compared to the class attributes and the various >> patterns that microformats use, not any specific vocabulary. > > Of course. Let me clarify. A `microformat` is a poshformat that has > undergone a relatively laborious process of research and brainstorming > to capture real world user requirements to make a minimal vocabulary > that can capture ~80% of current usage patterns. Microdata is a set of > rules governing a syntax. Hence my comparison of microdata to > poshformats, which are essentially microformats sans the due > diligence. Right, designing vocabularies is hard and requires due diligence. That's true no matter what the syntax is. >> The main benefit is that parsing becomes well-defined > > Ain't that the truth. > >> and simple. > > Or is it? I wonder how different the two sets of supporting algorithms > might look face to face once fully documented and implemented. > > The Microformats wiki makes the following comparison to microdata: > > 1. `itemprop` - is a more specific version of class, for field names. > 2. `subject` - allows semantically linking within the page. > Conceptually similar to the include-pattern. > 3. 
`itemref` - allows including properties elsewhere on the page that > are not descendants of itemscope. Takes space-separated ids (for > example itemref="address phone" would include the elements with > id="address" and id="phone"). Conceptually similar to the > include-pattern. > 4. `content` - on the meta element can be used to include invisible > data that is not part of the content. As current browsers move meta > inside , make sure to include via `itemref`. Conceptually > similar to the 'value-title' feature of the value-class-pattern. > 5. `itemscope` - identifies blocks to be marked as structured data. > Conceptually similar to the mfo brainstorming. > 6. `itemtype` - to specify the type for an item (for example: > itemtype="http://microformats.org/profile/hcard"). What wiki page is this from? subject has been replaced by itemid. I can't understand what the similarity with the include-pattern could possibly be, though. > Distilled down: > > 1. @class > 2/3. include-pattern/table-header-pattern > 4. value-class-pattern > 5. "mfo" > 6. rel-profile > > Sounds to me like the same sort of desire for absolute normativity > that [non-HTML5] XHTML once attempted to burden the entirety of > humanity with. Ironically, HTML5 has deprecated such a style in favor > of a seemingly more flexible Microformat-esque syntax. Putting XHTML2 aside, one of the main achievements of HTML5 is having formalized how to parse all the sloppy, broken HTML out there (a.k.a. "tag soup"). While the syntax is flexible for authors, there's no flexibility whatsoever for an implementor in how to parse it. The result will always be the same. In my view, microdata is to microformats what the HTML5 parser is to HTML4. It makes it possible to parse, without ever guessing, all the microdata items on a page. While it's really easy to write a microformat parser in JavaScript, you're not going to see that built into a browser, where each vocabulary needs a new parser.
Microdata also hasn't been implemented by any browser yet, but I'm pretty sure it's going to happen if it takes off. > > Considering your affiliation with Opera, what might I ask are your > feelings about Operator? I've heard of it before, it looks like a custom Opera distribution? It has nothing to do with microformats or microdata as far as I can tell. >> which really isn't really practical with microformats when the >> data is hidden in class attributes together with everything else. > > As I alluded to above I see this as a complete non-issue yet you are > most certainly not the first to bring it up. What am I missing? If a browser is going to support some kind of embedded data vocabularies (like events or contacts), the code for parsing it isn't going to be written in JavaScript using the DOM, it's going to be in C++ or C operating on the internal data structures of the browser. To support a specific microformat vocabulary, one would have to look through all the classes on all elements to find the "root" element, then speculatively search its children for the other structures of the microformat. All of the constructs used in microformats are also used for completely different things, so most of the data you inspect isn't actually going to be what you're looking for. Since one has to do this for all documents parsed (and not "on demand" like when finding a particular class using document.getElementsByClassName) my guess is that it's going to be slow. What's worse, you'll have to write more of this complicated, slow code for each vocabulary you want to support. If the data is put in new attributes like itemprop, the code for parsing it will be simpler and you won't have to write it again for every vocabulary you support; you can just reuse your getItems(x) implementation to find all items of type x and go from there. Now, this is all theoretical since no browser has implemented this yet (I tried a bit in my free time, but had too little).
If you don't care about browsers, then of course it doesn't matter. If microformats work for you then keep using them. I'm just saying that there's a better way forward. >>> Might a "humans first, machines second" CJKV internationalization of >>> `n` optimization be to analyze the contents of the `fn`'s @lang and >>> inner text and use either or both to better determine name order? >> >> The main problem with this is that due to lazy copy-pasting, lang="en" >> is >> often used even when the language isn't English. Also, in the case of >> e.g. >> Facebook, lang="en" would be correct for the page itself, but people's >> names >> aren't in English anyway. > > Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743 > > ...
...???...
> > ?? can log in today and, without any cooperation from Facebook, append > a U+200B (zero-width space [1]) to his first name (regardless of the > input taking the form of one or two boxes), and immediately reap the > benefits of such an `n` optimization without negatively affecting UI, > sort order, etc. > > [1] http://en.wikipedia.org/wiki/Zero-width_space I don't speak Japanese, but I think ?? is the family name and ? is the given name. By not doing anything the 'n' optimization will incorrectly guess that the family name is ??? and given name unknown. By inserting a zero-width space, it will instead incorrectly guess that ?? is the given name and ? is the family name. Either way it's incorrect. >> The only way to get it right is to ask the user both for the full name, >> given name and family name, something I haven't ever seen. > > If you haven't seen it, then it isn't even a single way to get it > right -- another > byproduct of Microformats philosophy I believe. However, if optimizations > can yield 80%+ positive results when viewed in aggregate I personally > give > a little bit of magic a big thumbs up. I guess we're not going by the population of the earth then, since China, Japan, Vietnam and South Korea account for 23.36% of it. (http://en.wikipedia.org/wiki/List_of_countries_by_population) >> The most practical solution is to not guess at all, and I don't know >> of any negative effects of this. > > I just see a tiny hint of dehumanization. ;) Seriously though, what are the negative effects? I'm betting that the number of people that make good use of having the given name and family name separately in their address book aren't many enough to justify screwing it up for the population of East Asia. 
-- Philip J?genstedt Core Developer Opera Software From philipj at opera.com Wed Jul 21 02:43:53 2010 From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=) Date: Wed Jul 21 03:18:39 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> Message-ID: On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen wrote: > On Jul 19, 2010, at 8:57 PM, Oli Studholme wrote: > >>>> Microdata doesn't go out of its way to be compatible with existing >>>> RDF vocabularies >>> >>> Maybe not specific vocabularies (that's kind of my point), but RDF >>> itself is clearly a major consideration. There's a whole section on >>> it: >>> >>> http://www.w3.org/TR/microdata/#rdf >> >> No. There?s a sub-sub-section on converting to RDF, just as there are >> for converting to JSON and Atom. That?s not a design goal, it?s >> specified interoperability. > > They're mentioned as "requirements" in the original announcement: > > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html > > But again, the RDF syntax doesn't matter. This is the important part > for me: > > "Distributed vocabulary development should be possible; it should not > require coordination through a centralised system." > > Distributed vocabulary development requires a general purpose solution. > Microformats don't have that requirement, so vocabulary-specific > solutions are common. Yes, which is why general purpose parsers cannot exist, and why browser support is unlikely. >>> I'd argue it is a bad idea in microdata, but not in microformats, >>> because of the very distinction I'm trying to draw between the two. >> >> As far as microdata goes it?s irrelevant ? 
that?s something decided by >> the *vocabulary* author. > > I don't think that's really true, though, and I think this is exactly > why n optimization was removed. For every other microdata property, the > value is determined by following the parsing rules in the microdata spec: > > http://www.w3.org/TR/microdata/#values > > With n optimization, undeclared properties are given values via a > completely different parsing model. This "magic" may not be explicitly > disallowed, but it doesn't really fit with the general design of > microdata. The magic was in the vCard extraction algorithm: The DOM isn't changed, that would indeed be a very bad fit with the overall design. > On Jul 19, 2010, at 10:05 PM, Angelo Gladding wrote: > >> Could it be said that microdata intends to do to Microformat syntax >> what HTML5 did to HTML4 syntax rules in the sense that parsing is >> unambiguous and easier to validate normativity? > > I'd say that's true as far as what they both do, but not how they do > it. HTML5 makes parsing unambiguous by describing a wide variety of > parsing rules for invalid content. I'd say such tolerance of human > error is more in line with the microformats approach. > > Microdata, on the other hand, simply changes the syntax to reduce the > risk of invalid content. So in terms of strategy for making parsing > unambiguous, microdata looks more like XHTML to me. HTML5 parsing is also unambiguous. The only reason it's so ridiculously complex is because it's needed to parse real markup on the web. With microdata there was no existing content, so it's possible to make a more sane definition. But of course, some parts may be too strict and I've previously left feedback and had gotten the spec changed due to this. If there are more things which are unnecessarily strict and makes it difficult for authors, please do send mail to the WHATWG or W3C so that it can be fixed. 
-- Philip Jägenstedt Core Developer Opera Software From singpolyma at singpolyma.net Wed Jul 21 06:46:08 2010 From: singpolyma at singpolyma.net (Stephen Paul Weber) Date: Wed Jul 21 06:46:35 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> Message-ID: <20100721134608.GA1496@singpolyma-svelti> Somebody claiming to be Philip Jägenstedt wrote: > On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen > wrote: > > >Distributed vocabulary development requires a general purpose > >solution. Microformats don't have that requirement, so > >vocabulary-specific solutions are common. > > Yes, which is why general purpose parsers cannot exist, and why > browser support is unlikely. I'm pretty sure Firefox already supports µfs... -- Stephen Paul Weber, @singpolyma See for how I prefer to be contacted edition right joseph From singpolyma at singpolyma.net Wed Jul 21 07:07:06 2010 From: singpolyma at singpolyma.net (Stephen Paul Weber)
Date: Wed Jul 21 07:07:15 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: <20100721100922.1521c725@miranda.g5n.co.uk> References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <1279628988.17280.2.camel@singpolyma-N900> <20100721100922.1521c725@miranda.g5n.co.uk> Message-ID: <20100721140706.GB1496@singpolyma-svelti> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Somebody claiming to be Toby Inkster wrote: > On Tue, 20 Jul 2010 08:29:48 -0400 > Stephen Paul Weber wrote: > > > Having written significant code both in-browser and out to parse > > microformats, I find the claim that parsing them using the DOM is > > "not practical" shocking. What would you prefer? > > Parsing microformats via the DOM is not practical. Parsing them any > other way is even worse though. > > While writing DOM code to parse a particular site's implementation of > say, hCard, is pretty trivial, generalising that to support all the > variations of how hCard is marked up in the wild is a lot of work. > > As a comparison, I have written Perl parsers[*] for microformats, RDFa > and Microdata. Here are the lines-of-code counts for each, excluding > documentation, comments and blank lines: > > The amount of code needed to parse microformats is clearly different > from the other formats. Sure, but you're comparing apples and oranges. RDF and microdata are more like JSON and XML: popular but useless by themselves. They're just generic containers. So, yes, you can trivially parse out the KVPs they encode, but you have no idea what those are, what they mean, what the relationships between them are, nothing. So you would have to write more code to implement each specific vocabulary you were interested in, and do useful stuff with it. The microformats parsers, because they're parsing an actual vocabulary instead of a container format, yes there will be some more code, because both steps are happening at once. 
The data you get out, though, is actually the data you want, and it makes sense. When I want profile data, I write an hCard parser and grab it. With microdata, the same job would normally take a separate "generic" parser, then code to throw out all the vocabularies I don't want, and then code to massage the vocabularies I do want into the internal data format I need. - -- Stephen Paul Weber, @singpolyma See for how I prefer to be contacted edition right joseph -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAEBCAAGBQJMRv8JAAoJENEcKRHOUZze7lYP/A9AD+Vnwy2mEM+zOB7QITFc FlrVzGksiOnIyPtKIXgMG8Sm8doPRrG8JC0RtCA7V3BhVmNR8dry+5A8PCCpLOyl 8CUym6G10RYduQQ0rdQCYMB6E37BgAq3Vl9oi9xUSZwsbJepEdIrSeifUZnbYtA0 ZMD/ADmLBYyqeHUf1/0So/m7W4vxtki7eUX0i95YgW997AFntKYZBfY2gtOTvvur Cx53jMWGkZdNgvGg/Mc9eyR011bPec7RtDkbYJJoUaVCiezxk1wFhzR6lLgcoRyB ZM4zEIBAOGS3UrT+pchX6OYGpL/3JGdCFdUkFPLbQlH1lOO1X1brogS3rJRDIyGk X1DQu0Md0b03vzw/wW5tIs93TCN2uGjiwXjC4ytFY7wuk9K9vwtZQQL6O8a9dJTf 9QFdGopQvn5YIFbVK/3p+9lPJUmu4+BljEDSVtQYzT0RA3b/qXvgJmqOzYBau9Eo 2YczFkjF69y3llaX5zAoOmQHhD1uKYjZUbOj+8fHZSKccPSwZXuXnR+sSrWlm3nR Hr81QftUoO3IztBqargQVXbDiW+f+BItb1xPm343sxiFSVfXDFtcUp2kaEvF39no LAG/XPnLDhV9FtDTwXwbhbfBQ4dCxRxQIkwfD8Jf5uFVLyWfpyB3+90yEdPVjhnO wb76GF2GtcZiGY/5J/AN =ORD1 -----END PGP SIGNATURE----- From philipj at opera.com Wed Jul 21 07:33:08 2010 From: philipj at opera.com (=?iso-8859-15?Q?Philip_J=E4genstedt?=) Date: Wed Jul 21 07:33:26 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: <20100721134608.GA1496@singpolyma-svelti> References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> <20100721134608.GA1496@singpolyma-svelti> Message-ID: On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA256 > > Somebody claiming to be Philip Jägenstedt wrote: >> On Tue, 20 Jul 2010 15:47:19 +0200,
Scott Reynen >> wrote: >> >> >Distributed vocabulary development requires a general purpose >> >solution. Microformats don't have that requirement, so >> >vocabulary-specific solutions are common. >> >> Yes, which is why general purpose parsers cannot exist, and why >> browser support is unlikely. > > I'm pretty sure Firefox already supports µfs... Are you sure it's not a plugin? If not, I'd be very interested to see it in action. -- Philip Jägenstedt Core Developer Opera Software From info at csarven.ca Wed Jul 21 09:04:42 2010 From: info at csarven.ca (Sarven Capadisli) Date: Wed Jul 21 09:04:51 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> <20100721134608.GA1496@singpolyma-svelti> Message-ID: <1279728282.1873.167.camel@csarven-laptop> On Wed, 2010-07-21 at 16:33 +0200, Philip Jägenstedt wrote: > On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber > wrote: > > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA256 > > > > Somebody claiming to be Philip Jägenstedt wrote: > >> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen > >> wrote: > >> > >> >Distributed vocabulary development requires a general purpose > >> >solution. Microformats don't have that requirement, so > >> >vocabulary-specific solutions are common. > >> > >> Yes, which is why general purpose parsers cannot exist, and why > >> browser support is unlikely. > > > > I'm pretty sure Firefox already supports µfs... > > Are you sure it's not a plugin? If not, I'd be very interested to see it > in action. > It has some support. 
See also resource://gre/modules/Microformats.js and https://developer.mozilla.org/en/Using_microformats Probably the best way to see it in action is via JetPack: https://jetpack.mozillalabs.com/ -Sarven From microformats.org at boblet.net Thu Jul 22 07:20:20 2010 From: microformats.org at boblet.net (Oli Studholme) Date: Thu Jul 22 07:20:52 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: Hey All, Wow, this has turned into a really interesting thread. Thank you all for your input. I just want to address a couple of points... ;) On Wed, Jul 21, 2010 at 6:27 PM, Philip Jägenstedt wrote: >>> >>> The main problem with this is that due to lazy copy-pasting, lang="en" is >>> often used even when the language isn't English. Also, in the case of >>> e.g. >>> Facebook, lang="en" would be correct for the page itself, but people's >>> names >>> aren't in English anyway. >> >> Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743 >> ...
...???...
>> >> ?? can log in today and, without any cooperation from Facebook, append >> a U+200B (zero-width space [1]) to his first name (regardless of the >> input taking the form of one or two boxes), and immediately reap the >> benefits of such an `n` optimization without negatively affecting UI, >> sort order, etc. > > I don't speak Japanese, but I think ?? is the family name and ? is the given > name. By not doing anything the 'n' optimization will incorrectly guess that > the family name is ??? and given name unknown. By inserting a zero-width > space, it will instead incorrectly guess that ?? is the given name and ? is > the family name. Either way it's incorrect. ??? is the Japanese name Miyano (??) Shu (?) (well, probably - there may be other readings for ?). As Philip correctly guesses, Miyano is the family name, so inserting any form of space character would give an incorrectly reversed name using implied 'n' optimisation. While Tantek's suggested workaround of using the declared language would work on the Japanese Facebook site, the @lang changes based on location. For example: http://www.facebook.com/people/gong-ye-zhong/100000456401743 has the same content with In addition to the points Philip made about @lang often being wrong, a lot of the time it isn't even present (well in Japan anyhow). I did a quick search on a popular Japanese surname (28 mil results in Google), and only 6 of the first 10 results declared @lang: http://microformats.org/wiki/hcard-issues-resolved#resolved_2010 As you can guess, it goes downhill from there. (btw, thanks for your comments Tantek - let me know if you want me to open the separate issue) Philip, the implied 'n' optimisation doesn't work on single word names; they would get implied 'nickname' optimisation instead. On Tue, Jul 20, 2010 at 9:29 PM, Stephen Paul Weber wrote: > Microformats data is not "hidden" In general this is true for microdata too. 
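For readers following the implied 'n' discussion above, here is a rough Python sketch of the rule as I understand it from the hCard 1.0 spec (an illustration, not a reference implementation): an fn of exactly two whitespace-separated words is split into given-name and family-name, reversed when the first word ends in a comma or the second is a single initial, and a one-word fn becomes a nickname. The function name implied_n is hypothetical, and romanised names are used because the kanji did not survive archiving.

```python
def implied_n(fn):
    """Guess name parts from a formatted name, hCard-1.0 style.

    Returns an empty dict when no guess is allowed (three or more
    words), in which case the author has to mark up 'n' explicitly.
    """
    words = fn.split()
    if len(words) == 1:
        # Implied "nickname" optimisation for single-word names.
        return {"nickname": words[0]}
    if len(words) != 2:
        return {}
    first, second = words
    # "Family, Given" or "Family G." are read in reverse order.
    if first.endswith(",") or len(second.rstrip(".")) == 1:
        return {"family-name": first.rstrip(","), "given-name": second}
    # Default guess: Western order, given name first.
    return {"given-name": first, "family-name": second}
```

Called with "Miyano Shu" (family name first, as in the Japanese original) this returns Miyano as the given-name, which is exactly the reversed result described above; "Shu Miyano" and "Miyano, Shu" both come out right.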
> One of the benefits of using the real semantics of the page, and not some hacked-in layer like microdata, is that it works well with existing tools and markup. CSS styling of microformats, for example, "just works" and I use it all the time. DOM access similarly works well. "hacked-in"? It's specced on w3.org and includes an API. Also, check out the CSS 2.1 [attr] selector. On Wed, Jul 21, 2010 at 4:55 AM, Angelo Gladding wrote: > However, if optimizations > can yield 80%+ positive results when viewed in aggregate I personally give > a little bit of magic a big thumbs up. I'm guessing this wasn't the metric by which using datetimes in the abbr design pattern was deprecated. On Tue, Jul 20, 2010 at 2:41 PM, Martin McEvoy wrote: > Im sorry but you cannot express *microformats* in microdata if you do, its > cute, but It isn't a microformat because microformats *only* use class > names, and a few choice rel-values. If you move a microformat away from > @class its no longer a microformat and shouldn't be described as such I'm sorry, but I don't think this is correct. You're mixing the technology with the goal (and forgetting VoteLinks and @profile ;-) "Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards" - Microformats wiki about page "Microformats are more than simply a technology like CSS or XHTML - they are an approach to solving the important problem of creating a rich semantic markup" -
Microformats, John Allsopp, p6 peace - oli From philipj at opera.com Thu Jul 22 06:53:10 2010 From: philipj at opera.com (=?iso-8859-15?Q?Philip_J=E4genstedt?=) Date: Thu Jul 22 08:08:35 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: <1279728282.1873.167.camel@csarven-laptop> References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> <20100721134608.GA1496@singpolyma-svelti> <1279728282.1873.167.camel@csarven-laptop> Message-ID: On Wed, 21 Jul 2010 18:04:42 +0200, Sarven Capadisli wrote: > On Wed, 2010-07-21 at 16:33 +0200, Philip Jägenstedt wrote: >> On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber >> wrote: >> >> > -----BEGIN PGP SIGNED MESSAGE----- >> > Hash: SHA256 >> > >> > Somebody claiming to be Philip Jägenstedt wrote: >> >> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen >> >> wrote: >> >> >> >> >Distributed vocabulary development requires a general purpose >> >> >solution. Microformats don't have that requirement, so >> >> >vocabulary-specific solutions are common. >> >> >> >> Yes, which is why general purpose parsers cannot exist, and why >> >> browser support is unlikely. >> > >> > I'm pretty sure Firefox already supports µfs... >> >> Are you sure it's not a plugin? If not, I'd be very interested to see it >> in action. >> > > It has some support. See also resource://gre/modules/Microformats.js and > https://developer.mozilla.org/en/Using_microformats > > Probably the best way to see it in action is via JetPack: > https://jetpack.mozillalabs.com/ Thanks, that's pretty cool. However, I note that this is only loaded on demand. Looking for e.g. hcards on every page parsed is not quite the same thing, and is what you'd need to do to have a button similar to the orange "feed" button pop up for all pages where there's something to add to the address book or calendar. 
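As a rough illustration of what "looking for hcards on every page parsed" involves at its cheapest, the Python sketch below counts elements whose class attribute contains the hCard root class name vcard. It is only a detection pass of the kind a "there is something to add" button would need, not a parser, and it says nothing about how Firefox or any browser actually does this; HCardSniffer and count_hcards are hypothetical names.

```python
from html.parser import HTMLParser


class HCardSniffer(HTMLParser):
    """Count elements carrying the hCard root class name 'vcard'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # class is a space-separated token list; match whole tokens only.
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.count += 1


def count_hcards(html):
    """Return how many hCard roots appear in the given markup."""
    sniffer = HCardSniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.count
```

A non-zero count would be enough to light up a button; extracting the properties could then still be done on demand.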
-- Philip J?genstedt Core Developer Opera Software From angelo at gladding.name Thu Jul 22 11:32:33 2010 From: angelo at gladding.name (Angelo Gladding) Date: Thu Jul 22 11:32:53 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <35CD8166-D92D-4030-895A-141131120C4E@randomchaos.com> <20100721134608.GA1496@singpolyma-svelti> <1279728282.1873.167.camel@csarven-laptop> Message-ID: On Thu, Jul 22, 2010 at 6:53 AM, Philip J?genstedt wrote: > On Wed, 21 Jul 2010 18:04:42 +0200, Sarven Capadisli > wrote: > >> On Wed, 2010-07-21 at 16:33 +0200, Philip J?genstedt wrote: >>> >>> On Wed, 21 Jul 2010 15:46:08 +0200, Stephen Paul Weber >>> wrote: >>> >>> > -----BEGIN PGP SIGNED MESSAGE----- >>> > Hash: SHA256 >>> > >>> > Somebody claiming to be Philip J?genstedt wrote: >>> >> On Tue, 20 Jul 2010 15:47:19 +0200, Scott Reynen >>> >> wrote: >>> >> >>> >> >Distributed vocabulary development requires a general purpose >>> >> >solution. ?Microformats don't have that requirement, so >>> >> >vocabulary-specific solutions are common. >>> >> >>> >> Yes, which is why general purpose parsers cannot exist, and why >>> >> browser support is unlikely. >>> > >>> > I'm pretty sure Firefox already supports ?fs... >>> >>> Are you sure it's not a plugin? If not, I'd be very interested to see it >>> in action. >>> >> >> It has some support. See also resource://gre/modules/Microformats.js and >> https://developer.mozilla.org/en/Using_microformats >> >> Probably the best way to see it in action is via JetPack: >> https://jetpack.mozillalabs.com/ > > Thanks, that's pretty cool. However, I note that this is only loaded on > demand. Looking for e.g. hcards on every page parsed is not quite the same > thing, and is what you'd need to do to have a button similar to the orange > "feed" button pop up for all pages where there's something to add to the > address book or calendar. 
> Firefox's Operator Plugin [1] has sniffed the microformats of each and every document that I have opened on multiple computers (ranging from slow to fast) for several years now. Make sure to install appropriate user scripts [2]. [1]: https://addons.mozilla.org/en-US/firefox/addon/4106/ [2]: http://kaply.com/weblog/operator-user-scripts/ -- Angelo Gladding angelo@gladding.name From angelo at gladding.name Thu Jul 22 12:15:48 2010 From: angelo at gladding.name (Angelo Gladding) Date: Thu Jul 22 12:15:56 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: <20100721140706.GB1496@singpolyma-svelti> References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <1279628988.17280.2.camel@singpolyma-N900> <20100721100922.1521c725@miranda.g5n.co.uk> <20100721140706.GB1496@singpolyma-svelti> Message-ID: On Wed, Jul 21, 2010 at 7:07 AM, Stephen Paul Weber wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA256 > > Somebody claiming to be Toby Inkster wrote: >> On Tue, 20 Jul 2010 08:29:48 -0400 >> Stephen Paul Weber wrote: >> >> > Having written significant code both in-browser and out to parse >> > microformats, I find the claim that parsing them using the DOM is >> > "not practical" shocking. ?What would you prefer? >> >> Parsing microformats via the DOM is not practical. Parsing them any >> other way is even worse though. >> >> While writing DOM code to parse a particular site's implementation of >> say, hCard, is pretty trivial, generalising that to support all the >> variations of how hCard is marked up in the wild is a lot of work. >> >> As a comparison, I have written Perl parsers[*] for microformats, RDFa >> and Microdata. Here are the lines-of-code counts for each, excluding >> documentation, comments and blank lines: >> >> The amount of code needed to parse microformats is clearly different >> from the other formats. > > Sure, but you're comparing apples and oranges. 
RDF and microdata are more > like JSON and XML: popular but useless by themselves. They're just generic > containers. So, yes, you can trivially parse out the KVPs they encode, but > you have no idea what those are, what they mean, what the relationships > between them are, nothing. So you would have to write more code to > implement each specific vocabulary you were interested in, and do useful > stuff with it. The microformats parsers, because they're parsing an actual > vocabulary instead of a container format, yes there will be some more code, > because both steps are happening at once. > > The data you get out is actually the data you want, that makes sense, though. > When I want profile data, I write an hCard parser and grab it. The same > deal with microdata would normally be done with a separate "generic" parser > and then the code to throw out all vocabularies I don't want, and then the > one to massage into an internal data format that I want the vocabularies > that I do. On Wed, Jul 21, 2010 at 2:09 AM, Toby Inkster wrote: > Microdata : 945 > RDFa 1.0 : 1265 > RDFa 1.1 [**] : 2611 > microformats : 9455 It's tough to argue with an order of magnitude difference with the most complete, public universal implementation to date. So what is the fundamental difference between the two approaches? It appears that Microdata takes us through lexical analysis and leaves us with a parse tree (?) while Microformats take us through the secondary stage of syntactic/semantic analysis and leaves us with a semantic graph (?). Does Microdata check syntax as well? If so, how does it know what syntax to look for without sniffing the vocabulary specification? e.g. How does the parser know to store http://microformats.org/wiki/hcard#bday as a datetime? - - - On a related note, how many of our issues does MF2 [1] stand to resolve? Reading these notes has green-lighted a couple of features I was tentatively considering for my universal parser. 
Future proofing my implementation (and participating in this conversation!) has helped me to better understand the two approaches' design goals. MF2 looks to be the logical middle-ground and may very well render much of this conversation moot. [1]: http://microformats.org/wiki/events/2010-05-02-microformats-2-0 -- Angelo Gladding angelo@gladding.name From angelo at gladding.name Thu Jul 22 12:51:45 2010 From: angelo at gladding.name (Angelo Gladding) Date: Thu Jul 22 12:57:20 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: On Thu, Jul 22, 2010 at 7:20 AM, Oli Studholme wrote: > ??? is the Japanese name Miyano (??) Shu (?) (well, probably ? there > may be other readings for ?). As Philip correctly guesses, Miyano is > the family name, so inserting any form of space character would give > an incorrectly reversed name using implied ?n? optimisation. My original intentions were to fall back on @lang in case sniffing Unicode ranges couldn't handle all of the cases. However, if that were the case, would it too be sufficiently magic? As I mentioned to Philip above, I'll draft the algorithm and post it back to be more clear. -- Angelo Gladding angelo@gladding.name From microformats.org at boblet.net Thu Jul 22 18:21:06 2010 From: microformats.org at boblet.net (Oli Studholme) Date: Thu Jul 22 18:21:32 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: Hey Angelo, On Fri, Jul 23, 2010 at 4:51 AM, Angelo Gladding wrote: > On Thu, Jul 22, 2010 at 7:20 AM, Oli Studholme > wrote: >> ??? 
is the Japanese name Miyano (??) Shu (?) (well, probably - there >> may be other readings for ?). As Philip correctly guesses, Miyano is >> the family name, so inserting any form of space character would give >> an incorrectly reversed name using implied 'n' optimisation. > > My original intentions were to fall back on @lang in case sniffing > Unicode ranges couldn't > handle all of the cases. However, if that were the case, would it too > be sufficiently magic? > > As I mentioned to Philip above, I'll draft the algorithm and post it > back to be more clear. I think the magic part is less of a problem than the magic sometimes not working part. You'll also need to convert to Unicode for pages in other encodings (three others used in Japan), while keeping in mind encodings are sometimes not declared. If you need any help for Japanese let me know peace - oli PS speaking of encodings I recently saw a Japanese page using two different encodings (second via iframe), neither of which were declared. Mojibake disaster! :O From scott at randomchaos.com Thu Jul 22 21:41:41 2010 From: scott at randomchaos.com (Scott Reynen) Date: Thu Jul 22 21:41:48 2010 Subject: [uf-discuss] n optimization internationalization (Was: HTML5 support) In-Reply-To: References: <1343677096-1278952323-cardhu_decombobulator_blackberry.rim.net-1341531973-@bda560.bisx.prod.on.blackberry> <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> Message-ID: On Jul 22, 2010, at 1:51 PM, Angelo Gladding wrote: > On Thu, Jul 22, 2010 at 7:20 AM, Oli Studholme > wrote: >> ??? is the Japanese name Miyano (??) Shu (?) (well, probably - there >> may be other readings for ?). As Philip correctly guesses, Miyano is >> the family name, so inserting any form of space character would give >> an incorrectly reversed name using implied 'n' optimisation. 
> > My original intentions were to fall back on @lang in case sniffing > Unicode ranges couldn't > handle all of the cases. However, if that were the case, would it too > be sufficiently magic? > > As I mentioned to Philip above, I'll draft the algorithm and post it > back to be more clear. I don't believe any algorithm can reliably predict how n optimization should be applied, so it should be used sparingly (only when name order is known) even with increased consideration of non-English names. I know plenty of Japanese people who, at least when they're interacting primarily with English speakers, write their names given name first (e.g. Shu Miyano), just as most English speakers do. Sometimes they even do this when writing their names in Japanese. A couple of examples: http://en.wikipedia.org/wiki/Yoko_Ono http://en.wikipedia.org/wiki/Joi_Ito Note that both names are printed both ways, given name first and family name first. Although they can be useful for making better guesses, neither language nor Unicode ranges can reliably tell us which name is given and which is family. Peace, Scott From philipj at opera.com Fri Jul 23 07:34:04 2010 From: philipj at opera.com (=?utf-8?Q?Philip_J=C3=A4genstedt?=) Date: Fri Jul 23 07:34:28 2010 Subject: [uf-discuss] re: HTML5 support In-Reply-To: References: <3FADC4D7-B3A5-45CD-82E4-EC5DEFF594DF@randomchaos.com> <5DC1788B-59A0-45AE-8E88-4CF257DA642C@randomchaos.com> <1279628988.17280.2.camel@singpolyma-N900> <20100721100922.1521c725@miranda.g5n.co.uk> <20100721140706.GB1496@singpolyma-svelti> Message-ID: On Thu, 22 Jul 2010 21:15:48 +0200, Angelo Gladding wrote: > Does Microdata check syntax as well? If so, how does it know what syntax > to look for without sniffing the vocabulary specification? e.g. How does > the > parser know to store http://microformats.org/wiki/hcard#bday as a > datetime? No, there's no checking of the vocabulary-specific rules. When it comes to dates, those are expressed using