[uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post

Wed Jul 7 10:10:07 PDT 2010

On Jul 7, 2010, at 6:24 PM, Tantek Çelik wrote:

> On Wed, Jul 7, 2010 at 4:43 AM, Toby Inkster <mail at tobyinkster.co.uk> wrote:
>> On Wed, 7 Jul 2010 02:25:38 -0700
>> Tantek Çelik <tantek at cs.stanford.edu> wrote:
>> 
>>> E.g. Wordpress.org results don't have any RDFa.
>>> 
>>> View source and the only thing you see that even remotely resembles it is:
>>> 
>>> <meta property="fb:page_id" content="...">
>>> 
>>> - which is simply use of an invalid "property" attribute (in XHTML
>>> 1.0). The qname "fb:" is not defined anywhere.
>> 
>> In the current RDFa 1.1 drafts, this is allowed, though its meaning is
>> not likely what the authors of this page intended. In 1.1, prefixes
>> which are not bound to anything are assumed to be absolute URIs.
> 
> So it's another form of invalid syntax then, since "fb:" is not a
> defined protocol.
> 
> 
>> The page at http://wordpress.org/ does actually contain 3 triples if
>> evaluated as RDFa 1.0, though they're each the result of RDFa
>> grandfathering in certain HTML 4/XHTML 1 semantics.
> 
> No, it might contain 3 RDF triples - but they're not RDF*a*.
> 
> Just because a page can be parsed/converted into another format does
> not mean it "contains" that format.
> 
> Saying so is deceptively misusing the word "contains" at best, and
> playing semantic games at worst.
> 
> Just because a page has hAtom does not mean it "contains" Atom.
> 
> Just because a page has microdata does not mean it "contains" JSON
> (though an exceptionally precise direct conversion is defined). etc.
> 
> Similarly to microdata, as we define more precise parsing rules for
> microformats, we'll have direct conversions to JSON and RDF triples as
> well.  This does not mean that all pages with microformats "contain"
> JSON or RDF.
> 
> The comparison question is deliberately chosen to illuminate what
> developers are actually coding: what syntax, not what you can "infer",
> "parse as", or "convert to".
> 
> Because as you know with the parsers you've written, you can convert
> syntaxes to nearly any implied format - it tells you nothing about
> usage.
> 
> 
>> The question "how many pages contain RDFa?" is only meaningful if
>> certain qualifications are added... Does broken RDFa count?
> 
> Broken RDFa counts, but only to demonstrate the difficulty of coding
> RDFa, not as instances of RDF(a). One of the reasons that Google found
> so little RDFa may be that much of it was broken. This is one of
> the common problems with namespaces in data.

does broken tantek count?
this "my format is longer than your format" strikes me as rather silly.
50 million elvis fans can't be wrong (most of them use neither).

regards
thomas lörtsch

> 
>> Do
>> grandfathered rel/rev values count? &c.
> 
> rel/rev syntax and values work without RDFa - they're not RDFa,
> despite RDFa's attempt to subsume them (and even errantly claim/imply
> credit in the spec, e.g. rel-license).
> 
> 
>> In fact, "how many pages" questions about the Web are not especially
>> meaningful. Say Google added an hCard to its search result pages,
>> replacing its current logo with something like this:
>> 
>>       <span class="vcard">
>>               <a href="/" class="url">
>>                       <img class="logo fn org"
>>                       alt="Google" src="..." />
>>               </a>
>>       </span>
>> 
>> Are the search results for "foo" and "bar" different pages? What about
>> the search results for "100000000001" and "100000000002"? Because if
>> they are, that's over a hundred billion hCards online.
> 
> 1. theoretical strawman[1]
> 2. google.com/robots.txt prevents this from counting in any "search"
> 
> 
> Tantek
> 
> [1] http://en.wikipedia.org/wiki/Straw_man
> 
> -- 
> http://tantek.com/
> 
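
To make the "what syntax are developers actually coding" count concrete, here is a minimal sketch that counts hCard roots in a page by looking only at the markup, using the hypothetical Google logo snippet quoted above. The names VcardCounter and count_vcards are invented for illustration; this is not a full hCard parser and not the method any search engine is known to use. It only counts start tags whose class attribute includes the token "vcard".

```python
from html.parser import HTMLParser


class VcardCounter(HTMLParser):
    """Count start tags whose class attribute includes the token 'vcard'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        # Match whole class tokens only, so "vcardx" does not count.
        if "vcard" in classes:
            self.count += 1


def count_vcards(html):
    """Return the number of hCard root elements found in an HTML string."""
    parser = VcardCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# The snippet from the thread, with the elided src left as-is.
sample = """
<span class="vcard">
  <a href="/" class="url">
    <img class="logo fn org" alt="Google" src="..." />
  </a>
</span>
"""

print(count_vcards(sample))  # prints 1
```

A count like this says how many hCard roots are authored in one document. Whether two search result URLs serving the same snippet are "different pages" is a separate question that the markup alone cannot answer.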