[uf-discuss] 2 billion hCards! gathering material for a "microformats.org turns 5" blog post

Wed Jul 7 08:24:52 PDT 2010

On Wed, Jul 7, 2010 at 4:43 AM, Toby Inkster <mail at tobyinkster.co.uk> wrote:
> On Wed, 7 Jul 2010 02:25:38 -0700
> Tantek Çelik <tantek at cs.stanford.edu> wrote:
>
>> E.g. Wordpress.org results don't have any RDFa.
>>
>> View source and the only thing even remotely resembling RDFa you see is:
>>
>> <meta property="fb:page_id" content="...">
>>
>> - which is simply use of an invalid "property" attribute (in XHTML
>> 1.0). The qname "fb:" is not defined anywhere.
>
> In the current RDFa 1.1 drafts, this is allowed, though its meaning is
> not likely what the authors of this page intended. In 1.1, prefixes
> which are not bound to anything are assumed to be absolute URIs.

So it's another form of invalid syntax then, since "fb:" is not a
defined protocol.
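
A minimal sketch of that check, in Python, for anyone who wants to
try it on their own markup. It is not the RDFa 1.1 processing
algorithm; the class name PropertyAudit and the KNOWN_SCHEMES set are
made up for illustration, and xmlns scoping is flattened to a single
document-wide map.

```python
from html.parser import HTMLParser

# Hypothetical, illustrative subset; the real URI scheme registry is much larger.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "urn", "tel"}


class PropertyAudit(HTMLParser):
    """Classify each token found in a "property" attribute."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace URI, from xmlns:* attributes
        self.findings = []   # list of (token, status) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Record prefix bindings first so they apply to this same element.
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        for token in (attrs.get("property") or "").split():
            self.findings.append((token, self.classify(token)))

    def classify(self, token):
        prefix, sep, _ = token.partition(":")
        if not sep:
            return "no prefix"
        if prefix in self.prefixes:
            return "bound prefix"
        if prefix.lower() in KNOWN_SCHEMES:
            return "absolute URI, known scheme"
        return "unbound prefix, unknown scheme"


audit = PropertyAudit()
audit.feed('<meta property="fb:page_id" content="...">')
print(audit.findings)
```

With no xmlns:fb declaration in the markup, the token is neither a
bound prefix nor a URI in any scheme the sketch knows about.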

> The page at http://wordpress.org/ does actually contain 3 triples if
> evaluated as RDFa 1.0, though they're each the result of RDFa
> grandfathering in certain HTML 4/XHTML 1 semantics.

No, it might contain 3 RDF triples - but they're not RDF*a*.

Just because a page can be parsed/converted into another format does
not mean it "contains" that format.

Saying so is deceptively misusing the word "contains" at best, and
playing semantic games at worst.

Just because a page has hAtom does not mean it "contains" Atom.

Just because a page has microdata does not mean it "contains" JSON
(though an exceptionally precise direct conversion is defined), and
so on.

As with microdata, as we define more precise parsing rules for
microformats, we'll have direct conversions to JSON and RDF triples as
well.  This does not mean that all pages with microformats "contain"
JSON or RDF.
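
To illustrate the difference between "converts to" and "contains",
here is a rough Python sketch of such a conversion. It is not the
hCard parsing specification: the class name HCardSketch is made up,
only four properties are recognised, and only values carried in
attributes (img alt/src, a href) are extracted. Text-content values
are ignored.

```python
import json
from html.parser import HTMLParser

VOID = {"img", "br", "hr", "meta", "link", "input"}  # elements with no end tag
PROPS = {"fn", "org", "url", "logo"}   # small illustrative subset of hCard
URL_PROPS = {"url", "logo"}            # properties whose value is a URL


def value_for(tag, name, attrs):
    """Return the attribute-carried value for a property, or None."""
    if tag == "img":
        return attrs.get("src") if name in URL_PROPS else attrs.get("alt")
    if tag == "a" and name in URL_PROPS:
        return attrs.get("href")
    return None  # text-content values are out of scope for this sketch


class HCardSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cards = []
        self.depth = 0  # nesting depth inside the current vcard; 0 = outside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth == 0:
            # Look for a new hCard root; properties on the root are ignored.
            if "vcard" in classes:
                self.cards.append({})
                if tag not in VOID:
                    self.depth = 1
            return
        if tag not in VOID:
            self.depth += 1
        for name in classes:
            if name in PROPS:
                value = value_for(tag, name, attrs)
                if value is not None:
                    self.cards[-1][name] = value

    def handle_endtag(self, tag):
        # Void elements never opened a level, so they must not close one.
        if self.depth > 0 and tag not in VOID:
            self.depth -= 1


SAMPLE = ('<span class="vcard"><a href="/" class="url">'
          '<img class="logo fn org" alt="Google" src="..." /></a></span>')

parser = HCardSketch()
parser.feed(SAMPLE)
print(json.dumps(parser.cards))
```

The JSON comes out of the converter, not out of the page: the author
of the markup only ever wrote class names on HTML elements.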

The comparison question is deliberately chosen to illuminate what
developers are actually coding: what syntax they write, not what you
can "infer", "parse as", or "convert to".

Because as you know with the parsers you've written, you can convert
syntaxes to nearly any implied format - it tells you nothing about
usage.

> The question "how many pages contain RDFa?" is only meaningful if
> certain qualifications are added... Does broken RDFa count?

Broken RDFa counts, but only to demonstrate the difficulty of coding
RDFa, not as instances of RDF(a). One of the reasons that Google found
so little RDFa may be that much of it was broken. This is one of the
common problems with namespaces in data.

> Do
> grandfathered rel/rev values count? &c.

rel/rev syntax and values work without RDFa - they're not RDFa,
despite RDFa's attempt to subsume them (and even errantly claim/imply
credit in the spec, e.g. rel-license).

> In fact, "how many pages" questions about the Web are not especially
> meaningful. Say Google added an hCard to its search result pages,
> replacing its current logo with something like this:
>
>        <span class="vcard">
>                <a href="/" class="url">
>                        <img class="logo fn org"
>                        alt="Google" src="..." />
>                </a>
>        </span>
>
> Are the search results for "foo" and "bar" different pages? What about
> the search results for "100000000001" and "100000000002"? Because if
> they are, that's over a hundred billion hCards online.

1. theoretical strawman[1]
2. google.com/robots.txt prevents this from counting in any "search"

Tantek

[1] http://en.wikipedia.org/wiki/Straw_man

-- 
http://tantek.com/