[uf-discuss] re: HTML5 support

Thu Jul 22 12:15:48 PDT 2010

On Wed, Jul 21, 2010 at 7:07 AM, Stephen Paul Weber
<singpolyma at singpolyma.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Somebody claiming to be Toby Inkster wrote:
>> On Tue, 20 Jul 2010 08:29:48 -0400
>> Stephen Paul Weber <singpolyma at singpolyma.net> wrote:
>>
>> > Having written significant code both in-browser and out to parse
>> > microformats, I find the claim that parsing them using the DOM is
>> > "not practical" shocking.  What would you prefer?
>>
>> Parsing microformats via the DOM is not practical. Parsing them any
>> other way is even worse though.
>>
>> While writing DOM code to parse a particular site's implementation of
>> say, hCard, is pretty trivial, generalising that to support all the
>> variations of how hCard is marked up in the wild is a lot of work.
>>
>> As a comparison, I have written Perl parsers[*] for microformats, RDFa
>> and Microdata. Here are the lines-of-code counts for each, excluding
>> documentation, comments and blank lines:
>>
>> The amount of code needed to parse microformats is clearly different
>> from the other formats.
>
> Sure, but you're comparing apples and oranges.  RDF and microdata are more
> like JSON and XML: popular but useless by themselves.  They're just generic
> containers.  So, yes, you can trivially parse out the KVPs they encode, but
> you have no idea what those are, what they mean, what the relationships
> between them are, nothing.  So you would have to write more code to
> implement each specific vocabulary you were interested in, and do useful
> stuff with it.  Because the microformats parsers are parsing an actual
> vocabulary instead of a container format, yes, there will be some more code:
> both steps are happening at once.
>
> The data you get out is actually the data you want, that makes sense, though.
> When I want profile data, I write an hCard parser and grab it.  The same
> deal with microdata would normally be done with a separate "generic" parser,
> then code to throw out all the vocabularies I don't want, and then code to
> massage the vocabularies I do want into an internal data format.

On Wed, Jul 21, 2010 at 2:09 AM, Toby Inkster <mail at tobyinkster.co.uk> wrote:
> Microdata      :  945
> RDFa 1.0       : 1265
> RDFa 1.1 [**]  : 2611
> microformats   : 9455

It's tough to argue with an order-of-magnitude difference (9455 lines vs.
945) in the most complete public universal implementation to date.

So what is the fundamental difference between the two approaches?

It appears that Microdata takes us through lexical analysis and leaves us
with a parse tree (?) while Microformats take us through the secondary stage
of syntactic/semantic analysis and leave us with a semantic graph (?).

Does Microdata check syntax as well? If so, how does it know what syntax
to look for without sniffing the vocabulary specification? e.g. How does the
parser know to store http://microformats.org/wiki/hcard#bday as a datetime?
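
To make the question concrete, here is a rough Python sketch of the
two-stage split as I understand it. Everything in it is an assumption made
up for illustration: the shape of `generic_item`, the `HCARD_TYPES` and
`VOCABULARIES` tables, and `apply_vocabulary` are hypothetical and are not
taken from any of the parsers mentioned in this thread. The point is only
that a generic parser can hand back untyped strings, and that something
still has to know the vocabulary before `bday` becomes a date.

```python
from datetime import date

# Hypothetical output of a generic, vocabulary-agnostic parser:
# every property value is still an untyped string.
generic_item = {
    "type": "http://microformats.org/wiki/hcard",
    "properties": {"fn": ["Jane Doe"], "bday": ["1980-02-29"]},
}

# Hypothetical vocabulary layer: property name -> converter function.
# Properties not listed here are left as plain strings.
HCARD_TYPES = {"bday": date.fromisoformat}

# Registry of known vocabularies, keyed by the item's type URL.
VOCABULARIES = {"http://microformats.org/wiki/hcard": HCARD_TYPES}


def apply_vocabulary(item):
    """Second stage: turn generic string values into typed values."""
    converters = VOCABULARIES.get(item["type"], {})
    typed = {}
    for name, values in item["properties"].items():
        convert = converters.get(name, str)
        typed[name] = [convert(value) for value in values]
    return typed


typed = apply_vocabulary(generic_item)
```

With the table in place, `typed["bday"]` holds `date` objects and
`typed["fn"]` stays a list of strings. An item whose type is not in
`VOCABULARIES` comes back with every value still a string, which is the
"container only" case.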

- - -

On a related note, how many of our issues does MF2 [1] stand to resolve?
Reading these notes has green-lighted a couple of features I was tentatively
considering for my universal parser. Future proofing my implementation (and
participating in this conversation!) has helped me to better understand the
two approaches' design goals. MF2 looks to be the logical middle-ground
and may very well render much of this conversation moot.

[1]: http://microformats.org/wiki/events/2010-05-02-microformats-2-0

-- 
Angelo Gladding
angelo at gladding.name