[uf-discuss] xdmp profiles not enough for parsing?

Wed Nov 16 07:57:56 PST 2005

On 11/16/05 7:44 AM, "brian suda" <brian.suda at gmail.com> wrote:

> Phil Dawes wrote:
> 
>> Hi Microformats list,
>> 
>> When coding up my Python microformats parser, one of the problems I
>> encountered was getting the parser to interpret the structure
>> correctly (see [1]). In order to overcome this my parser currently
>> hardcodes the elements that can have sub-elements in an internal data
>> structure.
>> (e.g. for hCard: 'adr', 'geo' and 'n' can have subelements).
>> 
>> I've just noticed that XMDP profiles don't carry this information, and
>> was wondering if this scuppers the general idea of parsing
>> microformats from their profiles?
>> (or am I missing something)
> 
> XMDP doesn't do a lot of things; it doesn't do any sort of typing either.
> In hCalendar, DTSTART is an ISO date. That is something that is NOT
> encoded in XMDP in any machine-readable form.
> 
> This leads back to a discussion a few weeks ago about a universal, or
> general Microformat parser. The general consensus was that it would be
> very difficult because of many of these restrictions.

Not quite.  That's missing a very important point.

Rather, the conclusion is that, historically, every attempt to do so for any
schema language has been incomplete in some way.

The HTML4 DTDs, for example, don't reflect all that a parser must do to
properly parse HTML4, even *only* valid HTML4.  Schemas for XHTML do not even
accurately reflect all the rules that the DTDs for XHTML reflect.

The conclusion was that, in practice, complete automatic generic parsability
is futile, and thus not worth pursuing, in XMDP or any other schema-like
language.
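
For illustration only (this is not Phil's actual code): a minimal Python
sketch of the kind of hardcoded table he describes, where the parser itself,
rather than the XMDP profile, knows that 'adr', 'geo' and 'n' can have
sub-elements.  The names NESTED, KNOWN, HCardSketch and parse_hcard are
hypothetical, KNOWN is only a small subset of hCard property names, and the
sketch assumes well-formed markup.

```python
from html.parser import HTMLParser

# Hand-maintained knowledge that an XMDP profile does not carry:
# hCard properties that may contain sub-properties.
NESTED = {"adr", "geo", "n"}

# Assumed subset of hCard property names, for illustration only.
KNOWN = {"fn", "n", "given-name", "family-name", "adr", "locality",
         "country-name", "geo", "latitude", "longitude"}

# Elements with no end tag; they never get a stack frame.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class HCardSketch(HTMLParser):
    """Collects known class names into a nested dict (well-formed input only)."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self.stack = []  # one frame per currently open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in KNOWN), None)
        self.stack.append({
            "prop": prop,
            "text": [],
            # Only properties listed in NESTED get a container for children.
            "children": {} if prop in NESTED else None,
        })

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        frame = self.stack.pop()
        if frame["prop"] is None:
            return
        if frame["children"] is not None:
            value = frame["children"]
        else:
            value = "".join(frame["text"]).strip()
        # Attach to the nearest enclosing nested property, else the top level.
        parent = next((f["children"] for f in reversed(self.stack)
                       if f["children"] is not None), self.result)
        parent[frame["prop"]] = value


def parse_hcard(html):
    parser = HCardSketch()
    parser.feed(html)
    parser.close()
    return parser.result
```

Without the NESTED table, 'given-name' and 'family-name' would end up as
siblings of 'n' rather than inside it, which is the structural information
the profile alone does not provide.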

Tantek