[uf-discuss] xdmp profiles not enough for parsing?

Wed Nov 16 09:13:03 PST 2005

On 11/16/05 9:05 AM, "David Janes -- BlogMatrix" <davidjanes at blogmatrix.com>
wrote:

> Tantek Çelik wrote:
>> But how much easier?  Is it worth the effort?
>> 
>> Rather than wasting time incrementally making something more generic, why
>> not spend the time coding specific parsers?  I bet you'll get more done in
>> less time that way.
> 
> I'll just add the observation that (as far as I can tell) Phil is
> planning to go down this path anyway. My belief is that there'll
> probably be some low hanging fruit here where general utility can be
> found and then after that there's probably a very solid wall [to mix
> metaphors].

Ok, I'll put it yet another way.

The specific addition that Phil was asking for was a way to state which
properties go inside which other properties.

While it may seem this makes writing a generic parser easier for a specific
instance, it actually makes the XMDP less reusable.

Right now the XMDP only defines a list of properties, values, and their
meanings, period.

This means you can reuse the same XMDP and properties and values to build
additional formats.

As soon as you start being stricter about "A can only be embedded in B"
etc., you end up limiting the reusability of the XMDP.
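
To make the reuse point concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: the PROFILE dictionary is a
stand-in for the flat list of properties an XMDP defines, NESTING is a
hypothetical "A can only be embedded in B" table of the kind being proposed,
and the property names and both function names are made up for this example.

```python
# Stand-in for a flat profile: property name -> meaning (illustrative only).
PROFILE = {
    "vcard": "container for contact information",
    "fn": "formatted name",
    "adr": "postal address",
    "locality": "city or town",
}

# Hypothetical nesting rules: child property -> set of allowed parents.
NESTING = {
    "adr": {"vcard"},
    "locality": {"adr"},
}


def valid_flat(path):
    """Accept any nesting path made only of properties the profile defines."""
    return all(name in PROFILE for name in path)


def valid_nested(path):
    """Accept a path only if it also satisfies the NESTING rules."""
    if not valid_flat(path):
        return False
    # Pair each property with its parent; the outermost one has parent None.
    for parent, child in zip([None] + list(path), path):
        allowed = NESTING.get(child)
        if allowed is not None and parent not in allowed:
            return False
    return True
```

With only the flat profile, a second format that uses "adr" on its own (the
path ["adr", "locality"]) is accepted. With the nesting table added, the same
path is rejected because "adr" was declared to live only inside "vcard", so
the profile can no longer be reused for that format without being edited.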

Worse than that, you WILL make mistakes by assuming that you would ONLY
ever want to embed A in B, until someone figures out that oops, in
*practice* you actually *do* want to embed <ol> or <ul> inside a <p>, for
example (a case from the HTML4 DTD, which does not allow it).

Like I said, there is a *ton* of such experience in this space (trying to
write generic DTD/schema languages for generic parsability).

When it comes down to it, the most useful information for a parser/validator
is just to know what are the properties and what are the values.  That's
what XMDP provides. Everything else is incremental on top of that, and often
gets in the way when people use such features as nesting requirements etc.
to *over*-specify.
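
As a rough sketch of how far "just the properties" gets a parser, the Python
below scans markup for class names that appear in a known property set and
ignores everything else, including where the elements are nested. The KNOWN
set, the PropertyFinder class, and find_properties are hypothetical names
for this illustration; loading the property list from an actual XMDP
document is left out.

```python
from html.parser import HTMLParser

# Assumed property list, as if read from a profile (illustrative only).
KNOWN = {"vcard", "fn", "adr", "locality"}


class PropertyFinder(HTMLParser):
    """Collect known property names from class attributes, in document order."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A class attribute may hold several space-separated names.
            if name == "class" and value:
                self.found.extend(c for c in value.split() if c in KNOWN)


def find_properties(html):
    """Return the known properties used in the given markup."""
    finder = PropertyFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

For example, find_properties('<div class="vcard"><span class="fn n">Ann</span></div>')
picks out "vcard" and "fn" and skips "n", which is not in the assumed list.
Nothing in the sketch needs to know which property may contain which.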

Thanks,

Tantek