[uf-discuss] Human and machine readable data format

Sat Jul 12 11:39:44 PDT 2008

+1 for class="data-"
Hidden metadata isn't going away anytime soon. HTML 5 features it,
RDF/RDFa uses it, the empty abbr pattern already does it, and many
others.
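
To make the idea concrete, here is a rough sketch of how a parser might
pick up a machine-readable date from a class="data-*" value. The exact
syntax has not been settled in this thread, so everything here is an
assumption for illustration only: the markup
<span class="dtstart data-2008-07-12">July 12th</span>, the "data-"
prefix followed directly by an ISO 8601 date, and the helper names
DataClassParser and extract_dates.

```python
from datetime import date
from html.parser import HTMLParser

# Assumed prefix; the thread does not fix an exact syntax.
DATA_PREFIX = "data-"


class DataClassParser(HTMLParser):
    """Collect (tag, value) pairs from class tokens that start with "data-"."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "class" or not value:
                continue
            # A class attribute is a space-separated list of tokens.
            for token in value.split():
                if token.startswith(DATA_PREFIX):
                    self.found.append((tag, token[len(DATA_PREFIX):]))


def extract_dates(html):
    """Return every "data-" class value in the markup as a date.

    Assumes each value is an ISO 8601 calendar date (YYYY-MM-DD);
    anything else raises ValueError.
    """
    parser = DataClassParser()
    parser.feed(html)
    parser.close()
    return [date.fromisoformat(value) for _, value in parser.found]


sample = '<span class="dtstart data-2008-07-12">July 12th</span>'
print(extract_dates(sample))
```

The visible text stays free-form ("July 12th"), while the parser only
has to understand one strict format, so no natural language processing
is needed.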

Best,

Zach Carter


On Sat, Jul 12, 2008 at 1:23 PM, Jason Karns <jason.karns at gmail.com> wrote:
>> The premise that publishers will pick any old format is merely an
>> assertion with no evidence. Please show us an example somewhere else
>> where this has happened, or perhaps a better argument than merely
>> insisting on the "obvious" truth of it.
>>
>> The way I see it, if they publish in the wrong format, then the
>> parsers won't pick up the date. This is what happens with microformats
>> already. I don't know about anyone else, but when I publish a
>> microformat, I test whether parsers can read it correctly. I do the
>> same thing with any HTML. If a publisher can't take the time to test
>> and publish in the correct format, then they take the consequences.
>> It's exactly the same with any other technology. Why should
>> microformats be any different? Why do you think making a microformat
>> resemble natural language drastically changes this set of rules?
>>
> The problem is not as simple as testing in a parser to verify that the
> format is correct.  NLP is too difficult to be easily solved in every
> parser.  The outcome will be that different parsers will handle
> different levels of NLP, parsing only subsets of accepted 'native
> language formats'. This is similar to the way many parsers are now.
> (Many parsers handle different portions of the specs. Few handle the
> entire spec. Case in point: the include pattern.)  Even assuming the
> very extreme case that all parsers handle the same string formats, no
> parser will ever handle every possible language permutation.
>
> The only solution that will result in practical parser use will
> *require* some amount of data duplication.  Just as you stated:
> 1. metadata and information hiding is out of the question
> 2. putting ISO 8601 style dates ("machine dates") in any place where a
> human can see it or have it read to them is "the problem" that we are
> trying to solve, so we can't do that.
> 3. The date cannot resemble anything a human might want to read.
>
> One of the above rules must be broken. #2 is the problem as you said.
> #3 will result in a 'spec' that will never be fully implemented in all
> parsers and will thus never be practical for publishing. #1 therefore
> must be broken.  I don't understand why this is even an argument at
> this point. The abbr-pattern was already accepted though it violates
> this principle. The only reason it is rejected now is because of the
> semantics of the @title attribute. Thus any solution that violates
> principle #1 in the same way as the abbr-pattern should also be
> acceptable so long as it does not suffer the same accessibility issue.
>
> Any sort of class="data-*" solution seems to be an acceptable
> compromise (and a compromise is what is required). It keeps the data
> machine-readable without making parsing impractical. It keeps the
> machine data out of human-readable context (@title). And it keeps the
> duplicate data near the human-readable version for maintenance.
> (Though I take exception to the duplicate-data principle, as most
> publishers use automated tools that easily duplicate data without
> causing staleness issues.)
>
> ~Jason