Value Excerption Pattern Parsing (was: [uf-dev] How do we (want to) document parsing?)
Ben Ward
lists at ben-ward.co.uk
Thu Jun 12 02:36:52 PDT 2008
On 12 Jun 2008, at 09:21, Brian Suda wrote:
> On Thu, Jun 12, 2008 at 12:58 AM, Scott Reynen
> <scott at randomchaos.com> wrote:
>> http://ben-ward.co.uk/microformats/value-excerption-pattern/ValueExcerptionParseFlowChart.png
>
> --- i had a look at the flow chart and found a few things that i think
> should be fixed and a few that i disagree with.
Disagreement is fine and very welcome. This is all draft, in-progress
work :-) Fundamentally, I'm keen to establish _how_ we represent this
sort of process going forward, with the complete understanding that
the detail of this current diagram can and will change.
> (maybe we should number these nodes so it is easier to reference?)
Could do, although http://microformats.org/wiki/value-excerption-pattern-issues
provides numbering of sorts so perhaps refer to those for now?
> 1) I don't think values should be concatenated with a unicode char
> 0020 (a space). If there was intention to add white-space then those
> should be part of the value. We should not introduce additional
> information that was not explicitly marked-up.
The open issue is:
http://microformats.org/wiki/value-excerption-pattern-issues#White-space_behaviour_when_concatenating_value_nodes
Seems reasonable. The default case I was thinking of at the time was
actually muddled up with concatenating repeated properties: e.g.
additional-name properties in hCard, which would want to be
space-separated.
For value, I now lean toward agreeing with you, insofar as,
regardless of the number of segments, we're still marking up a single
µf property, rather than multiple occurrences of the same µf property.
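To make the difference concrete, here is a minimal sketch of value
concatenation with no separator added. It is only an illustration, not
the agreed parsing rule: the function names (has_class, excerpt_value),
the sample tel markup, and the fallback to the whole property's text
when no value element is present are all assumptions for this example.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if the element's class attribute contains the given class name."""
    return name in el.get("class", "").split()


def excerpt_value(prop):
    """Join the text of every class="value" descendant, adding no separator.

    Hypothetical helper: any white-space a publisher wants in the result
    has to sit inside a value element, because nothing is inserted here.
    """
    segments = [
        "".join(el.itertext())
        for el in prop.iter()
        if el is not prop and has_class(el, "value")
    ]
    if not segments:
        # Assumed fallback: no value elements, so use the whole property text.
        return "".join(prop.itertext())
    return "".join(segments)


prop = ET.fromstring(
    '<span class="tel">Call <span class="value">+44</span> (0) '
    '<span class="value">1234</span> <span class="value">567890</span></span>'
)
print(excerpt_value(prop))  # +441234567890
```

Joining with " " instead of "" would give "+44 1234 567890", which is
exactly the extra information that was never marked up.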
> 2) If the value contains no inner-text, then use the @title. I think
> this was a proposal, but until we get more feedback it probably should
> not be part of our parsing rules. What would be the semantics in that?
> I know this is an attempt at a workaround, but i don't think it
> should be included in these parsing rules until we discuss it further.
This is the open issue I'm currently working on:
http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements
Building the diagram was a development exercise to clarify how it
could be parsed.
The semantics are a little tricky, because we're working with the fact
that HTML does not have a native means of doing this. I think it's
definable, though, so will have a go later.
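As a rough sketch of the proposal under discussion (not an agreed
rule), the title fallback could look like the following. The function
name value_text and the sample dtstart markup are hypothetical, and the
sketch assumes "empty" means strictly no inner text; whether
white-space-only content should also count is left open.

```python
import xml.etree.ElementTree as ET


def value_text(value_el):
    """Return a value element's inner text, or its title when there is none.

    Assumption: only a strictly empty element triggers the fallback.
    """
    inner = "".join(value_el.itertext())
    if inner == "":
        return value_el.get("title", "")
    return inner


empty = ET.fromstring('<span class="value" title="2008-06-12"></span>')
filled = ET.fromstring('<span class="value" title="ignored">12 June</span>')
print(value_text(empty))   # 2008-06-12
print(value_text(filled))  # 12 June
```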
> TIDY still has bugs (or maybe it is a feature) with empty nodes.
It does, and for some reason dropping empty elements is not a feature
that can be switched off at the command line like other behaviours.
However, I've found it's trivial to compile a version of Tidy with a
'don't drop empty elements with class names' behaviour added, and will
submit it as a patch when I get time. That said, even without the
patch making it back into the Tidy trunk any time soon, the fact is
that Tidy can be made to work with an empty element technique.
I've documented that on the -issues page:
http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements
> Also, i don't know if this chart can handle or should handle nested
> values? did we make a decision that nested value properties were to be
> ignored?
The reaction was negative, and you pointed out that from a publisher
point of view nesting value in value was unnecessary; there seems to
be no reason to do it. So I closed that issue and intend that we spec
the pattern not to act recursively:
http://microformats.org/wiki/value-excerption-pattern-issues#Nested_value
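One way to read "not recursive" is that a parser stops descending as
soon as it finds a value element, so a value nested inside another
value is never treated as a segment of its own. The sketch below takes
that reading as an assumption; the names has_class and top_level_values
and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if the element's class attribute contains the given class name."""
    return name in el.get("class", "").split()


def top_level_values(prop):
    """Collect value elements in document order without entering a value."""
    found = []

    def walk(el):
        for child in el:
            if has_class(child, "value"):
                found.append(child)  # stop here: nested values are not segments
            else:
                walk(child)  # keep looking inside non-value wrappers

    walk(prop)
    return found


prop = ET.fromstring(
    '<span class="tel">'
    '<span class="value">+44 <span class="value">1234</span></span>'
    '<em><span class="value">5678</span></em>'
    '</span>'
)
print(["".join(v.itertext()) for v in top_level_values(prop)])
# ['+44 1234', '5678']
```

The nested "1234" still contributes its text to the outer value, but it
is not counted a second time as a separate segment.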
> Great work Ben, this is much easier for people to understand than a
> series of bullet points.
That's my intention. I think there's a lot of potential to explore
better ways of documenting parsing rules.
B