Value Excerption Pattern Parsing (was: [uf-dev] How do we (want to) document parsing?)

Thu Jun 12 02:36:52 PDT 2008

On 12 Jun 2008, at 09:21, Brian Suda wrote:

> On Thu, Jun 12, 2008 at 12:58 AM, Scott Reynen  
> <scott at randomchaos.com> wrote:
>> http://ben-ward.co.uk/microformats/value-excerption-pattern/ValueExcerptionParseFlowChart.png
>
> --- i had a look at the flow chart and found a few things that i think
> should be fixed and a few that i disagree with.

Disagreement is fine and very welcome. This is all draft, in progress  
work :-) Fundamentally, I'm keen to establish _how_ we represent this  
sort of process going forward, with the complete understanding that  
the detail of this current diagram can and will change.

> (maybe we should number these nodes so it is easier to reference?)

Could do, although
http://microformats.org/wiki/value-excerption-pattern-issues provides
numbering of sorts, so perhaps refer to those for now?

> 1) I don't think values should be concatenated with a unicode char
> 0020 (a space). If there was intention to add white-space then those
> should be part of the value. We should not introduce additional
> information that was not explicitly marked-up.

The open issue is:
http://microformats.org/wiki/value-excerption-pattern-issues#White-space_behaviour_when_concatenating_value_nodes

Seems reasonable. The default case I had in mind at the time was
actually muddled up with concatenating repeated properties, e.g.
additional-name properties in hCard, which would want to be
space-separated.

For value, I now lean toward agreeing with you, insofar as regardless
of the number of segments, we're still marking up a single µf property
rather than multiple occurrences of the same µf property.
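A minimal Python sketch of concatenating value segments with no added separator, assuming well-formed XHTML parsed with the standard library's xml.etree.ElementTree. The function names `has_class` and `excerpt_value` are hypothetical, and this sketch deliberately ignores the nested-value question.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if el's class attribute contains name as a space-separated token."""
    return name in (el.get("class") or "").split()


def excerpt_value(prop_el):
    """Hypothetical sketch: concatenate the inner text of every 'value'
    descendant of prop_el with no separator (no U+0020 is inserted), so any
    white-space must be part of the marked-up values themselves.
    Falls back to the whole property text if there are no 'value' elements.
    Nested 'value' elements are not handled here.
    """
    parts = [
        "".join(el.itertext())
        for el in prop_el.iter()
        if el is not prop_el and has_class(el, "value")
    ]
    if not parts:
        return "".join(prop_el.itertext())
    return "".join(parts)


tel = ET.fromstring(
    '<span class="tel"><span class="value">+44</span> (0)'
    '<span class="value">1234</span> <span class="value">567890</span></span>'
)
print(excerpt_value(tel))  # +441234567890
```

Joining the same segments with " " instead would yield "+44 1234 567890", i.e. white-space the publisher never marked up.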

> 2) If the value contains no inner-text, then use the @title. I think
> this was a proposal, but until we get more feedback it probably should
> not be part of our parsing rules. What would be the semantics in that?
> I know this is an attempt at a work-around, but i don't think it
> should be included in these parsing rules until we discuss it further.

This
(http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements)
is the open issue I'm currently working on, and building the diagram
was a development exercise to clarify how it could be parsed.

The semantics are a little tricky, because HTML has no native means of
doing this. I think they're definable, though, so I'll have a go at
defining them later.
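A sketch of how the proposed title fallback could be parsed, again assuming XHTML input and xml.etree.ElementTree. The function names are hypothetical, "empty" is assumed to mean "contains only white-space", and the fallback itself is the proposal still under discussion, not an agreed rule.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if el's class attribute contains name as a space-separated token."""
    return name in (el.get("class") or "").split()


def value_text(el):
    """Text of one 'value' element, using its title attribute when the
    element has no inner text (the proposed, not yet agreed, fallback).
    'No inner text' is assumed here to mean nothing but white-space.
    """
    text = "".join(el.itertext())
    title = el.get("title")
    if text.strip() == "" and title is not None:
        return title
    return text


def excerpt_with_title_fallback(prop_el):
    """Concatenate value_text() of each 'value' descendant, no separator."""
    return "".join(
        value_text(el)
        for el in prop_el.iter()
        if el is not prop_el and has_class(el, "value")
    )


dtstart = ET.fromstring(
    '<span class="dtstart">'
    '<span class="value" title="2008-06-12"></span>Thursday 12 June'
    "</span>"
)
print(excerpt_with_title_fallback(dtstart))  # 2008-06-12
```

Note that the empty span in this example is exactly the kind of empty element Tidy drops by default, so a Tidy-based parser would lose the title before this logic ever ran.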

> TIDY still has bugs (or maybe it is a feature) with empty nodes.

It does, and for some reason dropping empty elements is not a feature  
that can be switched off at the command line like other behaviours.

However, I've found it's trivial to compile a version of Tidy with
‘don't drop empty elements with class names’ behaviour added, and will
submit it as a patch when I get time. Even if the patch doesn't make it
back into the Tidy trunk any time soon, the fact remains that Tidy can
be made to work with an empty-element technique.

I've documented that on the -issues page
(http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements).

> Also, i don't know if this chart can handle or should handle nested
> values? did we make a decision that nested value properties were to be
> ignored?

The reaction was negative, and you pointed out that from a publisher's
point of view nesting value in value was unnecessary; there seems to be
no reason to do it. So I closed that issue
(http://microformats.org/wiki/value-excerption-pattern-issues#Nested_value)
and intend that we spec the pattern not to act recursively.
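One possible reading of "not recursive" for a parser, sketched under the same assumptions (XHTML input, xml.etree.ElementTree, hypothetical function names): only the outermost value elements are collected, and any value nested inside one is read as ordinary text of its parent.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if el's class attribute contains name as a space-separated token."""
    return name in (el.get("class") or "").split()


def outermost_values(el):
    """Yield 'value' descendants of el that are not inside another 'value'.

    The walk stops at the first 'value' on each branch, so a nested 'value'
    is not treated specially: its text is simply part of the outer value.
    """
    for child in el:
        if has_class(child, "value"):
            yield child
        else:
            yield from outermost_values(child)


def excerpt_non_recursive(prop_el):
    """Concatenate the text of outermost 'value' elements, no separator."""
    values = list(outermost_values(prop_el))
    if not values:
        return "".join(prop_el.itertext())
    return "".join("".join(v.itertext()) for v in values)


tel = ET.fromstring(
    '<span class="tel"><span class="value">+44 <span class="value">1234</span>'
    "</span> ignored</span>"
)
print(excerpt_non_recursive(tel))  # +44 1234
```

Stopping the walk at the first value on each branch keeps the rule simple for publishers: wrapping something in value means "use all of this text", whatever it contains.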

> Great work Ben, this is much easier for people to understand than a
> series of bullet points.

That's my intention. I think there's a lot of potential to explore  
better ways of documenting parsing rules.

B