[uf-dev] Proper use of value

Manu Sporny msporny at digitalbazaar.com
Tue Apr 22 19:00:07 PDT 2008


Toby A Inkster wrote:
> Unfortunately for some microformats, the parser *needs* to know about
> white space. The example which springs to mind is N-optimisation in
> hCard. 

Hmm... That's not evident to me. I understand your point, and it's
certainly valid - but there's a nuance.

To say that the parser "*needs* to know about whitespace" is different
from saying that "we should preserve the original whitespace". We can
have both.

My previous post, stated differently, could read:

"As a general rule, we should preserve any and all whitespace in the
parser model. Only when the information is displayed or exported from
the parser model should we canonicalize whitespace, and only when it
makes sense to do so."

> This:
> 
>     <span class="fn">JohnDoe</span>
> 
> is parsed as:
> 
>     FN:JohnDoe
>     NICKNAME:JohnDoe
> 
> Whereas this:
> 
>     <span class="fn">John Doe</span>
> 
> is parsed as:
> 
>     FN:John Doe
>     N:Doe;John
> 
> In RDF terms, the white space in the object literal affects the choice
> of predicate. So it is important to know how white space should be
> interpreted, at least in some situations.

I don't think the above is a good example. I'm racking my brain to come
up with a reason to canonicalize whitespace in the parser. I don't think
throwing away the original stuff buys us anything. For example:

   <span class="fn">  John     Doe   </span>
   <span class="fn">John Doe</span>

Both of the above would parse to:

   FN:John Doe
   N:Doe;John

However, I think that what the developer should get back when they ask
for the contents of FN is "  John     Doe   ".

The application can then make the decision to canonicalize the
whitespace when a) displaying it in an interface or b) exporting it to
another format, such as vCard.
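
To make that concrete, here is a rough sketch in Python of keeping the
raw value in the parser model and canonicalizing only on export. The
names (model, canonicalize_whitespace, export_vcard_line) are made up
for illustration and don't come from any existing parser:

```python
import re

def canonicalize_whitespace(value: str) -> str:
    """Collapse each run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()

# Hypothetical parser model: property name -> text exactly as found in the page.
model = {"fn": "  John     Doe   "}

def export_vcard_line(model: dict, prop: str) -> str:
    """Canonicalize only at export time; the stored value is left untouched."""
    return f"{prop.upper()}:{canonicalize_whitespace(model[prop])}"
```

Asking the model for "fn" still returns the original string, while
export_vcard_line(model, "fn") produces the cleaned-up line.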

As far as the example you gave above... I would expect that the hCard
optimization step would be performed after the parser acquired all of
the data from the page. FN would contain "  John     Doe   ", and thus
the N-optimization would trim the leading and trailing whitespace, split
the string on the remaining run of whitespace, and encode it as
"Doe;John". In other words, N-optimization is a post-processing step
performed after the parser proper runs.
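
A minimal sketch of that post-processing step, in Python. It only
covers the two cases discussed in this thread (one word, two words) and
ignores the other cases an hCard parser would have to handle, such as
"Doe, John" or initials. The function name n_optimize is hypothetical:

```python
def n_optimize(raw_fn: str) -> dict:
    """Derive extra properties from the raw FN value after parsing is done.

    Simplified assumption: one word implies NICKNAME, two words imply
    N as "family;given". Anything else yields only FN.
    """
    # str.split() with no arguments drops leading/trailing whitespace
    # and splits on runs of whitespace.
    words = raw_fn.split()
    derived = {"FN": " ".join(words)}
    if len(words) == 1:
        derived["NICKNAME"] = words[0]
    elif len(words) == 2:
        given, family = words
        derived["N"] = f"{family};{given}"
    return derived
```

The raw string passed in is never modified, so the parser model can
keep "  John     Doe   " while the derived output carries "Doe;John".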

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: RDFa Basics in 8 minutes (video)
http://blog.digitalbazaar.com/2008/01/07/rdfa-basics/

