[microformats-discuss] hCard updates: RFC2426 examples converted to hCard examples, hcard-parsing ready for review.

Emiliano Martinez Luque martinezluque at gmail.com
Sun Sep 11 21:22:00 PDT 2005


I have the following question. What is a parser supposed to do in
situations like the following:

<span class="country-name">United States <small>of</small> America</span>

<span class="country-name">Rep&uacute;blica del <strong>Paraguay</strong></span>

<span class="country-name" style="color:gray;">People's Republic of
<span style="color:black;">China</span></span>

Is this type of markup allowed in the hCard definition? If not then
disregard the rest of this message. If it is then I would like to add
the following to the discussion on Issue 1 of the hcard-parsing
document.

Consider that, to a parser, example 1 is:

Start Element span
Text Node
Start Element small
Text Node
End Element small
Text Node
End Element span
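
As a rough illustration of that event sequence, here is a minimal sketch
using Python's standard html.parser module. The EventLogger class and the
parse_events helper are names I made up for this example; they are not part
of any hCard tool.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Records parse events for a fragment, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("Start Element", tag))

    def handle_endtag(self, tag):
        self.events.append(("End Element", tag))

    def handle_data(self, data):
        self.events.append(("Text Node", data))


def parse_events(fragment):
    """Hypothetical helper: feed a fragment and return the recorded events."""
    logger = EventLogger()
    logger.feed(fragment)
    logger.close()
    return logger.events


example_1 = '<span class="country-name">United States <small>of</small> America</span>'
for kind, detail in parse_events(example_1):
    print(kind, repr(detail))
```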

If the hCard definition allows this type of markup, then the value of
"country-name" would be the element's own text nodes together with the
text nodes of its sub-nodes.

This is consistent with the hcard-parsing document statement that:

"Once an element for a property is found, the contents of the element
are used for the value."
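
A minimal sketch of that reading, assuming the fragment is well-formed XML
(real pages would need a proper HTML parser, and entities such as &uacute;
would need resolving first). The property_value function is a hypothetical
name, not something defined by hcard-parsing.

```python
import xml.etree.ElementTree as ET


def property_value(fragment):
    """Concatenate every text node inside the element, in document order."""
    element = ET.fromstring(fragment)
    # itertext() walks the element's own text and the text of all descendants.
    return "".join(element.itertext())


example_1 = '<span class="country-name">United States <small>of</small> America</span>'
print(property_value(example_1))  # United States of America
```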

But, and this is the problem, that statement becomes ambiguous in the
plural names example for "categories":

<ul class="categories">
 <li>INTERNET</li>
 <li>IETF</li>
 <li>INDUSTRY</li>
 <li>INFORMATION TECHNOLOGY</li>
</ul>

Here each successive text node holds a different category value.
I understand that, per the vCard RFC, this should be considered a single
value separated by commas. But conceptually it is a set (or group)
of different category values. What I mean is that for example 1, the
value of "country-name" is "United States of America", but for
"categories" there is a set of values: "INTERNET", "IETF", "INDUSTRY",
"INFORMATION TECHNOLOGY".

I see that something similar has been addressed in Issue 1 of hcard-parsing.

This can be solved either by specifying that: 

"Once an element for a property is found, the contents of the element
are used for the value or values."

And specifying that the "categories" property is to take each of its
child elements as a different value.

But this would still raise the issue of what is a parser to do in a
situation like this:

<div class="categories">
<span>Internet</span>
<span>INFORMATION <span style="color:red;">TECHNOLOGY</span></span>
</div>

Which again can be addressed by specifying that the "categories"
property is to take the value of each of its first-generation (is this
the right term?) child elements as a different value.
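
To make the first-generation rule concrete, here is a minimal sketch, again
assuming well-formed XML input. The child_values function is a hypothetical
name, and collapsing whitespace is my own assumption rather than anything
the hcard-parsing document specifies.

```python
import xml.etree.ElementTree as ET


def child_values(fragment):
    """Return one value per direct child element of the property element."""
    element = ET.fromstring(fragment)
    values = []
    for child in element:  # iterating an Element yields direct children only
        text = "".join(child.itertext())  # includes text of nested elements
        values.append(" ".join(text.split()))  # assumption: collapse whitespace
    return values


categories = """<div class="categories">
<span>Internet</span>
<span>INFORMATION <span style="color:red;">TECHNOLOGY</span></span>
</div>"""
print(child_values(categories))  # ['Internet', 'INFORMATION TECHNOLOGY']
```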

Or it can be (more simply) addressed by specifying that category may
take more than one value and going for the singular name approach
considered in Issue 1:

<ul>
 <li class="category">INTERNET</li>
 <li class="category">IETF</li>
 <li class="category">INDUSTRY</li>
 <li class="category">INFORMATION TECHNOLOGY</li>
</ul>

which would also allow for more complex markup inside it, i.e.:

<div>
<span class="category">Internet</span>
<span class="category">INFORMATION <span style="color:red;">TECHNOLOGY</span></span>
</div>


This will still be consistent with:

"Once an element for a property is found, the contents of the element
are used for the value."

This holds because every sub-node of an element is part of the element's
contents.
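
A minimal sketch of the singular-name approach, assuming well-formed XML
input: every element carrying the class is one value, and its value is all
of its contents. The values_for_class function is a hypothetical name, and
the whitespace collapsing is my own assumption.

```python
import xml.etree.ElementTree as ET


def values_for_class(fragment, class_name):
    """Collect the text contents of every element carrying class_name."""
    root = ET.fromstring(fragment)
    values = []
    for node in root.iter():  # root and all descendants, in document order
        # The class attribute is a space-separated list of names.
        if class_name in node.get("class", "").split():
            text = "".join(node.itertext())
            values.append(" ".join(text.split()))  # assumption: collapse whitespace
    return values


markup = """<div>
<span class="category">Internet</span>
<span class="category">INFORMATION <span style="color:red;">TECHNOLOGY</span></span>
</div>"""
print(values_for_class(markup, "category"))  # ['Internet', 'INFORMATION TECHNOLOGY']
```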

These are my thoughts regarding Issue 1; I hope they help.

Other than that, I have not found the following in any document; if it
has been stated somewhere, please disregard. It is probably self-evident,
but I guess it should be stated:

. The charset of element values should be considered that of the
containing page or document.
. The language of element values should be considered that of the
containing page or document (if stated).

Thank you,
Emiliano Martínez Luque
TheThingsIWant.com

