[microformats-discuss] hCard updates: RFC2426 examples converted to hCard examples, hcard-parsing ready for review.

brian suda brian.suda at gmail.com
Sun Sep 11 21:44:27 PDT 2005

Currently, X2V takes the entire value of the node for each property,
including the text of any nested elements, so for your examples:

<span class="country-name">United States <small>of</small> America</span>

The resulting string would be: "United States of America"
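A minimal sketch of this "whole node value" rule in Python. It assumes the markup is well-formed XHTML so the standard-library xml.etree.ElementTree can parse it, and the helper name property_values is hypothetical (this is not X2V's own implementation):

```python
import xml.etree.ElementTree as ET

def property_values(markup, class_name):
    """Return the full text value of every element whose class list contains
    class_name. Text of nested child elements is included (itertext walks
    all descendants). Assumes well-formed XHTML input."""
    root = ET.fromstring(markup)
    return ["".join(el.itertext())
            for el in root.iter()  # iter() yields the root itself first
            if class_name in (el.get("class") or "").split()]

print(property_values(
    '<span class="country-name">United States <small>of</small> America</span>',
    "country-name"))
# ['United States of America']
```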

This is already evident in the FN/N extraction:
<span class="n fn"><span class="given-name">Brian</span> <span class="family-name">Suda</span></span>

FN: Brian Suda

The FN is extracted without regard to child nodes, while N is extracted
specifically from each child node. So I think parsers should take the
entire node-value, including any child-node values.
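To make the FN/N contrast concrete, here is a rough Python sketch under the same assumption (well-formed XHTML, parsed with xml.etree.ElementTree). FN reads the whole node value of the class="fn" element, while N reads each child property separately; find_class and text_of are hypothetical helper names:

```python
import xml.etree.ElementTree as ET

def find_class(root, name):
    """First element (document order, root included) whose class list contains name."""
    return next((el for el in root.iter()
                 if name in (el.get("class") or "").split()), None)

def text_of(el):
    """Whole node value: the element's own text plus all descendant text."""
    return "".join(el.itertext())

root = ET.fromstring('<span class="n fn"><span class="given-name">Brian</span> '
                     '<span class="family-name">Suda</span></span>')
n = find_class(root, "n")
card = {
    "FN": text_of(find_class(root, "fn")),    # child nodes are just part of the text
    "N": {part: text_of(find_class(n, part))  # each child property read on its own
          for part in ("given-name", "family-name")},
}
print(card)
# {'FN': 'Brian Suda', 'N': {'given-name': 'Brian', 'family-name': 'Suda'}}
```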

As for categories, that is being addressed by ISSUE #1, which changes
categories into category (singular), so each category must now be
marked up explicitly. Your examples are correct, and this will also
allow for additional mark-up within the class="category" element.
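A sketch of the singular approach, again assuming well-formed XHTML and Python's xml.etree.ElementTree; the function name categories is hypothetical. Each class="category" element yields exactly one value (its full text), and the values are then joined into the single comma-separated CATEGORIES value that RFC 2426 uses:

```python
import xml.etree.ElementTree as ET

def categories(markup):
    """One value per class="category" element: its full text, nested markup included."""
    root = ET.fromstring(markup)
    return ["".join(el.itertext())
            for el in root.iter()
            if "category" in (el.get("class") or "").split()]

markup = """<ul>
  <li class="category">INTERNET</li>
  <li class="category">IETF</li>
  <li class="category">INFORMATION <em>TECHNOLOGY</em></li>
</ul>"""

values = categories(markup)
# RFC 2426 writes CATEGORIES as one comma-separated text value.
vcard_line = "CATEGORIES:" + ",".join(values)
print(values)      # ['INTERNET', 'IETF', 'INFORMATION TECHNOLOGY']
print(vcard_line)  # CATEGORIES:INTERNET,IETF,INFORMATION TECHNOLOGY
```

This sketch does not backslash-escape commas that occur inside a single category, which a real vCard exporter would need to do.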

Issue #2 deals with a way to prevent a parser from extracting the FULL
node-value, for example:

<span class="country-name">People's Republic of <span class="value">China</span></span>

You could wrap a portion of the string with class="value"; that way any
additional text outside it is ignored by parsers. (For country-name
this is not very relevant, but for TEL and EMAIL it is.)
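A minimal sketch of the class="value" rule from Issue #2, assuming well-formed XHTML and Python's xml.etree.ElementTree. The function name property_value is hypothetical, and the choice to use only the first class="value" descendant is an assumption of this sketch:

```python
import xml.etree.ElementTree as ET

def has_class(el, name):
    return name in (el.get("class") or "").split()

def property_value(prop):
    """Use the first class="value" descendant if there is one,
    otherwise fall back to the whole node value."""
    value_el = next((el for el in prop.iter()
                     if el is not prop and has_class(el, "value")), None)
    source = value_el if value_el is not None else prop
    return "".join(source.itertext())

china = ET.fromstring("""<span class="country-name">People's Republic of <span class="value">China</span></span>""")
usa = ET.fromstring('<span class="country-name">United States <small>of</small> America</span>')
print(property_value(china))  # China
print(property_value(usa))    # United States of America
```

The explicit `is not None` check matters: an ElementTree element with no children is falsy, so `value_el or prop` would wrongly fall back to the whole node.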

For example:
<span class="tel">my <span class="type">home</span> phone is: <span

Since HOME is nested under class="tel" (to associate it with the tel),
there needs to be a way to extract only the phone number value instead
of the whole node-value string. class="value" does this.
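The TEL case can be sketched the same way, assuming well-formed XHTML and Python's xml.etree.ElementTree. The example above is cut off before the number, so the number below is purely hypothetical, as is the helper name parse_tel. The type comes from class="type" and the number only from class="value", whereas the whole node value would include the surrounding prose:

```python
import xml.etree.ElementTree as ET

def has_class(el, name):
    return name in (el.get("class") or "").split()

def text_of(el):
    return "".join(el.itertext())

def parse_tel(tel):
    """TYPE values from class="type"; the number from the first class="value"
    descendant, falling back to the whole node value if there is none."""
    types = [text_of(el) for el in tel.iter() if has_class(el, "type")]
    value_el = next((el for el in tel.iter() if has_class(el, "value")), None)
    number = text_of(value_el if value_el is not None else tel)
    return {"type": types, "value": number}

# Hypothetical number: the original example is truncated before the value.
tel = ET.fromstring('<span class="tel">my <span class="type">home</span> phone is: '
                    '<span class="value">+1-555-0100</span></span>')
print(parse_tel(tel))  # {'type': ['home'], 'value': '+1-555-0100'}
print(text_of(tel))    # my home phone is: +1-555-0100
```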

I hope this answers your question. You are correct in thinking that the
entire string value should be used even if there is additional mark-up
nested inside it. This is the default scenario, but we have also provided
a way to extract specific data, which must be made explicit through the
class="value" property.

You also ask if your examples are valid hCard mark-up with all the
additional style="" attributes and nested elements. It should be said
that microformats are built within HTML, so microformats should NOT
restrict what you can do in HTML.


Emiliano Martinez Luque wrote:

>I have the following question. What is a parser supposed to do in
>situations like the following:
><span class="country-name">United States <small>of</small> America</span>
><span class="country-name">Rep&uacute;blica del <strong>Paraguay</strong></span>
><span class="country-name" style="color:gray;">People's Republic of
><span style="color:black;">China</span></span>
>Is this type of markup allowed in the hCard definition? If not, then
>disregard the rest of this message. If it is, then I would like to add
>the following to the discussion on Issue 1 of hcard-parsing.
>Consider that, for a parser, example 1 is:
>Start Element span
>Text Node
>Start Element small
>Text Node
>End Element small
>Text Node
>End Element span
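The event sequence listed above can be reproduced with Python's standard-library html.parser.HTMLParser; the subclass name EventLogger is hypothetical. Concatenating the text-node events then gives the "whole node value" discussed in this thread:

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Record start-tag, text and end-tag events in document order."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("Start Element", tag))

    def handle_endtag(self, tag):
        self.events.append(("End Element", tag))

    def handle_data(self, data):
        self.events.append(("Text Node", data))

p = EventLogger()
p.feed('<span class="country-name">United States <small>of</small> America</span>')
p.close()
for kind, detail in p.events:
    print(kind, repr(detail))

# The element's text plus the text of its sub-nodes, in order.
value = "".join(detail for kind, detail in p.events if kind == "Text Node")
print(value)  # United States of America
```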
>If the hCard definition allows this type of markup, then the value of
>"country-name" would be the element's own text nodes together with the
>text nodes of its sub-nodes.
>This is consistent with the hcard-parsing document statement that:
>"Once an element for a property is found, the contents of the element
>are used for the value."
>But, and this is the problem, that statement becomes ambiguous in the
>plural-name example for "categories":
><ul class="categories">
> <li>INTERNET</li>
> <li>IETF</li>
> <li>INDUSTRY</li>
></ul>
>In which every subsequent text node takes a different category value.
>I understand that, according to the vCard RFC, this should be considered
>a single value separated by commas. But conceptually it is a set (or group)
>of different category values. What I mean is that for example 1, the
>value of "country-name" is "United States of America", but for
>"categories" there is a set of values: "INTERNET", "IETF", "INDUSTRY".
>I see that something similar has been addressed in Issue 1 of hcard-parsing.
>This can be solved either by specifying that:
>"Once an element for a property is found, the contents of the element
>are used for the value or values."
>and specifying that the "categories" property is to take each of its
>sub-node elements as a different value.
>But this would still raise the issue of what a parser is to do in a
>situation like this:
><div class="categories">
><span>INFORMATION <span style="color:red;">TECHNOLOGY</span></span>
></div>
>This again can be addressed by specifying that the "categories"
>property is to take the value of each of its first-generation (is this
>the right term?) child elements as a different value.
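A sketch of that "direct children" rule for the plural class="categories", assuming well-formed XHTML and Python's xml.etree.ElementTree (iterating an element yields only its direct children); the function name categories_from_children is hypothetical. Text sitting directly inside the container, outside any child element, is ignored by this sketch:

```python
import xml.etree.ElementTree as ET

def categories_from_children(markup):
    """Each direct child element of a class="categories" container is one value;
    that value is the child's full text, nested markup included."""
    root = ET.fromstring(markup)
    values = []
    for container in root.iter():
        if "categories" in (container.get("class") or "").split():
            values.extend("".join(child.itertext()) for child in container)
    return values

div = ('<div class="categories">'
       '<span>INTERNET</span>'
       '<span>INFORMATION <span style="color:red;">TECHNOLOGY</span></span>'
       '</div>')
ul = '<ul class="categories">\n <li>INTERNET</li>\n <li>IETF</li>\n <li>INDUSTRY</li>\n</ul>'
print(categories_from_children(div))  # ['INTERNET', 'INFORMATION TECHNOLOGY']
print(categories_from_children(ul))   # ['INTERNET', 'IETF', 'INDUSTRY']
```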
>Or it can be (more simply) addressed by specifying that category may
>take more than one value and going for the singular name approach
>considered in Issue 1:
> <li class="category">INTERNET</li>
> <li class="category">IETF</li>
> <li class="category">INDUSTRY</li>
> <li class="category">INFORMATION TECHNOLOGY</li>
>which will also allow for more complex markup inside of it, i.e.:
><span class="category">Internet</span>
><span class="category">INFORMATION <span
>This will still be consistent with:
>"Once an element for a property is found, the contents of the element
>are used for the value."
>Since every sub-node of an element is part of the element contents.
>These are my thoughts regarding Issue 1; I hope they help.
>Other than that, I have not found the following in any document; if it
>has been stated somewhere, please disregard. Anyhow, it's self-evident,
>but I guess it should be stated:
>- The charset of element values should be considered that of the
>containing page or document.
>- The language of element values should be considered that of the
>containing page or document (if stated).
>Thank you,
>Emiliano Martínez Luque
