hcard-issues: Difference between revisions

From Microformats Wiki
(added interesting issue regarding utf-8 space characters in tel fields)
m (→‎2009: minor grammar fail fixed)

Revision as of 20:50, 10 September 2009

<entry-title> hCard issues </entry-title>

These are externally raised issues about hCard with broadly varying degrees of merit. Thus some issues are REJECTED for a number of obvious reasons (but still documented here in case they are re-raised), and others contain longer discussions. Some issues may be ACCEPTED and perhaps cause changes or improved explanations in the spec.

IMPORTANT: Please read the hCard FAQ and the hCard resolved issues before giving any feedback or raising any issues as your feedback/issues may already be resolved/answered.

Submitted issues may (and probably will) be edited and rewritten for better terseness, clarity, calmness, rationality, and as neutral a point of view as possible. Write your issues well. — Tantek

For matters relating to the vCard specification itself, see vcard-errata and vcard-suggestions.

closed issues

See: hcard-issues-closed

resolved issues

See: hcard-issues-resolved

issues

Please add new issues to the bottom of the list by copying and pasting the template. Please follow up on resolved/rejected issues with new information rather than resubmitting them. Duplicate issue additions will be reverted.

2009

open issue! 2009-09-10 raised by TomMorris

  • Parsing UTF-8 'special' space characters in telephone fields. I recently designed a page that used an hCard with a span containing the tel value. To space the phone number appropriately, I used the U+2009 (THIN SPACE) character &#8201;. Operator's hCard parser choked on this and failed to read both the contents of the tel span and an a element containing the email property in the same parent p element. I cannot find a clear definition of what is acceptable content for the tel property. There seem to be two ways of resolving this: (1) instruct authors of microformat parsing libraries to normalise the Unicode characters U+2002 (EN SPACE), U+2003 (EM SPACE), U+2004 (THREE-PER-EM SPACE), U+2005 (FOUR-PER-EM SPACE), U+2006 (SIX-PER-EM SPACE), U+2007 (FIGURE SPACE), U+2008 (PUNCTUATION SPACE), U+2009 (THIN SPACE), U+200A (HAIR SPACE), U+200B (ZERO WIDTH SPACE) and other similar characters (including the HTML entities &ensp;, &emsp;, &thinsp; and &nbsp;) so that, for the purpose of parsing the microformat, they are treated as a standard space (U+0020); or (2) instruct microformat authors not to use any space character other than U+0020 (and to use the CSS word-spacing property to adjust the spacing of tel values). Which of these is best? Does anyone want to survey the current implementations to see how they parse telephone numbers with unexpected characters in them? Part of the problem may be that there doesn't seem to be any clear specification of what a valid telephone number is, so far as I can see. I have only read RFC 2426 §3.3, which says that it should simply be the telephone format defined in X.500 (RFC 1274), but there is no explicit definition in X.500 except to say it is "telephoneNumberSyntax", which is defined in RFC 1778 §2.16 as simply being a printable string. Perhaps I've gone off down a blind alley! (Do we need a page for character encoding/Unicode issues?)
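
As a rough illustration of option (1), a parser could map the special spaces to U+0020 before handling the tel value. The sketch below is not taken from Operator or any other existing parser: the function name normalise_tel, the exact character set (U+00A0 plus U+2002 to U+200B), and the choice to collapse runs of spaces are all assumptions made for the example. Note that the decimal reference &#8201; corresponds to U+2009 in hexadecimal.

```python
import re

# Assumed character set for this sketch: NO-BREAK SPACE (U+00A0) plus
# EN SPACE through ZERO WIDTH SPACE (U+2002..U+200B), as listed in the issue.
SPECIAL_SPACES = re.compile("[\u00a0\u2002-\u200b]")


def normalise_tel(value: str) -> str:
    """Hypothetical helper: treat special spaces as U+0020.

    Each special space becomes a plain space; runs of plain spaces are then
    collapsed and the result is trimmed. Collapsing is a choice made for this
    sketch, not something the issue asks for.
    """
    replaced = SPECIAL_SPACES.sub(" ", value)
    return re.sub(" +", " ", replaced).strip()


# A tel value spaced with THIN SPACE (U+2009), as described in the issue.
print(normalise_tel("+44\u200920\u20097946\u20090958"))
```

Whether ZERO WIDTH SPACE should become a visible space or simply be dropped is left open here; the sketch follows the wording of option (1) and converts it to U+0020.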

template

Consider using this format (copy and paste this to the end of the list to add your issues; replace ~~~ with an external link if preferred) to report issues or feedback, so that issues can show up in hAtom subscriptions of this issues page. If open issues lack this markup, please add it.

Please post one issue per entry, to make them easier to manage. Avoid combining multiple issues into single reports, as this can confuse or muddle feedback, and puts a burden of separating the discrete issues onto someone else who 1. may not have the time, and 2. may not understand the issue in the same way as the original reporter.

<div class="hentry">
{{OpenIssue}} 
<span class="entry-summary author vcard">
 <span class="published">2011-MM-DD</span> 
 raised by <span class="fn">~~~</span>
</span>
<div class="entry-content discussion issues">
* <strong class="entry-title">«Short title of issue»</strong>. «Description of Issue»
** Follow-up comment #1
** Follow-up comment #2
</div>
</div>

related pages

The hCard specification is a work in progress. As additional aspects are discussed, understood, and written, they will be added. These thoughts, issues, and questions are kept in separate pages.