- Parsing UTF-8 'special' space characters in telephone fields. I recently designed a page that used an hCard with a span element containing the tel value. To space the phone number appropriately, I used the THIN SPACE character (U+2009, written as the numeric entity &#8201;). Operator's hCard parser choked on this and refused to read not only the contents of the tel span but also an a element, containing the email property, in the same parent p element. I cannot find a clear definition of what is acceptable content for the tel property. There seem to be two ways of resolving this: (1) instruct authors of microformat parsing libraries to normalise the Unicode characters U+2002 (EN SPACE), U+2003 (EM SPACE), U+2004 (THREE-PER-EM SPACE), U+2005 (FOUR-PER-EM SPACE), U+2006 (SIX-PER-EM SPACE), U+2007 (FIGURE SPACE), U+2008 (PUNCTUATION SPACE), U+2009 (THIN SPACE), U+200A (HAIR SPACE), U+200B (ZERO WIDTH SPACE) and other similar characters (including those written as the HTML entities &ensp;, &emsp;, &thinsp; and &nbsp;) so that, for the purpose of parsing the microformat, they are treated as a standard space (U+0020); or (2) instruct microformat authors not to use any space character other than U+0020 (and to use the CSS word-spacing property to adjust the spacing of tel values). Which of these is best? Does anyone want to survey the current implementations to see how they parse telephone numbers with unexpected characters in them? Part of the problem may be that there doesn't seem to be any clear specification of what a valid telephone number is, so far as I can see. I have only read RFC 2426 §3.3, which says that it should simply be the telephone format defined in X.500 (RFC 1274), but there is no explicit definition there except to say it is "telephoneNumberSyntax", which is defined in RFC 1778 §2.16 as simply being a printable string. Perhaps I've gone off down a blind alley! (Do we need a page for character encoding/Unicode issues?)
hcard-issues: Difference between revisions
(added interesting issue regarding utf-8 space characters in tel fields) |
m (→2009: minor grammar fail fixed) |
Line 23: | Line 23: | ||
{{OpenIssue}} <span class="entry-summary author vcard"><span class="published">2009-09-10</span> raised by <span class="fn">[[User:TomMorris|TomMorris]]</span></span> | {{OpenIssue}} <span class="entry-summary author vcard"><span class="published">2009-09-10</span> raised by <span class="fn">[[User:TomMorris|TomMorris]]</span></span> | ||
<div class="entry-content discussion issues"> | <div class="entry-content discussion issues"> | ||
* <strong class="entry-title">Parsing UTF-8 'special' space characters in telephone fields</strong>. I recently designed a page that used an hCard with a <code>span</code> containing the <samp>tel</samp> value. To space the phone number appropriately, I used the U+8201 (THIN SPACE) character <code>&#8201;</code>. [[operator|Operator's]] hCard parser coughed up on this and refused to read both the contents of the <samp>tel</samp> <code>span</code> but also an <code>a</code> element containing the <samp>email</samp> property that was contained in the parent <code>p</code> element. I cannot find a clear definition of what is acceptable content for the <samp>tel</samp> property. There seems to be two ways of resolving this: (1) instruct authors of microformat parsing libraries to normalise the Unicode characters U+8194 (EN SPACE), U+8195 (EM SPACE), U+8196 (THREE-PER-EM SPACE), U+8197 (FOUR-PER-EM SPACE), U+8198 (SIX-PER-EM SPACE), U+8199 (FIGURE SPACE), U+8200 (PUNCTUATION SPACE), U+8201 (THIN SPACE), U+8202 (HAIR SPACE), U+8203 (ZERO WIDTH SPACE) and other similar characters (including the HTML entities <code>&ensp;</code>, <code>&emsp;</code>, <code>&thinsp;</code> and <code>&nbsp;</code>) so that, for the purpose of parsing the microformat they are treated as a standard space (U+0020) or (2) instructing microformat authors to not use | * <strong class="entry-title">Parsing UTF-8 'special' space characters in telephone fields</strong>. I recently designed a page that used an hCard with a <code>span</code> containing the <samp>tel</samp> value. To space the phone number appropriately, I used the U+8201 (THIN SPACE) character <code>&#8201;</code>. [[operator|Operator's]] hCard parser coughed up on this and refused to read both the contents of the <samp>tel</samp> <code>span</code> but also an <code>a</code> element containing the <samp>email</samp> property that was contained in the parent <code>p</code> element. 
I cannot find a clear definition of what is acceptable content for the <samp>tel</samp> property. There seems to be two ways of resolving this: (1) instruct authors of microformat parsing libraries to normalise the Unicode characters U+8194 (EN SPACE), U+8195 (EM SPACE), U+8196 (THREE-PER-EM SPACE), U+8197 (FOUR-PER-EM SPACE), U+8198 (SIX-PER-EM SPACE), U+8199 (FIGURE SPACE), U+8200 (PUNCTUATION SPACE), U+8201 (THIN SPACE), U+8202 (HAIR SPACE), U+8203 (ZERO WIDTH SPACE) and other similar characters (including the HTML entities <code>&ensp;</code>, <code>&emsp;</code>, <code>&thinsp;</code> and <code>&nbsp;</code>) so that, for the purpose of parsing the microformat they are treated as a standard space (U+0020) or (2) instructing microformat authors to not use any other space character than U+0020 (and to use the [http://www.w3.org/TR/CSS21/text.html#spacing-props CSS <code>word-spacing</code> property] for manipulating <samp>tel</samp> properties). Which of these is best? Anyone want to survey the current implementations to see how they currently parse telephone numbers with unexpected characters in them? Part of the problem may be that there doesn't seem to be any clear specification of what a valid telephone number is, so far as I can see (which is just reading [http://www.ietf.org/rfc/rfc2426.txt RFC 2426] §3.3, which says that it should simply be the telephone format defined in X.500 ([http://www.ietf.org/rfc/rfc1274.txt RFC 1274]), but there is no explicit definition in X.500 except to say it is "telephoneNumberSyntax", which is defined in [http://www.ietf.org/rfc/rfc1778.txt RFC 1778] §2.16 as simply being a printable string. Perhaps I've gone off down a blind alley! (Do we need a page for character encoding/Unicode issues?) | ||
</div> | </div> | ||
</div> | </div> |
Revision as of 20:50, 10 September 2009
hCard issues
These are externally raised issues about hCard with broadly varying degrees of merit. Thus some issues are REJECTED for a number of obvious reasons (but still documented here in case they are re-raised), and others contain longer discussions. Some issues may be ACCEPTED and perhaps cause changes or improved explanations in the spec.
IMPORTANT: Please read the hCard FAQ and the hCard resolved issues before giving any feedback or raising any issues as your feedback/issues may already be resolved/answered.
Submitted issues may (and probably will) be edited and rewritten for better terseness, clarity, calmness, rationality, and as neutral a point of view as possible. Write your issues well. — Tantek
For matters relating to the vCard specification itself, see vcard-errata and vcard-suggestions.
closed issues
See: hcard-issues-closed
resolved issues
issues
Please add new issues to the bottom of the list by copy and pasting the template. Please follow-up to resolved/rejected issues with new information rather than resubmitting such issues. Duplicate issue additions will be reverted.
2009
open issue!
template
Consider using this format (copy and paste this to the end of the list to add your issues; replace ~~~ with an external link if preferred) to report issues or feedback, so that issues can show up in hAtom subscriptions of this issues page. If open issues lack this markup, please add it.
Please post one issue per entry, to make them easier to manage. Avoid combining multiple issues into single reports, as this can confuse or muddle feedback, and puts a burden of separating the discrete issues onto someone else who 1. may not have the time, and 2. may not understand the issue in the same way as the original reporter.
<div class="hentry">
{{OpenIssue}}
<span class="entry-summary author vcard">
<span class="published">2011-MM-DD</span>
raised by <span class="fn">~~~</span>
</span>
<div class="entry-content discussion issues">
* <strong class="entry-title">«Short title of issue»</strong>. «Description of Issue»
** Follow-up comment #1
** Follow-up comment #2
</div>
</div>
- hCard
- hCard cheatsheet - hCard properties
- hCard creator (feedback) - create your own hCard.
- hCard authoring - learn how to add hCard markup to your existing contact info.
- hCard examples - example usage of various classes within hCard.
- hCard examples in the wild - an on-going list of websites which use hCards.
- hcard-supporting-user-profiles - sites with user profiles marked up with hCard - a very common example.
- hCard FAQ - if you have any questions about hCard, check here.
- hCard implementations - websites or tools which either generate or parse hCards.
- hCard parsing - normative details of how to parse hCards.
- hCards and pages - semantic distinctions between different hCards on a page, and how to identify each
- hcard-user-interface - techniques and issues surrounding user-interfaces to author, publish, and display hCards.
- hCard profile - the XMDP profile for hCard
- hCard singular properties - an explanation of the list of singular properties in hCard.
- hCard tests - a wiki page with actual embedded hCards to try parsing.
- hCard advocacy - encourage others to use hCard
- hCard "to do" - jobs to do
The hCard specification is a work in progress. As additional aspects are discussed, understood, and written, they will be added. These thoughts, issues, and questions are kept in separate pages.
- hCard brainstorming - brainstorms and other explorations relating to hCard.
- hcard-parsing-brainstorming - brainstorming specific to parsing of hCard
- geo brainstorming
- hCard feedback - general feedback (as opposed to specific issues).
- hCard issues - specific issues with the specification.
- vCard errata - corrections to the vCard specification, which underlies hCard.
- vCard suggestions - suggested improvements to the vCard specification.