hcard-parsing-brainstorming

Brainstorming for hCard parsing

See the separate hcard-parsing page for the current hCard parsing rules, and hcard-brainstorming for more general discussion.

Add thoughts and proposals for improving or extending hCard parsing here, and be sure to include URLs of hCards in the wild that would benefit from the parsing rule changes.

Additional Semantic HTML handling

acronym element handling

Choices:

input element handling

In hcard-parsing, I've defined special-case handling for several elements based on their semantics, e.g. textual properties on the img element use the 'alt' attribute.

One element I forgot at the time was the input element, specifically, <input type="text">. Another I forgot was the textarea element.

The simple suggestion is to add the following to hcard-parsing, specifically to the all properties sub-section:

Tantek

forms auto-fill

If you go to a site that needs your contact info for something, say an ecommerce site for checkout, and if the form fields are marked up with hCard semantics per the above, then perhaps we could consider having that mean "insert hCard here".

Interactive user agents (e.g. Operator on Firefox) could detect such "insert hCard here" semantics in forms on pages, and let you "pre-fill" with *your* hCard info. All of a sudden we would have a standard for forms auto-fill, rather than all the hacks that have gone into browsers since 1999 (starting with IE4.5/Mac, the first to do forms auto-fill of an entire form with a single button press - not just auto-complete of each form field individually).

Obviously this would make sense to build into *existing* forms auto-fill features in Firefox and IE, and any other browsers that support it.

This way new sites could simply conform to the standard, rather than depend on hacks which parse label values etc. and imply things and get them wrong sometimes.

i18n advantages: hCard annotated form inputs would also be more international. Browsers would no longer need to guess which fields are the "name" and "telephone" fields in every language, so they could do forms auto-fill on any site regardless of language, not just English.

Tantek 16:24, 23 Jul 2007 (PDT)

input examples

See hcard-input-examples for research on examples of contact info input forms.

By specifying a consistent way to mark up contact info (person or venue/organization) input forms, we could enable both:

blog posts on hCard forms fill

For more on this, see the following blog posts:

related implementations

background discussion

Key threads:


Somewhat related:

One key summary by Ciaran McNulty:

The options discussed in a hypothetical hCard input system from that post:

option new vcard input root class

1) create a new root class other than vcard to indicate a form that's fillable with hCard data.

Proposed markup:

<form class="vcard-input" ...>
   <fieldset class="fn">
      <input type="text" class="given-name" name="first_name" />
      <input type="text" class="family-name" name="last_name" />
   </fieldset>
   ...
</form>
  • -1 I think it is preferable to try to make hCard work with existing classes for this user scenario rather than adding another scenario-specific class name. Adding scenario-specific class names also does not scale to other microformats in general (requiring additional class names for each microformat). Tantek 19:17, 8 June 2009 (UTC)

option add input elements to hCard parsing

2) extend hCard's parsing rules to cover form elements and rely on the FORM/INPUT semantics to indicate that stuff is inputtable.

Proposed markup:

<form ...>
<div class="vcard">
   <fieldset class="fn">
      <input type="text" class="given-name" name="first_name" value="Rob" />
      <input type="text" class="family-name" name="last_name" value="Manson" />
   </fieldset>
   ...
</div>
<div class="vcard">
   <fieldset class="fn">
      <input type="text" class="given-name" name="first_name" value="Scott" />
      <input type="text" class="family-name" name="last_name" value="Reynen" />
   </fieldset>
   ...
</div>
</form>

See discussion points for more details and follow-up on benefits / drawbacks.
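
As a rough illustration of option 2, the sketch below reads the value attribute of text and hidden inputs inside each class="vcard" container. It is only a sketch under stated assumptions: the HCARD_PROPS subset, the VOID_TAGS list, and the names InputHCardParser and parse_input_hcards are hypothetical and not part of hcard-parsing, the markup is assumed to be well-formed (every non-void element is closed), and classes on non-input elements such as the fieldset "fn" are ignored.

```python
from html.parser import HTMLParser

# Illustrative subset of hCard property class names (not the full list).
HCARD_PROPS = {"given-name", "family-name", "nickname", "org", "email", "tel"}
# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"input", "br", "hr", "img", "link", "meta"}


class InputHCardParser(HTMLParser):
    """Collect one dict per class="vcard" container from text/hidden inputs."""

    def __init__(self):
        super().__init__()
        self.cards = []   # every hCard found, in document order
        self._open = []   # one entry per open element: its card dict, or None

    def _current_card(self):
        # The innermost open element that carries class="vcard", if any.
        for card in reversed(self._open):
            if card is not None:
                return card
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "input":
            card = self._current_card()
            input_type = (attrs.get("type") or "text").lower()
            if card is not None and input_type in ("text", "hidden"):
                for name in classes:
                    if name in HCARD_PROPS:
                        card[name] = attrs.get("value") or ""
        if tag in VOID_TAGS:
            return
        card = {} if "vcard" in classes else None
        if card is not None:
            self.cards.append(card)
        self._open.append(card)

    def handle_endtag(self, tag):
        # Self-closing inputs also trigger an end tag event; skip void elements.
        if tag not in VOID_TAGS and self._open:
            self._open.pop()


def parse_input_hcards(html):
    """Return a list of {property: value} dicts, one per vcard container."""
    parser = InputHCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards
```

Because the vcard root is looked up on the enclosing elements rather than on the form itself, a single form can yield several hCards, as in the two-person markup above. Note that this reads the markup as loaded, so it shares the page-load limitation raised in the discussion points.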

forms auto fill for all microformats

Broader question:

discussion points

Many raised by RobManson.

  • Extending parsing rules to extract value attributes from <input type="text|hidden"> fields
    • -1 (unattributed, perhaps rhetorical) : this requires adding some special-case handling to existing parsers for these elements
    • +1 (unattributed, perhaps rhetorical) : this could help to enable microformat based auto form filling
    • +1 The parsing rules for forms elements must be specified anyway, and thus it makes sense to see if they can be specified in such a way to at least enable forms autofill functionality. Tantek 19:17, 8 June 2009 (UTC)
  • Existing server side and client side scripts use non-hCard field names so class is the most seamless extension point
    • +1 (unattributed, perhaps rhetorical) : this is in line with the current parsing model
    • +1 Tantek 19:17, 8 June 2009 (UTC)
  • Some parsers (e.g. X2V) only parse the loaded HTML, not the dynamic DOM (Operator parses the page DOM).
    • -1 (unattributed, perhaps rhetorical) : the parser doesn't pick up any updated form data after the page has loaded, e.g. even though textarea appears to parse ok - it's only ever the initially loaded value that can be exported.
    • +1 hcard-parsing should provide additional guidance on page load parsing vs dynamic DOM handling as necessary to handle both types of implementations. Tantek 19:17, 8 June 2009 (UTC)
  • Forms may contain more than one hCard so using <form class="vcard"> should not be required.
    • +1 (unattributed, perhaps rhetorical) : this minimizes the changes to current parsing rules
    • +1 For example a <fieldset> could be used by an author instead, or even a div between the form and the inputs. Tantek 19:17, 8 June 2009 (UTC)
  • Empty values should be ignored when extracting hCards
    • +1 for vCards at least, perhaps into JSON as well. Tantek 19:17, 8 June 2009 (UTC)
  • hCards with all empty values should be ignored when listing/extracting hCards
    • +1 for vCards at least, perhaps into JSON as well. Tantek 19:17, 8 June 2009 (UTC)
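
The last two points (ignore empty values, and ignore hCards whose values are all empty) could be applied as a post-processing step. This is a minimal sketch, assuming each parsed hCard is a plain dict of property name to string value; the name drop_empty is hypothetical.

```python
def drop_empty(cards):
    """Remove empty property values, then drop hCards left with no properties."""
    cleaned = []
    for card in cards:
        # Whitespace-only values are treated as empty (an assumption).
        kept = {prop: value for prop, value in card.items() if value.strip()}
        if kept:
            cleaned.append(kept)
    return cleaned
```

This matters for forms in particular, since an unfilled form is a set of inputs with empty value attributes and would otherwise be extracted as an hCard with no data.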


Which form elements should be supported beyond input fields?

  • title select that lists mr/mrs/ms/dr/etc.
    • +1 honorific-prefix in particular, yes. Tantek 19:17, 8 June 2009 (UTC)
  • checkboxes to choose which addresses to use
    • +0 not sure how to make this work without a specific example to analyze. Tantek 19:17, 8 June 2009 (UTC)
  • Option : simplify the extension to only support input fields, and recommend that selects, radio buttons and checkboxes update related hidden input fields with simple javascript (e.g. onChange/Click="this.form.elements[this.className].value = this.value")
    • -1 (unattributed, perhaps rhetorical) Unworkable. Cannot require clientside javascript.
    • +1 (unattributed, perhaps rhetorical) this would simplify parsing and server side form processing as only single input fields for each value need to be used/validated
    • -1 (unattributed, perhaps rhetorical) hCard forms then require javascript if they use form elements other than basic <input type="text|hidden">
    • +0 (unattributed, perhaps rhetorical)  : either way any auto form filling will be more complex beyond simple <input type="text|hidden"> fields
      • -1 (unattributed, perhaps rhetorical) hypothetical comment assuming more complexity beyond.
    • -1 requiring javascript is a non-starter. microformats must work as POSH. Tantek 19:17, 8 June 2009 (UTC)

multiple type parsing

fax and modem hyperlink parsing

For the "tel" property in particular, when the element is:

Ambiguous name components

When automatically publishing hCards from pre-existing data, it's not necessarily possible to tell which words in a name map to which hCard properties. When the structure of a name is unknown, it is hard to ensure an automatically published hCard remains valid.

There's currently no easy answer to this.

One implementation suggestion is a 'best-guess' algorithm, something along the lines of:

  1. If the name is one word, attempt implied nickname optimization
  2. If the name is two words, attempt implied n optimization
  3. For three or more words
    1. Perform a lookup against known sub-name combinations (e.g. 'Sarah Jane', 'Vander Wal')
    2. Apply the grammar "given-name additional-name(s) family-name"
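
The steps above can be sketched roughly as follows. This is an illustration only: KNOWN_GIVEN and KNOWN_FAMILY are hypothetical lookup tables seeded with the two examples from the list, guess_name is a made-up name, and the two-word case is simplified to "given-name family-name" without any special handling for comma-first or initial-only names.

```python
# Hypothetical lookup tables of known multi-word sub-names (lowercase).
KNOWN_GIVEN = {"sarah jane"}
KNOWN_FAMILY = {"vander wal"}


def guess_name(fn):
    """Best-guess mapping of a formatted name onto hCard name properties."""
    words = fn.split()
    if not words:
        return {}
    if len(words) == 1:
        # Step 1: a single word is treated as a nickname.
        return {"nickname": words[0]}
    if len(words) == 2:
        # Step 2: two words are treated as given-name then family-name.
        return {"given-name": words[0], "family-name": words[1]}
    # Step 3.1: look for known two-word given or family names.
    given_len = 2 if " ".join(words[:2]).lower() in KNOWN_GIVEN else 1
    rest = words[given_len:]
    family_len = 1
    if len(rest) >= 2 and " ".join(rest[-2:]).lower() in KNOWN_FAMILY:
        family_len = 2
    # Step 3.2: given-name additional-name(s) family-name.
    return {
        "given-name": " ".join(words[:given_len]),
        "additional-name": rest[:-family_len],
        "family-name": " ".join(rest[-family_len:]),
    }
```

For example, guess_name("Thomas Vander Wal") keeps "Vander Wal" together as the family-name, whereas an unlisted three-word name falls back to treating the middle word as an additional-name.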

The principle behind this suggestion is that it's better to make a good guess and potentially miscategorize an ambiguous name component than to generate an invalid hCard.

ADR with no children

Parsers (Operator, Tails, Almost Universal Microformat Parser) currently expect adr to have one or more sub-properties. It is not clear from the hCard spec that that's mandatory (though the vCard RFC requires it); nor is it always possible for an address field in a templated (or CMS) web site to be defined with such granularity.

Consider Wikipedia, whose templates often have a "locale" or "place" field, used, for example, on these articles about railway stations:

Likewise, consider the Wikipedia template for organisations, in which a "headquarters" address (for a business, for example) may contain a full or partial postal address, or just a city/county or city/country pair:

implied single adr subproperty

I propose that, where adr has content, but no explicit sub-properties, there should be a default sub-property to which that content is allocated, in order that it is captured by user agents, and can later be manually tweaked (in, say, an address book programme) by users if so desired. This would satisfy the vCard requirement for child-of-adr, and adhere to the general principle to "be strict in what you send but generous in what you receive".

Of the available sub-property options:

I suggest that "extended-address" is the most sensible sub-property to use, for this purpose. Andy Mabbett 03:57, 26 Mar 2007 (PDT)

implied adr subproperties

It may be possible for parsers to parse out adr subproperties from a contiguous adr string. This would be an optimization for both adr and hCard.

This may also be too difficult/complex to be dependable or interoperable, but it is worth at least documenting our considerations and analysis either way.

Examples:

IBM's Employee Directory search returns hCards with the "adr" property which contain the "locality" and "country-name" data but unfortunately without being marked up as such, e.g.:

<td class="adr">Austin, USA</td>

We could first define a canonical ordering for parsing comma-separated (and perhaps in some cases space-separated) adr subproperties within an adr string, e.g.:

Given a dictionary of country names and abbreviations, it may be feasible to parse for a country name at the end of the adr string, and then apply country/locale specific parsing rules to the rest of the adr string.

E.g.

The above heuristic (not quite well specified enough to be an algorithm, yet) would allow parsing of the IBM Employee Directory result documented above.
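
A minimal sketch of that heuristic, covering only the two-part "locality, country-name" shape seen in the IBM example. The COUNTRIES set and the name guess_adr are hypothetical, and no country-specific rules for longer strings are attempted; anything the sketch does not recognise is left unparsed rather than guessed.

```python
# Hypothetical, deliberately tiny dictionary of country names and abbreviations.
COUNTRIES = {"usa", "united states", "uk", "united kingdom", "france"}


def guess_adr(text):
    """Guess adr subproperties from a contiguous 'locality, country' string."""
    parts = [part.strip() for part in text.split(",") if part.strip()]
    # Only handle exactly two comma-separated parts ending in a known country.
    if len(parts) != 2 or parts[1].lower() not in COUNTRIES:
        return None
    return {"locality": parts[0], "country-name": parts[1]}
```

Returning None for unrecognised strings keeps the guess conservative, which seems in keeping with the concern above that this may be too complex to be dependable or interoperable.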

Many existing geocoder APIs turn unstructured addresses into structured ones; we should examine these for patterns and best practices, e.g. Google's geocoder, or geopy, which calls multiple geocoders.

adr without children FAQ

I think for now the simplest and most interoperable approach (and what I think implementations already do) is to make this an FAQ, because the spec already doesn't say to do anything with an adr that lacks subproperties.

Q: What should a parser do with an "adr" property lacking any subproperties?

A: A parser SHOULD do nothing with such an "adr" property. A parser MAY provide the text content of such an "adr" property in the results of its parsing as a freeform value of the "adr" property. Note that the vCard standard does not allow for any such freeform value of its "adr" property (in vCard the "adr" property MUST be structured) and thus that MAY suggestion to parsers only applies in situations (such as APIs, JSON return values) where it is possible to return a freeform value for the "adr" property.

Tantek 09:20, 2 Aug 2007 (PDT)
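
The FAQ answer above could be read as the following parser behaviour. This is one possible interpretation, not a specified algorithm: adr_result and its parameters are hypothetical, and allow_freeform stands for "the output format can carry a freeform adr value" (e.g. an API or JSON result, but not vCard).

```python
def adr_result(subproperties, text, allow_freeform):
    """Return the parsed value for one adr element, or None to skip it."""
    if subproperties:
        # Normal case: the adr has explicit subproperties.
        return dict(subproperties)
    # No subproperties: collapse whitespace in the element's text content.
    text = " ".join(text.split())
    if allow_freeform and text:
        # MAY: expose the text as a freeform value where the output allows it.
        return text
    # SHOULD: otherwise do nothing with this adr.
    return None
```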


tel parsing

Some nice to haves (parser related only in that they may require additional parsing related code)

See also

The hCard specification is a work in progress. As additional aspects are discussed, understood, and written, they will be added. These thoughts, issues, and questions are kept in separate pages.

hcard-parsing-brainstorming was last modified: Saturday, July 24th, 2010
