[uf-discuss] Re: Microformats in Google Maps
Tantek Çelik
tantek at cs.stanford.edu
Thu Aug 2 09:24:09 PDT 2007
On 8/2/07 8:34 AM, "Toby A Inkster" <mail at tobyinkster.co.uk> wrote:
> Andy Mabbett wrote:
>
>> <http://microformats.org/wiki/hcard-brainstorming#implied_adr_subproperties>
>> which strikes me as unworkable, being overly complex and not suitable
>> for internationalisation (not just in non-English speaking countries,
>> but outside the USA)
>
> I'm with Andy on this one.
To be clear, I wanted to document it as a brainstorm to be critiqued, and I
have severe doubts about it myself. From the second paragraph in that
section, which I wrote:
"This may also be too difficult/complex to be dependable or interoperable,
but it is worth at least documenting our considerations and analysis either
way."
In general, documenting such "strawman" thoughts, along with criticisms of
them, is just good science. Not every brainstorm should be taken as a
proposal that is intended to be adopted.
> <div xml:lang="fr">
Please add examples that show problems with it to the section with the
brainstorm rather than to the mailing list. There is no need to be
comprehensive about showing problems with it; one or two examples will do
for now, given the doubts expressed from the outset.
> I recently had to write some code to transfer almost 500,000 addresses
> from a loosely formatted list to one which had separate fields for house
> name, address, town, county, country and postcode.
>
> Because these were almost entirely UK addresses, and I had a big database
> of all UK postal towns and corresponding postcodes, I was able to get about
> 95% accuracy -- but that involved hundreds of lines of code. To cover a
> useful number of countries would require tens of thousands of lines of
> code.
This is a useful datapoint. Note that it doesn't prove difficulty (in that
someone else may be able to write simpler/more efficient code, or not), but
any such implementation experience is useful to capture.
> Requiring the use of heuristics to parse address data raises the barrier to
> entry for implementing hCard astronomically.
Perhaps not "astronomically", but I agree with your sentiment. ;)
> Andy's suggestion of defaulting to "extended-address" is better, though
> given the semantics of "extended-address", which appears to be for flat
> numbers, I'd prefer to default to "street-address".
I'd prefer neither. I think there would be too much semantic dilution (or
artificial semantic precision) by doing so (putting things that don't have a
certain semantic into a field that implies that semantic).
> How about:
>
> Where "adr" has content not enclosed in any explicit sub-
> properties, parsers MAY attempt to heuristically determine
> the address parts and, if appropriate, MAY ask the user
> to manually separate the address. Failing that, parsers
> MUST assume this content to be the "street-address".
I'm not even sure about permitting the heuristic part.
I think for now the simplest and most interoperable approach (and what I
think implementations already do) is to make this an FAQ (because the spec
does not say to do anything with adr without any subproperty):
http://microformats.org/wiki/hcard-brainstorming#adr_without_children_FAQ
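As a rough illustration of that behaviour (collect only the explicitly
marked-up subproperties and assign no meaning to any other text inside
"adr"), here is a minimal Python sketch. The names AdrParser and parse_adr
are hypothetical, it uses only the standard library's html.parser, and it
skips most real hCard parsing rules, so it is a sketch under those
assumptions and not a description of any existing implementation.

```python
from html.parser import HTMLParser

# Subproperty class names of "adr" as listed in the hCard spec.
ADR_SUBPROPERTIES = {
    "post-office-box", "extended-address", "street-address",
    "locality", "region", "postal-code", "country-name",
}
# Elements with no end tag; skipped so the element stack stays balanced.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class AdrParser(HTMLParser):
    """Hypothetical helper: keeps explicit subproperties, sets the rest aside."""

    def __init__(self):
        super().__init__()
        self.stack = []       # per open element: subproperty name or None
        self.adr_depth = 0    # > 0 while inside an element with class "adr"
        self.properties = {}  # subproperty name -> list of text values
        self.untagged = []    # text inside adr but outside any subproperty

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if self.adr_depth:
            self.adr_depth += 1
            marker = next((c for c in classes if c in ADR_SUBPROPERTIES), None)
        elif "adr" in classes:
            self.adr_depth = 1
        self.stack.append(marker)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        self.stack.pop()
        if self.adr_depth:
            self.adr_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not self.adr_depth or not text:
            return
        # Innermost enclosing subproperty, if any.
        enclosing = next((m for m in reversed(self.stack) if m), None)
        if enclosing:
            self.properties.setdefault(enclosing, []).append(text)
        else:
            # No guessing: untagged text is not mapped to any subproperty.
            self.untagged.append(text)


def parse_adr(html):
    """Return (explicit subproperties, untagged text) for the given markup."""
    parser = AdrParser()
    parser.feed(html)
    parser.close()
    return parser.properties, parser.untagged
```

For an adr with no children, such as <p class="adr">12 rue de la Paix,
75002 Paris</p>, the properties dict comes back empty and the text lands in
the untagged list, i.e. nothing is assumed to be "street-address" or
"extended-address".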
Thanks,
Tantek