[uf-discuss] Re: Microformats in Google Maps
Toby A Inkster
mail at tobyinkster.co.uk
Thu Aug 2 08:34:04 PDT 2007
Andy Mabbett wrote:
> which strikes me as unworkable, being overly complex and not suitable
> for internationalisation (not just in non-English speaking countries,
> but outside the USA)
I'm with Andy on this one.
In fact, Tantek's proposed algorithm doesn't even solve the problem of
parsing US addresses. Consider:
<div class="org fn">Ambassade de France aux Etats-Unis</div>
4101 Reservoir Road, N.W.<br />
Washington D.C. 20007<br />
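To make the failure concrete, here is a minimal Python sketch of one hypothetical naive heuristic. It is not Tantek's algorithm, which this message does not reproduce. It assumes the last address line always looks like "Locality, RR 12345" with a two-letter region code, and the embassy address above does not follow that pattern.

```python
import re

# Hypothetical naive heuristic (NOT Tantek's proposed algorithm): expects the
# final address line to read "Locality, RR 12345" with a two-letter region.
US_LAST_LINE = re.compile(
    r"^(?P<locality>[^,]+),\s*(?P<region>[A-Z]{2})\s+"
    r"(?P<postal_code>\d{5}(?:-\d{4})?)$"
)


def parse_us_last_line(line):
    """Return locality/region/postal_code, or None if the line doesn't fit."""
    match = US_LAST_LINE.match(line.strip())
    return match.groupdict() if match else None


print(parse_us_last_line("Springfield, IL 62701"))  # parsed into three fields
print(parse_us_last_line("Washington D.C. 20007"))  # None: no comma
```

Even with a comma added ("Washington, D.C. 20007"), the two-letter region pattern rejects "D.C.", so each such special case needs its own rule.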
I recently had to write some code to transfer almost 500,000 addresses
from a loosely formatted list to one which had separate fields for house
name, address, town, county, country and postcode.
Because these were almost entirely UK addresses, and I had a big database
of all UK postal towns and their corresponding postcodes, I was able to get
about 95% accuracy -- but that involved hundreds of lines of code. Covering a
useful number of countries would require tens of thousands of lines of code.
Requiring the use of heuristics to parse address data raises the barrier to
entry for implementing hCard astronomically.
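Below is a much-reduced sketch of the kind of lookup-based heuristic described above for UK addresses. It assumes comma-separated input, a simplified postcode regex, and a hypothetical three-entry POSTAL_TOWNS set standing in for a full database. It is not the code from the message. It only fills address, town and postcode (no house name, county or country), and it returns None rather than guessing when no known town is found.

```python
import re

# Simplified UK postcode pattern (outward code, optional space, inward code).
# It is an approximation and ignores special cases such as "GIR 0AA".
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})$", re.IGNORECASE)

# Hypothetical stand-in for a full database of UK postal towns.
POSTAL_TOWNS = {"LONDON", "MANCHESTER", "CARDIFF"}


def split_uk_address(raw):
    """Split a comma-separated UK address into rough fields, or return None."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if not parts:
        return None
    match = POSTCODE.search(parts[-1])
    if not match:
        return None
    postcode = f"{match.group(1)} {match.group(2)}".upper()
    # Text before the postcode in the last segment (e.g. "London") is kept.
    before = parts[-1][:match.start()].strip()
    parts = parts[:-1] + ([before] if before else [])
    # Scan backwards for the last segment that is a known postal town.
    for i in range(len(parts) - 1, -1, -1):
        if parts[i].upper() in POSTAL_TOWNS:
            return {"address": ", ".join(parts[:i]),
                    "town": parts[i],
                    "postcode": postcode}
    return None  # no known town: don't guess
```

Each simplification here (the regex, the town list, the comma split) is where real-world addresses start needing extra code, and every other country would need its own equivalents.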
Andy's suggestion of defaulting to "extended-address" is better, though
given the semantics of "extended-address", which appears to be intended for
flat numbers, I'd prefer to default to "street-address".
Where "adr" has content not enclosed in any explicit
sub-properties, parsers MAY attempt to heuristically determine
the address parts and, if appropriate, MAY ask the user
to manually separate the address. Failing that, parsers
MUST assume this content to be the "street-address".
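A minimal Python sketch of the proposed fallback follows. It assumes the adr element is well-formed XHTML that `xml.etree.ElementTree` can parse. The example earlier in the message shows no adr wrapper, so usage here adds a hypothetical `<div class="adr">`. The optional `heuristic` callable is hypothetical and stands in for the MAY step. Asking the user is left out. Keeping an explicit street-address when loose text is also present is this sketch's own assumption, since the proposed wording doesn't cover that case.

```python
import xml.etree.ElementTree as ET

# adr sub-property class names from hCard (mirroring the vCard ADR type).
ADR_PROPERTIES = {"post-office-box", "extended-address", "street-address",
                  "locality", "region", "postal-code", "country-name"}


def _squash(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def parse_adr(xhtml, heuristic=None):
    """Parse one well-formed XHTML element whose class includes "adr".

    heuristic: optional, hypothetical callable taking the loose text and
    returning a dict of sub-properties, or None if it cannot decide.
    """
    root = ET.fromstring(xhtml)
    result = {}
    loose = []  # text not enclosed in any explicit sub-property

    def walk(el):
        if el.text:
            loose.append(el.text)
        for child in el:
            props = set(child.get("class", "").split()) & ADR_PROPERTIES
            if props:
                value = _squash("".join(child.itertext()))
                for prop in props:
                    result.setdefault(prop, value)  # first occurrence wins
            else:
                walk(child)  # descend into unclassed elements such as <br />
            if child.tail:
                loose.append(child.tail)

    walk(root)
    leftover = _squash(" ".join(loose))
    # Ignore separators such as a lone comma between explicit properties.
    if any(ch.isalnum() for ch in leftover):
        guessed = heuristic(leftover) if heuristic else None
        if guessed:
            for prop, value in guessed.items():
                result.setdefault(prop, value)
        elif "street-address" not in result:
            # The proposed MUST: fall back to "street-address".
            result["street-address"] = leftover
    return result
```

For instance, wrapping the embassy address as `<div class="adr">4101 Reservoir Road, N.W.<br />Washington D.C. 20007</div>` yields a single street-address of "4101 Reservoir Road, N.W. Washington D.C. 20007". That loses structure but never loses data.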
Toby A Inkster BSc (Hons) ARCS