title attribute and abbreviated class names
(Was:[uf-discuss]Currency Quickpoll: Preliminary results)
Scott Reynen
scott at randomchaos.com
Sat Oct 14 19:39:09 PDT 2006
On Oct 14, 2006, at 3:27 PM, Mike Schinkel wrote:
>>> Your examples seem to leave a lot of ambiguity about what things
>>> mean,
>
> I'm new to proposing microformats, so I clearly have a lot to
> learn, but
> that said I don't see where what I was proposing was ambiguous. Can
> you give
> me explicit examples where allowing default assumptions for the
> most common
> use cases will by necessity lead to ambiguity? It seems to me that
> either
> something will be specified or if not it will default? That seems non
> ambiguous to me. Am I wrong?
I'm not entirely sure we're talking about the same thing anymore,
after reading this exchange:
On Oct 14, 2006, at 3:55 PM, Mike Schinkel wrote:
>>> That said, why not make the "symbol" markup optional?
>
> That's IMO is an additional good idea.
I thought that was basically what you were advocating, but you called
it an /additional/ good idea, so I'm not sure what it's an addition
to. I thought what you suggested was to allow for explicit
differentiation between the currency identifier and the amount, but
in certain cases where such differentiation can be made by matching a
regular expression, allow for markup without explicit
differentiation, leaving the differentiation implicitly to the parser
to figure out. For example, this would be valid:
本が<span class="money"><abbr class="amount" title="1000">一千</abbr><abbr class="currency" title="JPY">円</abbr></span>
because it doesn't fit the pattern you suggested, but this would also
be valid:
The book is <span class="money">$5.99</span>.
because it does follow the pattern, where everything that's not
within a certain character group is considered a currency symbol
(i.e. "$"). If this isn't what you're suggesting, then I'm not clear
on what you're suggesting.
But if this is what you're suggesting, I think you're underestimating
the complexity involved in defining which characters might be part of
an amount and which characters might be part of a currency symbol. I
do a lot of parsing via regular expressions and a large part of my
interest in microformats comes from witnessing the failure rate in
such parsing. There's always another unexpected format popping up
and before you know it, the regular expression is a page long. For a
quick taste, see this list of regular expressions written to parse
currency values:
http://regexlib.com/Search.aspx?k=currency
And those are all defining legitimate input much more strictly than
would be appropriate for the web at large.
To specifically answer your question of what doesn't work with
[A-Za-z0-9]: there's the decimal point, which is part of the amount
rather than the currency symbol; there are commas, which are also part
of the amount rather than the currency symbol; and there are whitespace
characters (of which there are many), which shouldn't be considered part
of either the amount or the currency symbol. That's all I can think of
right now, but I have no doubt there's much more I haven't thought of,
and it's that much more I'm worried about. So if we come up with a
definition that includes all of that, now we're talking about
explaining to authors that they can only leave out the currency
markup if their class="money" tag contains only letters, numbers,
decimal points, commas, and whitespace. Otherwise they have
to explicitly identify the individual parts.
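To make the problem concrete, here is a rough Python sketch of that kind of implicit splitting. It assumes one reading of the proposal (characters matching [A-Za-z0-9] belong to the amount, anything else is the currency symbol). The function and pattern names are made up for illustration and are not part of any microformat draft.

```python
import re

# Assumed reading of the proposal: these characters are "amount",
# everything else inside class="money" is "currency symbol".
NAIVE_AMOUNT = re.compile(r"[A-Za-z0-9]")

# Widened class that also accepts decimal points and commas.
WIDENED_AMOUNT = re.compile(r"[A-Za-z0-9.,]")


def split_by_class(text, amount_chars, drop_whitespace=False):
    """Split text into (symbol, amount) one character at a time."""
    symbol, amount = [], []
    for ch in text:
        if drop_whitespace and ch.isspace():
            continue  # whitespace belongs to neither part
        if amount_chars.fullmatch(ch):
            amount.append(ch)
        else:
            symbol.append(ch)
    return "".join(symbol), "".join(amount)


# The naive class pushes "." and "," into the symbol.
print(split_by_class("$5.99", NAIVE_AMOUNT))       # ('$.', '599')
print(split_by_class("$1,000.00", NAIVE_AMOUNT))   # ('$,.', '100000')

# The widened class, with whitespace dropped, handles these two cases.
print(split_by_class("$ 1,000.00", WIDENED_AMOUNT, drop_whitespace=True))
# ('$', '1,000.00')
```

Even the widened version only covers the cases listed above. Each newly discovered format means another change to the pattern and another rule to explain to authors.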
I think this is already more confusing than just always identifying
the individual parts, I think it's still likely to cause problems,
and I think it's only helping a slight majority that is quickly
becoming a minority. English-language web pages only comprise about
55% of the web today, and that percentage is quickly shrinking. So I'm
publishing my currency in English, and you're trying to ease my
implementation burden, so I don't have to explicitly define my
currency symbol and parsers will just figure it out for me. What if
I want my whitespace to be marked up with HTML entities? E.g.:
The book costs <span class="money">$&nbsp;5.99</span>
That's not an unlikely scenario. I actually publish currency values
like that, when someone wants a space to separate the $ from the
amount, but they don't want the two getting split onto separate
lines. Are we going to include that in the regular expression too, or
do I need to explicitly identify my symbol? If it's not allowed, how
will that be explained clearly enough that I won't make this mistake
and wind up with my currency symbol wrongly interpreted as "$&nbsp;",
which doesn't map to any known currency, and will lose my space if
it's replaced by another currency symbol? This is the kind of
ambiguity that doesn't really help publishers. And if it is in the
regular expression, how are we going to explain to publishers that
it's okay? Looks like unnecessary complication to me.
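Here is a small Python sketch of what could happen to a value written with a non-breaking space entity between the $ and the amount. The rule in split_money is hypothetical (a stand-in for whatever pattern might be agreed on), and it assumes a parser could see either the raw entity or the decoded character.

```python
import html

ASCII_WHITESPACE = " \t\n\r\f"


def split_money(text, whitespace=ASCII_WHITESPACE):
    """Hypothetical rule: ASCII letters, digits, '.' and ',' are the
    amount; listed whitespace is dropped; anything else is the symbol."""
    symbol, amount = [], []
    for ch in text:
        if ch in whitespace:
            continue
        if (ch.isascii() and ch.isalnum()) or ch in ".,":
            amount.append(ch)
        else:
            symbol.append(ch)
    return "".join(symbol), "".join(amount)


raw = "$&nbsp;5.99"           # what the author typed
decoded = html.unescape(raw)  # "$\xa05.99", what a DOM-based parser sees

# Raw entity: the entity's letters leak into the amount.
print(split_money(raw))      # ('$&;', 'nbsp5.99')

# Decoded, but the rule only knows ASCII whitespace:
# the non-breaking space ends up inside the symbol.
print(split_money(decoded))  # ('$\xa0', '5.99')

# Decoded, with U+00A0 added to the whitespace list.
print(split_money(decoded, ASCII_WHITESPACE + "\xa0"))  # ('$', '5.99')
```

Only the last call gives the result the author meant, and it depends on the parser decoding entities first and on the rule listing every whitespace character an author might use.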
> But one final point on this; has this been discussed this with those
> who make the decisions for markup used at the largest sites:
> Google, eBay,
> Amazon, etc.? Just curious? (and I don't mean to push this, it's
> just that
> being pedantic is in my nature, unfortunately. :)
There are people from Yahoo! on this list, and Technorati's pretty
big too, so they'd be good people to say whether or not they really
care how long the class names are.
Peace,
Scott