[uf-new] Use of img in rel-* (with analyzed data)
msporny at digitalbazaar.com
Sun Jul 15 11:09:26 PDT 2007
I'm starting a new thread as the "*img alt content*" discussion seems to
be getting unfocused. Please familiarize yourself with the following
thread, as this discussion is a more focused continuation of it:
All of the tools and data that were used for this analysis, including
source code released under the GPL, are available from the following URL:
Sites quite often use an image instead of a text link to present
actions. For example, instead of using the text "Download", they will
use an image of a downward-facing arrow pointing at a disk.
In other words, if we have this:
How do we present this option to a human being in a non-web-page UI?
This problem applies to any 'rel-*' pattern. It is currently affecting
the implementation of hAudio: Operator does not extract the ALT or TITLE
attributes of IMG tags, so when an image-only rel-* link is presented to
the user, its label is blank.
The Argument Thus Far
Andy Mabbett proposed that Operator should use the ALT attribute of the
IMG tag, as doing so is HTML/XHTML compliant. Tantek Çelik raised the
point that web authors often misuse the ALT attribute. Scott Reynen
noted that we would need examples to make a more informed decision, as
no data had been collected yet.
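As a rough illustration of Andy's proposal, the Python sketch below
derives a label for each rel-* link from the link's own text, falling
back to the ALT attribute of an enclosed IMG and then to its TITLE
attribute. This is not how Operator works; the names RelLinkLabeler and
rel_link_labels(), and the TITLE fallback, are assumptions made for this
example.

```python
from html.parser import HTMLParser


class RelLinkLabeler(HTMLParser):
    """Collect a (rel, label) pair for every <a> that has a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []       # (rel, label) tuples in document order
        self._current = None  # state for the rel-* <a> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel"):
            self._current = {"rel": attrs["rel"], "text": "", "img": None}
        elif tag == "img" and self._current is not None:
            if self._current["img"] is None:
                self._current["img"] = attrs  # remember the first IMG only

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            img = self._current["img"] or {}
            # Hypothetical fallback order: link text, then IMG alt, then IMG title.
            # An empty alt="" falls through to title here (a design choice).
            label = (self._current["text"].strip()
                     or (img.get("alt") or "").strip()
                     or (img.get("title") or "").strip())
            self.links.append((self._current["rel"], label))
            self._current = None


def rel_link_labels(html):
    """Return (rel, label) for each rel-* link in an HTML fragment."""
    parser = RelLinkLabeler()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, rel_link_labels('<a rel="enclosure" href="song.mp3"><img
src="dl.png" alt="Download" /></a>') returns [('enclosure', 'Download')],
while an image with neither ALT nor TITLE still yields an empty label,
which is exactly the case Tantek's concern and the data below bear on.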
The Data Collected So Far
The first set of data collected attempted to determine the number of IMG
tags that used 'alt', 'title' and 'id':
Total websites crawled : 14077
Total img tags analyzed: 224671
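The crawler that produced these counts is part of the GPL release
mentioned above; what follows is only an independent sketch, using
Python's standard html.parser, of how such counts could be gathered. The
names ImgAttributeCounter and count_img_attributes() are hypothetical,
and the sketch treats an attribute as "used" whenever it is present, even
if empty (e.g. alt=""), which may differ from the original tool.

```python
from html.parser import HTMLParser


class ImgAttributeCounter(HTMLParser):
    """Count IMG tags and how many of them carry alt, title, and id."""

    TRACKED = ("alt", "title", "id")

    def __init__(self):
        super().__init__()
        self.total = 0
        self.counts = {name: 0 for name in self.TRACKED}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        # html.parser lowercases attribute names; presence is what counts here
        present = {name for name, _ in attrs}
        for name in self.TRACKED:
            if name in present:
                self.counts[name] += 1


def count_img_attributes(pages):
    """Sum IMG attribute usage over an iterable of HTML page strings."""
    total = 0
    counts = {name: 0 for name in ImgAttributeCounter.TRACKED}
    for html in pages:
        counter = ImgAttributeCounter()  # fresh parser so pages don't share state
        counter.feed(html)
        counter.close()
        total += counter.total
        for name, n in counter.counts.items():
            counts[name] += n
    return total, counts
```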
The second set of data collected came from Derrick Pallas. We are still
waiting for him to perform the analysis and post it to the mailing list.
The third set of data collected looks at image-only anchors. In other
words, it collects only links that look like the following:
<a href="http://www.example.com"><img src="example.png" /></a>
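A filter for image-only anchors could be sketched as follows, again with
html.parser and hypothetical names (ImageOnlyAnchorFinder,
find_image_only_anchors()). It assumes that whitespace around the IMG is
ignored and that any other text or tag inside the anchor disqualifies it;
the original tool may draw the line differently.

```python
from html.parser import HTMLParser


class ImageOnlyAnchorFinder(HTMLParser):
    """Find <a> elements whose only content is a single <img>."""

    def __init__(self):
        super().__init__()
        self.anchors = []    # (href, img_attrs) for each image-only anchor
        self._in_a = False
        self._href = None
        self._imgs = []
        self._other = False  # True once the anchor holds text or another tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._imgs = []
            self._other = False
        elif self._in_a:
            if tag == "img":
                self._imgs.append(dict(attrs))
            else:
                self._other = True

    def handle_data(self, data):
        # Whitespace between tags is allowed; real text is not
        if self._in_a and data.strip():
            self._other = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            if len(self._imgs) == 1 and not self._other:
                self.anchors.append((self._href, self._imgs[0]))
            self._in_a = False


def find_image_only_anchors(html):
    finder = ImageOnlyAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.anchors
```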
A human being reviewed the data to verify that the ALT text matched the
image. The following criteria were used to categorize images:
Valid @alt - If the ALT text displayed to the user matched the image
displayed, the image was marked as VALID. The image was also marked as
VALID if the ALT text was blank.
Unknown @alt - If the ALT text was in another language or contained
UTF-8 characters that could not be displayed, the image was marked as
UNKNOWN.
Garbage @alt - If the ALT text was clearly not applicable to the image
(such as "click here", "red ball", or "blog" when the image was a
shopping cart), the image was marked as GARBAGE.
This analysis required human interaction, so the sample size is small
(but still statistically significant). A small GUI displayed each image
to a person and asked them whether the image matched its ALT text. This
is the first time this data has been presented:
Total websites crawled : 1721
Total img-only anchors analyzed: 1166
Valid @alt : 77.3%
Unknown @alt: 5.8%
Garbage @alt: 16.9%
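Once a person has judged each image, reducing the judgments to the
percentages above is simple bookkeeping. The sketch below illustrates
it; the function name summarize() and the label strings are illustrative
assumptions, not taken from the released tools. As a sanity check, the
three published figures sum to 100.0%.

```python
from collections import Counter

CATEGORIES = ("VALID", "UNKNOWN", "GARBAGE")


def summarize(labels):
    """Turn per-image human judgments into percentages (one decimal place)."""
    counts = Counter(labels)
    unexpected = set(counts) - set(CATEGORIES)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to summarize")
    # Counter returns 0 for categories that never occurred
    return {c: round(100 * counts[c] / total, 1) for c in CATEGORIES}
```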
As mentioned previously, all of the tools and data that were used for
this analysis, including source code, are available from the following
URL: