[uf-new] img alt content statistics

Derrick Lyndon Pallas derrick at pallas.us
Sat Jul 14 13:42:18 PDT 2007


Manu Sporny wrote:
> "59% of most websites are complying with the HTML 4.01 specification
> regarding usage of 'alt' with image tags."
>
> I used the terminology "most websites" because the data gathered is,
> statistically speaking, overkill. Assuming 125,626,329 websites (per
> Netcraft) we would need a sample set of 384 websites to get a 95%
> confidence level with an interval of 5%.
>
> So, we needed 384 samples - we got 224,671 across 14,077 websites.

That assumes any given page from a website is representative of that 
website. What you really want to sample are individual uses of <img/> 
on the web, so the number of samples you need depends on usages/page * 
pages/unique site * unique sites/internet.
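
For reference, a figure like the 384 quoted above is what the usual 
sample-size formula for a proportion gives, with a finite population 
correction. The sketch below is my assumption about how that number was 
reached, not something stated in the thread; the function name and its 
defaults (z = 1.96 for 95% confidence, p = 0.5 as the worst case) are 
illustrative.

```python
def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion (hypothetical helper).

    Assumes simple random sampling of independent units.
    """
    # Size needed for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction; negligible when the population is huge.
    return n0 / (1 + (n0 - 1) / population)


# 125,626,329 is the Netcraft website count quoted in the message.
n = sample_size(125_626_329)
print(round(n))
```

The formula only holds if the sampled units are independent draws, which 
is why it matters whether the unit is a website, a page, or a single 
<img/>.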

For what it's worth, I actually did start an analysis but haven't had 
time to do much with the data. I took a random chunk of our archive and 
extracted every <a/>, storing the content of each anchor so I could look 
for lonely <img/>s with @alt text.
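
A rough sketch of that kind of pass, using Python's standard 
html.parser, might look like the following. This is not the code that 
produced the numbers in this message; the class and function names are 
made up for illustration, and treating a whitespace-only @alt as empty 
is my assumption.

```python
from html.parser import HTMLParser


class AnchorImageCounter(HTMLParser):
    """Tally <img> elements that appear inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.anchor_has_img = False
        self.counts = {
            "anchors": 0,
            "anchors_with_img": 0,
            "imgs": 0,
            "imgs_with_alt": 0,
            "imgs_with_nonempty_alt": 0,
        }

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A new <a> implicitly ends any unclosed one (anchors cannot nest).
            self.in_anchor = True
            self.anchor_has_img = False
            self.counts["anchors"] += 1
        elif tag == "img" and self.in_anchor:
            self.counts["imgs"] += 1
            if not self.anchor_has_img:
                self.anchor_has_img = True
                self.counts["anchors_with_img"] += 1
            attributes = dict(attrs)
            if "alt" in attributes:
                self.counts["imgs_with_alt"] += 1
                # A bare `alt` with no value is reported as None.
                if (attributes["alt"] or "").strip():
                    self.counts["imgs_with_nonempty_alt"] += 1

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False


def count_anchor_images(html):
    """Parse one page and return the tallies as a dict."""
    parser = AnchorImageCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Summing the returned dictionaries over every page in a sample would give 
totals of the same shape as the lists in this message.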

The proof run found 1.4M <a/> on 14k pages. Of these anchors,

  * 240k contain at least one <img/>
  * 228k start with an <img/>
  * 152k contain at least one <img/> with an @alt
  * 121k contain at least one <img/> with a non-empty @alt
  * 25k contain at least one <img/> with a @title
  * 24k contain at least one <img/> with a non-empty @title

A total of 247k <img/> were found in anchors. Of these images,

  * 151k contain an @alt
  * 120k contain a non-empty @alt
  * 25k contain a @title
  * 23k contain a non-empty @title
  * 11k have a garbage phrase (e.g. "click here", "use the right mouse 
    button to save", etc.) in @alt or @title
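
A garbage-phrase check along these lines could be sketched as below. 
Only the two phrases named above come from this message; the real list 
was presumably longer, and the case-insensitive substring match is my 
assumption about how such a check might work, not a description of the 
actual analysis.

```python
# Only these two phrases are named in the message; a real list would be longer.
GARBAGE_PHRASES = ("click here", "use the right mouse button to save")


def _normalize(value):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def has_garbage_phrase(alt=None, title=None, phrases=GARBAGE_PHRASES):
    """True if @alt or @title contains a known garbage phrase.

    Each attribute is checked on its own so that a phrase cannot be
    assembled from the end of one value and the start of the other.
    """
    return any(
        phrase in _normalize(value)
        for value in (alt, title)
        if value
        for phrase in phrases
    )
```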

Of the 228k starting <img/>s,

  * 142k contain an @alt
  * 114k contain a non-empty @alt
  * 24k contain a @title
  * 22k contain a non-empty @title
  * 11k have a garbage phrase in @alt or @title
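
To put the two image lists above on a common scale, the counts can be 
turned into percentages of their respective totals. The inputs are the 
rounded thousands from this message, so the results are approximate; 
the helper name and the dictionary labels are mine.

```python
def shares(total, counts):
    """Express each count as a percentage of total, to one decimal place."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# All figures are in thousands, as rounded in the message.
in_anchors = shares(247, {
    "alt": 151,
    "non-empty alt": 120,
    "title": 25,
    "non-empty title": 23,
    "garbage phrase": 11,
})
starting = shares(228, {
    "alt": 142,
    "non-empty alt": 114,
    "title": 24,
    "non-empty title": 22,
    "garbage phrase": 11,
})

for label in in_anchors:
    print(f"{label:16} {in_anchors[label]:5.1f}%  {starting[label]:5.1f}%")
```

On these rounded figures, roughly half of the images inside anchors 
carry a non-empty @alt (about 48.6% overall, about 50.0% of the starting 
images).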

The non-proof run is looking at 50x as many pages. All of this was 
gleaned from the services at <http://tinyurl.com/23czqt>.

~ Derrick Pallas


