[uf-new] img alt content statistics

Benjamin Hawkes-Lewis bhawkeslewis at googlemail.com
Sat Jul 14 15:52:57 PDT 2007


I'm increasingly sceptical about non-qualitative statistical exercises 
of this sort. They need to be interpreted with great caution. For 
example, alt="" may be compliant with the (X)HTML specifications, or it 
may not be. You just can't tell without looking at the page in question.

I'm not sure why mass use or abuse of @alt, treating all webpages as 
equals, should be decisive for hCard parsing. Doesn't there need to be a 
subsample containing only pages with markup that a microformat parser 
would interpret as an hCard?
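
To make the subsample idea concrete, here is a rough sketch in Python. It is not the crawler or analysis code discussed in this thread. The names PageScan and alt_stats_for_hcard_pages are hypothetical, and "contains an hCard" is simplified to "some element carries the class token vcard", which is looser than what a real microformat parser would accept (it does not check for an fn property, for instance). It tallies missing, empty and non-empty alt separately, but only for pages in the hCard subsample.

```python
from collections import Counter
from html.parser import HTMLParser


class PageScan(HTMLParser):
    """Record whether a page has an hCard root and how its img tags use alt."""

    def __init__(self):
        super().__init__()
        self.has_hcard = False
        self.alt_counts = Counter()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Assumption: any element with the class token "vcard" counts as an hCard root.
        if "vcard" in (attr_map.get("class") or "").split():
            self.has_hcard = True
        if tag == "img":
            if "alt" not in attr_map:
                self.alt_counts["missing"] += 1
            elif not attr_map["alt"]:
                # Covers alt="" and a bare alt attribute with no value.
                self.alt_counts["empty"] += 1
            else:
                self.alt_counts["non-empty"] += 1


def alt_stats_for_hcard_pages(pages):
    """Return (number of hCard pages, img alt tallies over those pages only)."""
    totals = Counter()
    hcard_pages = 0
    for html in pages:
        scan = PageScan()
        scan.feed(html)
        scan.close()
        if scan.has_hcard:
            hcard_pages += 1
            totals.update(scan.alt_counts)
    return hcard_pages, totals
```

Even on the subsample, the tallies only say how often alt is empty, not whether an empty value is appropriate for a given image. That still needs someone to look at the page.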

--
Benjamin Hawkes-Lewis

Manu Sporny wrote:
> Andy Mabbett wrote:
>> In message <4698F9F1.1060409 at digitalbazaar.com>, Manu Sporny
>> <msporny at digitalbazaar.com> writes
>>
>>> The percentages below are the percentages of img tags that contained
>>> non-empty attributes:
>>>
>>> src:    99%
>>> height: 66%
>>> width:  66%
>>> alt:    41%
>>> title:   5%
>>> id:      4%
>>>
>>> In general, only 41% of 'img' tags list non-empty 'alt' attributes. In
>>> other words - most websites are not using 'alt' attributes for 'img'
>>> tags.
>> That's a bogus conclusion - empty "alt" attributes are perfectly valid,
>> and are appropriate in many cases; and you're counting tags but making
>> conclusions about "most websites".
> 
> I agree with you, Andy... it seems my statement wasn't clear. Perhaps it
> should have read:
> 
> "In other words - most websites are using empty 'alt' attributes."
> 
> or
> 
> "59% of most websites are complying with the HTML 4.01 specification
> regarding usage of 'alt' with image tags."
> 
> I used the terminology "most websites" because the data gathered is,
> statistically speaking, overkill. Assuming 125,626,329 websites (per
> Netcraft) we would need a sample set of 384 websites to get a 95%
> confidence level with an interval of 5%.
> 
> So, we needed 384 samples - we got 224,671 across 14,077 websites.
> 
> If you want to sift through the data yourself, I'll have it up tomorrow.
> I'll also be providing all of the source code to crawl, index and
> analyze the data.
> 
> -- manu
> 
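
For anyone wanting to check the 384 figure quoted above: it is consistent with the usual sample-size formula for estimating a proportion, with a finite population correction. The sketch below is an illustration, not the calculation Manu actually ran. The function names are hypothetical, and it assumes a simple random sample of websites and the worst-case proportion p = 0.5.

```python
import math
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Size needed for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Finite population correction; negligible when the population is huge.
    return n0 / (1 + (n0 - 1) / population)


def margin_of_error(sample_size, confidence=0.95, p=0.5):
    """Half-width of the confidence interval for a given simple random sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / sample_size)


if __name__ == "__main__":
    print(required_sample_size(125_626_329))  # Netcraft site count from the thread
    print(margin_of_error(14_077))            # number of websites crawled
```

With the Netcraft count this gives just over 384, matching the quoted number. Note that the unit being sampled is the website. The 224,671 img tags are clustered within 14,077 sites, so they should not be treated as that many independent observations, and none of this says anything about whether the crawl was a random sample in the first place.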


