[microformats-discuss] hCard and spam
rbach at rbach.priv.at
Tue Jul 19 18:48:21 PDT 2005
brian suda wrote:
> After you posted the first message, I checked X2V to see what would
> happen. Even with your 'encoding', which is just the decimal value of the
> character, the XSLT transformation function I am using (the one built
> into PHP) converts all of your decimal escape sequences to the actual
> letters. So X2V will actually extract your obfuscated email address just
> fine. I would double-check to make sure, but it works fine on my end.
X2V uses XSLT, so it interprets XHTML files as XML documents. But most
(or at least some) email spiders and email address collection programs
are not fully fledged (X)HTML interpreters; they simply treat (X)HTML
files as plain text files.
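A minimal Python sketch of that difference. It assumes a hypothetical XHTML fragment in which "mailto:" and "@" are written as decimal character references, and a simple X@Y regular expression standing in for a text-scanning spider (neither is taken from X2V or from any real spider). An XML parser, which is what an XSLT processor works on, resolves the references. A plain-text scan of the raw markup never sees an "@".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment: "mailto:" and "@" are written as
# decimal character references (&#109; is "m", &#64; is "@", and so on).
fragment = (
    '<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;mail&#64;example.org">'
    "mail&#64;example.org</a>"
)

# Simple X@Y pattern of the kind a text-scanning spider might use (an assumption).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# An XML parser resolves the character references while parsing.
element = ET.fromstring(fragment)
parsed_href = element.get("href")
parsed_text = element.text

# Scanning the raw markup as plain text finds neither "mailto:" nor "@".
raw_hits = EMAIL_RE.findall(fragment)
parsed_hits = EMAIL_RE.findall(parsed_text)

print(parsed_href)   # mailto:mail@example.org
print(raw_hits)      # []
print(parsed_hits)   # ['mail@example.org']
```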
Three years ago I received lots of spam, so I did an experiment.
I downloaded some email spiders and tested them on a simple HTML page.
I can't remember the details, but I guess I should repeat my experiment
to see how email spiders have developed.
The HTML looked like this:
1. <a href="mailto:mail1@example.org">mail1@example.org</a>.
2. <a href="mailto:mail2[at]example.org">mail2[at]example.org</a>.
3. mail3@example.org
4. <!-- mail4@example.org -->
Every spider found mail1@example.org.
Some spiders found mail2[at]example.org.
Some other spiders found mail3@example.org and mail4@example.org.
No spider found mail5[at]example.org.
So my conclusion was:
- Some programs simply search for "mailto:".
- Some programs search for X@Y (X and Y being some regular expressions).
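The two heuristics can be sketched in Python and run against the test page, assuming the addresses in items 1, 3 and 4 contain a literal "@" (the mailing-list archive renders "@" as " at "). Both regular expressions are hypothetical stand-ins for whatever the real spiders used. Because the page is scanned as plain text, the X@Y pattern also matches inside the HTML comment.

```python
import re

# The test page from the experiment, assuming a literal "@" in items 1, 3 and 4.
PAGE = """\
1. <a href="mailto:mail1@example.org">mail1@example.org</a>.
2. <a href="mailto:mail2[at]example.org">mail2[at]example.org</a>.
3. mail3@example.org
4. <!-- mail4@example.org -->
"""

# Heuristic 1 (hypothetical): take whatever follows "mailto:" up to a quote,
# a closing bracket, or whitespace.
MAILTO_RE = re.compile(r"mailto:([^\"'>\s]+)")

# Heuristic 2 (hypothetical): a simple X@Y pattern anywhere in the raw text.
AT_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mailto_spider(text):
    """Return the distinct strings that follow "mailto:"."""
    return sorted(set(MAILTO_RE.findall(text)))


def at_spider(text):
    """Return the distinct X@Y matches, comments included."""
    return sorted(set(AT_RE.findall(text)))


print(mailto_spider(PAGE))  # ['mail1@example.org', 'mail2[at]example.org']
print(at_spider(PAGE))      # ['mail1@example.org', 'mail3@example.org', 'mail4@example.org']
```

Under these assumptions, the "mailto:" spider finds mail1 and mail2[at], and the X@Y spider finds mail1, mail3 and mail4, which matches the pattern of results described above.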
So I used a technique that was already documented on the Internet.
I replaced "mailto:" and "@" with their decimal character references
(for example, "@" became "&#64;"), and the spiders I used were not able
to find my email addresses.
This allowed me to protect email addresses from spiders *and* to be
still compatible with every browser.
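A minimal Python sketch of the technique, assuming it is applied to a single link. The helper names `to_decimal_refs` and `obfuscated_mailto` and the address `user@example.org` are hypothetical. The post only says which substrings were replaced, not how the page was generated.

```python
import html


def to_decimal_refs(s):
    """Replace every character of s with its decimal character reference."""
    return "".join(f"&#{ord(ch)};" for ch in s)


def obfuscated_mailto(local, domain):
    """Build a mailto link with "mailto:" and "@" written as decimal references.

    Hypothetical helper for illustration only.
    """
    at = to_decimal_refs("@")  # "&#64;"
    href = to_decimal_refs("mailto:") + local + at + domain
    return f'<a href="{href}">{local}{at}{domain}</a>'


link = obfuscated_mailto("user", "example.org")
print(link)
# What a browser (or anything else that decodes the references) sees:
print(html.unescape(link))  # <a href="mailto:user@example.org">user@example.org</a>
```

A browser decodes the references before following the link, so clicking it still opens `mailto:user@example.org`. That is the compatibility property described above. The flip side is that any program running the same decoding step (here `html.unescape`) recovers the address just as easily.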
As I said, I'm going to repeat my experiment, and I wouldn't be surprised
if there were spiders that could also decode HTML character references.
Robert Bachmann <rbach at rbach.priv.at> (OpenPGP KeyID: 0x4A5CCF10)