Andrew Gregory's Web Pages

Anti-Spam Email Links


Introduction

Spammers are known to scan (harvest) web pages for email addresses to add to their lists. Putting your unprotected email address on your web pages is inviting spammers to fill your mailbox!

The problem is, visitors like to have a ready-to-click link to send an email.

There are several anti-spam options when it comes to allowing people visiting your web site to contact you:

  1. Just put up an unprotected email address and deal with the spam using filters, blacklists or the like.
  2. Munge (alter) your email address to make it harder for spam harvesters to get your correct address. This runs the risk of people sending their messages to the munged address, so you'll never receive them. (Yes, people can be that unobservant!)
  3. Do some simple HTML character entity encoding of your email address. For example, encoding the @-sign as &#064;. Perhaps the mailto: as well:

    <a href="&#109;&#097;&#105;&#108;&#116;&#111;&#058;andrew&#064;scss.com.au">Email me</a>

    This comes out like: Email me. There isn't much point encoding the letters and dots in the address itself, as they look much like any other text you might find on a page, or indeed much like a web link you expect to find in an href attribute. This works quite well, as spammers almost always are looking for an @-sign and don't decode character entities.
  4. Display your email address as a graphic (see my Message Image page for a tool that could be used). However, if someone is browsing with images turned off, or using a text-only browser, they would never see it. Also, for those who can see the image, they are forced to manually copy and enter the address when sending their message. This runs the risk of making mistakes while copying the address.
  5. Use a form and script to send you messages. The problem here is that you rely on the sender to fill in their correct email address so you can reply. It's amazing how often people forget or make a mistake. How can you get back in touch with them then? Nor does the sender have any record of sending the message (although this can be overcome by having the script CC a copy of the message to the sender). Example: Formmail.
  6. Use a server-side script to redirect the web browser to a mailto address. The script may be driven by a regular link, or could be triggered by a standard web page form.
  7. Use some Javascript to obfuscate (v.t. to confuse or obscure) your address from harvesting.
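
The entity encoding in option 3 can be generated rather than typed by hand. The following is only a sketch: the function names entityEncode and encodeMailto are made up for this example, and it assumes the address contains exactly one @-sign. It emits unpadded entities (&#97; rather than &#097;), which browsers treat the same way.

```javascript
// Hypothetical helper: turn every character of a string into a
// decimal HTML character entity, e.g. "@" becomes "&#64;".
function entityEncode(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    out += '&#' + text.charCodeAt(i) + ';';
  }
  return out;
}

// Hypothetical helper: encode only the "mailto:" prefix and the @-sign,
// leaving the rest of the address as ordinary text.
// Assumes exactly one "@" in the address.
function encodeMailto(address) {
  var parts = address.split('@');
  return entityEncode('mailto:') + parts[0] + entityEncode('@') + parts[1];
}
```

The returned string is what you would paste into the href attribute of the link.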

This page deals with the last two options. Firstly, the Javascript one, followed (near the bottom of this page) by the server-side script one.


Javascript Obfuscation

The idea here is to use Javascript to generate a clickable link. The theory goes that spam harvesters are just looking at the HTML markup of your page and are not using Javascript interpreters. Therefore, if you can encode your address into Javascript somehow so that it is no longer recognizable as an email address, the spam harvesters should miss it.
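
As a very simple illustration of the idea (not the code of any particular encoder), the address can be stored in pieces and only assembled when the script runs. The function name buildMailtoLink is an assumption made for this sketch.

```javascript
// Hypothetical helper: assemble a mailto link from separate pieces so that
// the complete address never appears literally in the page source.
function buildMailtoLink(user, domainParts, label) {
  var at = String.fromCharCode(64);                 // character code 64 is "@"
  var address = user + at + domainParts.join('.');  // e.g. user@a.b.c
  return '<a href="mailto:' + address + '">' + label + '</a>';
}

// In a web page the result could be written out with, for example:
// document.write(buildMailtoLink('andrew', ['scss', 'com', 'au'], 'Email me'));
```

Real encoders go much further than this, as the next example shows.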

A reasonably well-known encoder is the Automatic Labs Enkoder Form. It produces code like:

<script type="text/javascript"> //<![CDATA[ function hiveware_enkoder(){var i,j,x,y,x= "x=\"0x=\\\"<m068$9-?_q\\\\8<;\\\"AA.o#4?mjsvDx=,C9\\\\\\\"|A=<@oBmA99=A?ml" + "&<P<<o99|B@|2pi2pi'*g~4r;ir9=r~4rkxl=yz?8=xsYjkt9kxm/Al?M54m|A8zn;8m8i-!kt" + "m}/A:.o?9i`&kAo|//-2wy!~Cf1C(:2<88n<9wxvzn7,m<<k88l/~AA#09|2g-?#<p8!1C</l<" + "j}<l<A}2o/<w<7:=;7~41evyfwGsrxvy~C={h9AhCk,nj8k-?\\\\%25\\\"<?!(8<yziEx;y=" + ",A=''x8b(.o8m/#8;x<7<=unpCk8i=|=-29e16?scag:6p/A:=m%25!1Ce(x;<r);:.~4?9h;j" + ",<fon@ir(il<=!C!87i4y={tk0;i<ng<l9xG89696-nx.l/z9en{=;;hyzA88xgtyi<h;i.6/g" + "<7/g=hb+8?}+){A=<j.pj<//A`6=:=x.v<~ch&C~4(A<kAWkarxvyCodl=;<?}<m.A+e<?=At(" + "k{hi+!yz-C-:rk2)-4jx8;i==?=?|Av+-yf(:<:j<3.7/r8-AisAl2qGl)j+!8;=<ii8ewge1~" + "494;<kCy+t--A;=u;vGx=Sshltri==8.i,9i9|-ny{hg.fkuxr?=.oj:6o,n-omC#C=haC8<:j" + "sv}&?,rCnA7ode6Aom6=Biz9<(ep,j)}~<?y\\\";=:|j=e2glvalek4(x.rktchavExrAt,4m" + "(0)<:<);x;--=x.?|Asub8znstrAo|(1)2wy;y=f1='';8<>forwxv(i=,580;iC</<x.#-?le" + "n}A+gth!8;;i+==+=10?js){yv91+=xC~4.su,mAbst4?yr(ii<8,5)6m@;}f|2por(:{hi=5y" + "zi;i<rkxx.llx7eng86=th;?m/i+=A5=10).o2{y+94-=x.!}/sub/8<str>=A(i,|2w5);y<A" + "}y=%25y.sfwxubsv,ltr(ux;j);\";j=eval(x.charAt(0));x=x.substr(1);y='';for(i" + "=0;i<x.length;i+=6){y+=x.substr(i,3);}for(i=3;i<x.length;i+=6){y+=x.substr" + "(i,3);}y=y.substr(j);"; while(x=eval(x));}hiveware_enkoder(); //]]> </script>

It's certainly difficult to read! It's also quite large, and needs to be duplicated everywhere you want your email address to appear. Then there's the maintenance issue - if you want to change an email address, you have to re-encode it, then cut-and-paste onto your page. What a hassle!

And what about people who are, for whatever reason, surfing with Javascript disabled? Several statistics indicate they could be more than 10% of your visitors! The usual method is to include <noscript> tags and describe your email address in some fashion. This results in duplication (of the encoded address with the described address), not to mention the possibility of inconsistencies.


My Solution

My solution is to take advantage of the <noscript> described address. What if you had some method by which that described address could be automatically translated into a clickable address? Then all you would need to do is place the described address on your page, then run some Javascript to automatically un-obfuscate it and turn it into a link!

That's what I've done. All you need to do is include my Javascript module and a common event-handling Javascript module (see below for downloads):

<script type="text/javascript" src="events.js"></script>
<script type="text/javascript" src="emaillinks.js"></script>

and place the required markup onto your page:

<p>Send your email to:<a class="email">&quot;Andrew Gregory&quot; &lt;andrew at scss dot com dot au&gt;</a></p>

Which gets turned into:

Send your email to: Andrew Gregory (shown as a clickable email link)

if you have Javascript enabled, and:

Send your email to:"Andrew Gregory" <andrew at scss dot com dot au>

if you haven't.


Required Files

Creative Commons License
This work is licensed under a Creative Commons License.


How It Works

Including the required Javascript files sets up an "onload" event handler for the web page, with the un-obfuscator configuration set to some useful defaults. When the page is loaded, the Javascript (if enabled) will run and search-and-replace the email addresses with links.

The email addresses are located by searching for <a> elements with the configured class name (default: "email"). Elements with an href already defined are skipped.
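
Matching the class name needs a little care, because an element can carry several classes and a plain substring test would also match unrelated names. The sketch below shows one common way to do it with a regular expression. The function name hasClass is an assumption, the exact expression used inside emaillinks.js is not shown on this page, and the sketch assumes the class name contains no regular expression special characters.

```javascript
// Hypothetical helper: does a class attribute value contain the given
// class name as a whole, whitespace-delimited word?
function hasClass(classAttr, className) {
  var re = new RegExp('(^|\\s)' + className + '(\\s|$)');
  return re.test(classAttr || '');
}

// hasClass('email', 'email')          is true
// hasClass('big email link', 'email') is true
// hasClass('treemails', 'email')      is false
```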

Once located, the destination name and address is extracted from the text content of the element, or, if the element contains an image, from the alt attribute. The name and address are found using regular expressions defined in the configuration (default: name is inside double-quotes, address is inside angle brackets). The name is optional, the address is not.
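
The extraction step can be sketched as follows, using the two default expressions. The function name extractParts and the shape of the returned object are assumptions for illustration, not the script's actual internals.

```javascript
// The default expressions: name in double quotes, address in angle brackets.
var nameRe = /"([^"]*)"/;
var addrRe = /<([^>]*)>/;

// Hypothetical helper: pull the optional name and the required address
// out of the text content of an element.
function extractParts(text) {
  var addr = addrRe.exec(text);
  if (!addr) {
    return null;                  // the address is not optional
  }
  var name = nameRe.exec(text);
  return { name: name ? name[1] : null, addr: addr[1] };
}

// extractParts('"Andrew Gregory" <andrew at scss dot com dot au>')
// gives name 'Andrew Gregory' and addr 'andrew at scss dot com dot au'.
```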

The address, which should be like "andrew at scss dot com dot au", then has a sequence of regular expression replacements applied to it. The default replacements convert " at " to "@", " dot " to ".", and remove any trailing "invalid" text (which may have an optional dot separating it from the rest of the domain).
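
The replacement sequence can be tried out on its own. The sketch below copies the default replacement list from the configuration shown further down this page. The function name unobfuscate is an assumption for illustration.

```javascript
// The default replacements, applied in order from first to last.
var unobs = [
  {re: /\s+at\s+/ig,      txt: '@'},
  {re: /\s+dot\s+/ig,     txt: '.'},
  {re: /\s+-at-\s+/ig,    txt: '@'},
  {re: /\s+-dot-\s+/ig,   txt: '.'},
  {re: /\s+\(at\)\s+/ig,  txt: '@'},
  {re: /\s+\(dot\)\s+/ig, txt: '.'},
  {re: /[\.]?invalid$/i,  txt: ''},
  {re: /\s+/g,            txt: ''}
];

// Hypothetical helper: run every replacement over the obfuscated text.
function unobfuscate(addr) {
  for (var i = 0; i < unobs.length; i++) {
    addr = addr.replace(unobs[i].re, unobs[i].txt);
  }
  return addr;
}

// unobfuscate('andrew at scss dot com dot au')         gives 'andrew@scss.com.au'
// unobfuscate('andrew at scss dot com dot au.invalid') gives 'andrew@scss.com.au'
```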

After all that, the element link text is replaced and the href set to the email address.

At this point the process function is called to perform any last-minute processing that might be required.


Customization

It's possible spam harvesters could learn new ways of extracting email addresses from web pages. To counter this, I've made my script extremely flexible in its method of operation. If you're handy with regular expressions and a bit of Javascript you can change my script's method of operation entirely.

Regular Expressions

Below are some links about regular expressions as they are implemented in Javascript:

Default Configuration

The email un-obfuscator script uses an object to hold all the necessary configuration information. The default one is:

var emaillinks_config={
  className:'email',
  addr:/<([^>]*)>/,
  name:/"([^"]*)"/,
  subj:/with subject "([^"]*)"/,
  process:[emaillinks_subject],
  unobs:[
    {re:/\s+at\s+/ig, txt:'@'},
    {re:/\s+dot\s+/ig, txt:'.'},
    {re:/\s+-at-\s+/ig, txt:'@'},
    {re:/\s+-dot-\s+/ig, txt:'.'},
    {re:/\s+\(at\)\s+/ig, txt:'@'},
    {re:/\s+\(dot\)\s+/ig, txt:'.'},
    {re:/[\.]?invalid$/i, txt:''},
    {re:/\s+/g, txt:''}
  ]
};

You can replace it entirely (by setting a new value for emaillinks_config), or you can replace parts of it (for example, emaillinks_config.className='mailtolink';).

Setting Customizations

There are two methods of setting your customizations:

  1. Include your customizations inline
  2. Place your customizations in an external file

Inline Customizations

Your code is in your web page code somewhere after you include the source files:

<script type="text/javascript" src="events.js"></script>
<script type="text/javascript" src="emaillinks.js"></script>
<script type="text/javascript">
emaillinks_config.className='mailtolink';
</script>

External File Customizations

Your code is in an external file referenced somewhere after you include the source files:

<script type="text/javascript" src="events.js"></script>
<script type="text/javascript" src="emaillinks.js"></script>
<script type="text/javascript" src="custom-code.js"></script>

where "custom-code.js" contains things like:

emaillinks_config.className='mailtolink';

Configuration Properties

The un-obfuscator script recognizes the following configuration object properties:

addr

This is the regular expression used to extract the obfuscated email address from the anchor content. If no name is present in the anchor content, the un-obfuscated version of this text is used as the link text. Default: "/<([^>]*)>/" (everything between two angle brackets).

className

This is the name of the class used to mark <a> elements as containing an obfuscated email address. Default: "email".

name

This is the regular expression used to extract the name of the email recipient from the anchor content. If present, this is used as the link text. Default: "/"([^"]*)"/" (everything between two double-quotes).

process

This is an Array of functions called when the un-obfuscator has built the link, but has not yet inserted it into the document, nor removed the original anchor. Each function is called with two parameters:

  1. A reference to the object representing the original anchor element.
  2. A reference to the new link element.

The intention is that you could perform some extra very fancy processing yourself, should you find that necessary, without needing to modify my original code.

This defaults to emaillinks_subject, a function that demonstrates how to use this facility, and which supports the useful feature of adding a subject to the email link address.
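
As an illustration of the facility, here is a sketch of a small processing function that gives each new link a tooltip. The function name addTitle is made up for this example. It only touches the second parameter (the new link element), so it does not depend on what is passed as the first.

```javascript
// Hypothetical processing function: add a title attribute to the new link.
function addTitle(orig, link) {
  var href = link.getAttribute('href') || '';
  // Strip the "mailto:" prefix and any "?subject=..." style parameters.
  var address = href.replace(/^mailto:/, '').replace(/\?.*$/, '');
  link.setAttribute('title', 'Send email to ' + address);
}

// Register it, but only if emaillinks.js has already been loaded.
if (typeof emaillinks_config !== 'undefined') {
  emaillinks_config.process.push(addTitle);
}
```

Because it is appended to the end of the array, it runs after the default emaillinks_subject function.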

subj

This is the regular expression used to extract the email subject (if any) from the anchor content. Default: everything between two double quotes following the text "with subject".
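
The sketch below shows roughly how such an expression can be used to add a subject to a mailto address. It is not the code of emaillinks_subject. The function name withSubject is an assumption, and the subject is URL-encoded with the standard encodeURIComponent function.

```javascript
// The default subject expression.
var subjRe = /with subject "([^"]*)"/;

// Hypothetical helper: append "?subject=..." to a mailto href if the
// anchor text contains a subject, otherwise return the href unchanged.
function withSubject(href, text) {
  var m = subjRe.exec(text);
  return m ? href + '?subject=' + encodeURIComponent(m[1]) : href;
}

// withSubject('mailto:andrew@scss.com.au', '(with subject "Feedback")')
// gives 'mailto:andrew@scss.com.au?subject=Feedback'
```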

unobs

An array of Objects, each with a regular expression property (re) and replacement text property (txt). The array is processed in order (first to last). At each step, the regular expression is applied to the obfuscated address text and every match replaced with the replacement text. The final result of the address text after being so processed is assumed to be the un-obfuscated email address.


Suggestions

See also: Address Munging FAQ.

Rather than stick with the defaults, make it a little more difficult for the spam harvesters by customizing the script operation!

Different Class Name

Set the class name to something different, like "mailtolink" or "address".

emaillinks_config.className='mailtolink';

Different Name/Address Regular Expressions

Use different delimiters for the parts of the email. For example, square brackets to go around the obfuscated address:

emaillinks_config.addr=/\[([^\]]*)\]/;

Different Un-Obfuscator Replacements

Instead of looking for " at " and " dot ", look for " -at- " and " -dot- ":

emaillinks_config.unobs=[
  {re:/\s+-at-\s+/ig, txt:'@'},
  {re:/\s+-dot-\s+/ig, txt:'.'}
];

Bogus Domain Name

This is best shown by example. Note in particular that this example also demonstrates that text not recognized as either the recipient name or email address will be ignored by the script. Code your address like:

<a class="email">&quot;Andrew Gregory&quot; &lt;andrew at bogus dot scss dot com dot au invalid&gt; (with subject &quot;Feedback&quot;) (remove the bogus parts of the domain name before sending)</a>

And modify the standard configuration like this (it appends a new replacement object to the existing default ones):

emaillinks_config.unobs[emaillinks_config.unobs.length]={re:/@bogus\./, txt:'@'};

Which turns out like:

Andrew Gregory (shown as a clickable email link with the subject "Feedback")

with Javascript, and:

"Andrew Gregory" <andrew at bogus dot scss dot com dot au invalid> (with subject "Feedback") (remove the bogus parts of the domain name before sending)

without.
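
One detail of the bogus-domain replacement is worth a closer look: in a regular expression an unescaped dot matches any character, so /@bogus./ is slightly looser than /@bogus\./. The sketch below shows the difference. The helper name stripBogus is made up for this example.

```javascript
var loose  = /@bogus./;    // "." matches any single character
var strict = /@bogus\./;   // "\." matches only a literal dot

// Hypothetical helper: remove the bogus part that follows the @-sign.
function stripBogus(address, re) {
  return address.replace(re, '@');
}

// Both expressions handle the intended input:
//   stripBogus('andrew@bogus.scss.com.au', strict) gives 'andrew@scss.com.au'
//   stripBogus('andrew@bogus.scss.com.au', loose)  gives 'andrew@scss.com.au'
// Only the loose one eats a character when the dot is missing:
//   stripBogus('andrew@bogusscss.com.au', loose)   gives 'andrew@css.com.au'
//   stripBogus('andrew@bogusscss.com.au', strict)  leaves the text unchanged
```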

Different Case

You could write your email name and domain using lowercase letters, and write the "@" and "." using uppercase letters. For example, "andrewATscssDOTcomDOTau". A suitable configuration might be:

emaillinks_config.unobs=[{re:/AT/g,txt:'@'},{re:/DOT/g,txt:'.'}];
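
A quick way to check a replacement list like this before putting it on a page is to run it over a sample string. This is a sketch only, and the helper name applyRules is an assumption. Note that these expressions are case sensitive, so the scheme relies on the rest of the address containing no uppercase "AT" or "DOT".

```javascript
var caseRules = [
  {re: /AT/g,  txt: '@'},
  {re: /DOT/g, txt: '.'}
];

// Hypothetical helper: apply a list of {re, txt} replacements in order.
function applyRules(text, rules) {
  for (var i = 0; i < rules.length; i++) {
    text = text.replace(rules[i].re, rules[i].txt);
  }
  return text;
}

// applyRules('andrewATscssDOTcomDOTau', caseRules) gives 'andrew@scss.com.au'
```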


How This Technique Could Be Defeated

The only way I believe this technique could be defeated would be by harvesting software implementing a complete Javascript and DOM interpreter. By running every script on every page, then scanning through the resulting document objects, such a system could easily find anchor tags and the decoded href attributes.

This isn't necessarily difficult as open source browsers (such as Mozilla and Konqueror) provide a ready-to-go engine. All the spammers would need to do is create a modified version of the browser that can spider automatically.

What might stop this technique from being practical is that all the extra processing would significantly slow down harvesting.


CGI Scripting

I got this idea from A New Form of Spam Protection: If you're able to, you can use a server-side script instead of a client-side script. Here is a suitable Perl script (tested on Apache servers):

#!/usr/bin/perl -w
# Redirect the browser to a mailto: address built from the query string.
my %q = split(/[=&]/, $ENV{'QUERY_STRING'});
print 'Status: 307 Temporary Redirect', "\n";
print 'Location: mailto:', $q{'name'}, '%40', $q{'domain'};
# Append any optional mailto parameters that were supplied.
my $c = '?';
foreach ('cc', 'bcc', 'subject', 'body') {
  if ($q{$_}) {
    print "$c$_=$q{$_}";
    $c = '&';
  }
}
print "\n\n";

and some suitable PHP:

<?php
// Redirect the browser to a mailto: address built from the query string.
if (isset($_GET["name"]) && isset($_GET["domain"])) {
  $loc = $_GET["name"] . "@" . $_GET["domain"];
  $ch = "?";
  foreach (array("subject", "cc", "bcc", "body") as $key) {
    if (isset($_GET[$key])) {
      // PHP has already URL-decoded the value, so encode it again.
      $loc .= $ch . $key . "=" . rawurlencode($_GET[$key]);
      $ch = "&";
    }
  }
  header("Location: mailto:" . $loc);
}
?>

You may, of course, hard-code any of the parameters (domain being the obvious one). Note also that these scripts are not spam relays because they don't actually send the email - they rely on the user-agent (browser) to do that.

You use the script by creating a link to it:

<a href="mailto.pl?name=andrew&amp;domain=scss.com.au&amp;subject=Feedback">Email me</a>
<a href="mailto.php?name=andrew&amp;domain=scss.com.au&amp;subject=Feedback">Email me</a>

Email me

or a form:

<form action="/cgi-bin/mailto.pl">
  <fieldset>
    <input type="hidden" name="name" value="andrew" />
    <input type="hidden" name="domain" value="scss.com.au" />
    <input type="hidden" name="subject" value="Feedback" />
    <input type="submit" value="Email me" />
  </fieldset>
</form>

Clicking on the link/button executes the script, which redirects the browser to a "mailto" address, which the browser should interpret as an email.
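
For readers more at home in Javascript, the logic of the redirect can be sketched as a plain function that turns a query string into the value of the Location header. This is not one of the scripts offered on this page. The function name buildMailtoLocation is an assumption, and the sketch relies on the standard URLSearchParams class, which is available in current browsers and in Node.js.

```javascript
// Hypothetical helper: build the "mailto:" redirect target from a query
// string such as "name=andrew&domain=scss.com.au&subject=Feedback".
// Returns null if the required name or domain parameter is missing.
function buildMailtoLocation(queryString) {
  var q = new URLSearchParams(queryString);
  if (!q.has('name') || !q.has('domain')) {
    return null;
  }
  var loc = 'mailto:' + q.get('name') + '@' + q.get('domain');
  var sep = '?';
  ['cc', 'bcc', 'subject', 'body'].forEach(function (key) {
    if (q.has(key)) {
      // URLSearchParams decodes the values, so encode them again.
      loc += sep + key + '=' + encodeURIComponent(q.get(key));
      sep = '&';
    }
  });
  return loc;
}
```

A server would send the returned string in a Location header together with a redirect status, just as the Perl and PHP scripts do.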

Of course, these methods can also be defeated by spam harvesters using a web browser engine, but now they'd have to submit every form they encounter to see if they get a mailto address.

Combining Server- and Client-Side Scripts

This is easily done using the following processing function:

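emaillinks_config.process.push(function(orig,link) {
  var href = link.getAttribute('href');
  // If a subject has already been added as "?subject=...", turn it into
  // an ordinary extra parameter so the script URL has only one "?".
  href = href.replace(/\?/, '&');
  href = href.replace(/^mailto:/, 'mailto.pl?name=');
  href = href.replace(/@/, '&domain=');
  link.setAttribute('href', href);
});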


Feedback

Of course, I'm happy to receive feedback and suggestions on this script, page, or any other aspect of this web site. Follow the "Contact Me" link in the footer of the page.


Version History

Anti-Spam Email Links Version History
Version  Date        Description
1.8      2007-04-25
  • Fixed classname matching to use a regular expression. Prevents a configured classname like 'email' from matching classes like 'treemails'.
1.7      2006-09-07
  • Enhanced to support image links. Processing function has changed: the first parameter is no longer the original element, but is now the original text.
n/a      2004-11-21
  • Added a couple more honeypot addresses.
  • Moved each honeypot address onto its own page.
  • Only one spam detected since the last one, still to the unprotected address.
n/a      2004-10-22
  • Only four days for the first spam to the unprotected address!
n/a      2004-10-18
  • Added an unprotected 'honeypot' address, since the others failed to attract anything!
n/a      2004-10-08
  • Modified events.js to work around Konqueror bug.
1.6      2004-10-08
  • Fixed bug preventing operation in Konqueror.
1.5      2004-08-10
  • Fixed bug where Gecko browsers would not show the decoded links.
1.4      2004-05-31
  • Modified to use an array of processing functions.
1.3      2004-05-29
  • Fixed whitespace removal regular expression.
  • Moved the subject regular expression into the configuration object.
1.2      2004-05-28
  • Altered default regular expressions to be case insensitive.
  • Fixed default "at" and "dot" expressions to require spaces around the word.
  • Added extra "at" and "dot" variants to the defaults.
  • Added "invalid" removal.
  • Added whitespace removal.
1.1      2004-05-28
  • Altered to use anchor elements instead of spans. This allows the full range of anchor attributes to be used, instead of being limited to the much smaller selection available to spans.
1.0      2004-05-28
  • Initial version.

Honeypots

All the below are links to web pages, each page with a single email link. The different pages use different anti-spam techniques. The address is named to indicate which technique is used.

I originally had all the links on this page, but I got so few spams (even to the unprotected address) that I thought perhaps there were too many email addresses on this page and the harvesters were treating it as a spam trap. Maybe just having one address per page will be better.

Please don't use them to send me email! Instead, follow the "Contact Me" link at the bottom of this page.

