With a few simple steps I reduced the spam in my inbox from 150 per day to less than one per week. Here’s how.
I use IMAP hosted with Pair Networks, but the discussion that follows should be broadly applicable to any remote or local mail server. For purposes of illustration, let’s assume we have the domain “example.com”, our hosting account’s username is “account”, and the mailbox we will be working with is “email@example.com”. Pair Networks uses mbox format, but as far as I know most of the following works equally well with maildir format.
First, let’s define our goals. I want to:
- Minimize spam in the inbox.
- Be able to review flagged spam for false positives.
- Be able to train the filter to improve its detection rate.
- Do everything on the server, not with an email client or otherwise on my local workstation. That way I have a consistent and spam-free inbox even when using a different or slow, low-bandwidth device (e.g. mobile phone) to check my mail.
Let’s also define our terms, so we are speaking the same language:
- Spam: Unsolicited bulk email (UBE). That newsletter you don’t want any more is not spam, because it’s not unsolicited. Those forwarded hoaxes from your Aunt Hortensia are not spam, because she doesn’t send them in bulk. A good spam filter can catch most UBE, but don’t expect it to read your mind and know what non-UBE mail you happen to dislike.
- Ham: The opposite of spam; mail the spam filter should not flag.
- False negative: Spam that slipped through the filter unflagged, an annoyance but not a crisis.
- False positive: Ham incorrectly flagged as spam, potentially a much more serious problem than a false negative.
- Training: Giving SpamAssassin feedback in the form of examples of spam and ham, particularly false negatives and false positives.
We shall proceed thus:
- Set up spam filtering on the mail server (and optionally reject incoming mail from known spam sources)
- Route spam to its own folder (and optionally delete the most obvious stuff)
- Set up spam and ham training
- Automatically delete old spam
The following relates specifically to accounts with Pair Networks; most other hosts will have similar options and tools.
SET UP SPAM FILTERING
In Pair’s Account Control Center (ACC), open Email Management – Manage Junk E-Mail Filter Settings and enable graylisting and virus scanning. Set SpamAssassin filter sensitivity to the highest level. In “E-Mail Subject”, set the desired option (I prefer to leave it unmodified) and press “Commit Changes”.
Press “Junk E-Mail Filtering Options for Advanced Users”. Enable “Use SpamAssassin DNSBLs for junk e-mail filtering scoring” and “Use Bayesian Filter”. Optionally you may wish to enable “Use Spamhaus SBL/XBL to reject e-mail”: this will reduce your volume of received junk but at the risk of rejecting legitimate mail originating from known spam sources. Press “Commit Changes”.
In ACC – Email Management, find the list of your domains and open one of them (click on “This domain has X recipes and Y mailboxes”). Unless you have reason to do otherwise, ensure that all mailboxes have the junk filter enabled. Forwarders and other recipes should usually not be filtered. Repeat this check for all domains.
ROUTE SPAM TO ITS OWN FOLDER
At this point all incoming mail will be filtered by SpamAssassin, which will add headers to each message indicating its spam score. We won’t see any difference in our inbox, however, until we tell the mail server where to route the spam. I prefer to route it to its own folder, INBOX.Spam. This keeps the inbox uncluttered, and keeps users happy by emulating well-known services like Gmail. If you do this, be sure to automate the removal of old spam (see “Automatically delete old spam” below) or users will go over their mail quotas.
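To see what SpamAssassin decided about a particular message, you can read the score straight out of its headers. The following is a minimal sketch, assuming the header looks like “X-Spam-Status: Yes, score=7.2 required=5.0 ...” (older SpamAssassin versions wrote “hits=” instead of “score=”). The function name spam_score is my own invention.

```shell
# Print the SpamAssassin score recorded in a message's X-Spam-Status header.
# Reads ONE message on stdin; prints nothing if the header is absent.
# Assumption: the header uses the "score=N.N" form.
spam_score() {
    # Stop at the first blank line so that only the header block is
    # searched, then pull the number that follows "score=".
    sed -n \
        -e '/^$/q' \
        -e 's/^X-Spam-Status:.*score=\(-\{0,1\}[0-9.]*\).*/\1/p'
}
```

For example, spam_score < message.eml prints the score of a message saved from your mail client to a file (message.eml is a placeholder name).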
There are at least two ways to sort incoming mail:
The easy way
Being lazy, I use this method unless there is some special reason to do otherwise.
In ACC – Email – Email Settings, find the list of your domains and open example.com (click on “This domain has X recipes and Y mailboxes”). Find the address you are interested in and click on the delivery instructions, which will take you to a new page where you can edit those instructions. Ensure that junk email filtering is enabled and specify /usr/boxes/account/example.com/user^/.imap/INBOX.Spam as the file to save junk to. Naturally you will want to change the path as needed. Press “Commit Changes”.
The flexible way
It requires more work to set up, but using procmail is more flexible. Besides simply moving spam to a separate folder, you can delete mail with especially high spam scores, sort non-spam into different folders, and more. I use this method for special needs that the easy way (above) cannot accommodate. See the procmail notes for further details.
It is not necessary to both route spam with procmail (described here) and to specify a file for junk mail storage in the mailbox (described in “the easy way”). If you use procmail for spam routing, then do not specify a file for junk mail storage.
Whether you use the easy way or the flexible way, at this point all incoming mail flagged as spam will land in the spam folder. As you will see, however, SpamAssassin does only a so-so job out of the box. To get the best results we have to train it.
SET UP SPAM AND HAM TRAINING
In ACC – Email Management, click on “Create new recipe”. For recipient address, select a single email address, spam-training@example.com. For recipe type, select “Filter” and press “Proceed”. For email filter, enter:
/usr/local/bin/sa-learn --spam --mbox -D > /usr/home/account/sa-learn-log
Provide the full path to sa-learn, changing the path shown above as needed. Do not enable junk filtering and press “Create filter”.
The -D option instructs sa-learn to produce debugging output, and appending > /usr/home/account/sa-learn-log directs the output to a new file named sa-learn-log.
Wait 10 minutes, then forward an example of spam as an attachment to spam-training@example.com. Wait another 10 minutes for any failure messages to be returned to you. Then check the log at /usr/home/account/sa-learn-log (ssh into the server, then cat sa-learn-log). You should see a message such as “Messages queued for learning”. If all is well, delete the log file and edit the filter above to remove the portion -D > /usr/home/account/sa-learn-log.
Now create a similar recipe, substituting “ham” for “spam” and skipping the logging.
Henceforth all mail forwarded as an attachment to spam-training@example.com will be learned as spam, and mail forwarded as an attachment to ham-training@example.com will be learned as ham. When forwarding mail, take a moment to remove anything your email client might add, such as “Fwd:” in the subject header or a signature line. You don’t want SpamAssassin associating, say, your signature line with spam. Likewise ensure you are forwarding the message as an attachment, not inline or quoted. This is done in different ways in different email clients; if you use Evolution, select Message – Forward As – Attached.
Do not expect immediate results. SpamAssassin needs at least 200 examples each of spam and ham before it can begin to put its training to use, and the more you train it, the better it gets. Once I started getting satisfactory results, I bothered to feed it only false negatives and false positives. I reached that point after a week of training, but naturally your results will vary with the volume and nature of your mail.
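To check how close you are to the 200-example threshold, you can ask sa-learn for its counters with the --dump magic option. The sketch below assumes the counters appear as lines ending in “non-token data: nspam” and “non-token data: nham”, with the count in the third column. The function name bayes_ready is hypothetical.

```shell
# Report whether the Bayes database holds enough examples to be used.
# Reads the output of `sa-learn --dump magic` on stdin.
# Usage: bayes_ready [MINIMUM]   (MINIMUM defaults to 200)
# Exit status is 0 when both counts reach the minimum, 1 otherwise.
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam$/ { nspam = $3 }   # spam messages learned
        /non-token data: nham$/  { nham  = $3 }   # ham messages learned
        END {
            printf "spam=%d ham=%d\n", nspam, nham
            exit !(nspam >= min && nham >= min)
        }'
}
```

Run it over ssh as /usr/local/bin/sa-learn --dump magic | bayes_ready, changing the path to sa-learn as needed.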
The addresses spam-training@ and ham-training@ are given here as illustrations only. Use addresses of your own invention which should not become public knowledge. Imagine the result should a spammer learn and bombard your ham training address.
Update 14 February 2012: I am testing an alternate means of spam training, which if it works should be easier both to set up and for the user to use. Have the user move all false negatives into the spam folder and false positives into the inbox. Then these two daily cron jobs should train SpamAssassin on them:
/usr/local/bin/sa-learn --spam --mbox /usr/boxes/imc/copaninvest.com/info^/.imap/INBOX.Spam /usr/boxes/imc/copaninvest.com/warren^/.imap/INBOX.Spam
/usr/local/bin/sa-learn --ham --mbox /usr/boxes/imc/copaninvest.com/info /usr/boxes/imc/copaninvest.com/warren
TODO: Update this section once I’ve decided if it works or not.
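If the list of mailboxes grows, the two cron commands could be wrapped in a small script so that cron only has to call one file. This is a rough sketch of that idea, not something I have put into production. The function name train_folders and the SA_LEARN override are my own inventions, and the paths in the usage comments are placeholders patterned on the examples in this article.

```shell
# Train SpamAssassin from the folders users sort their mail into.
# SA_LEARN can be overridden (e.g. SA_LEARN=echo) to preview the commands
# without touching the Bayes database.
SA_LEARN="${SA_LEARN:-/usr/local/bin/sa-learn}"

# train_folders KIND FOLDER...   where KIND is "spam" or "ham"
train_folders() {
    kind=$1
    shift
    for folder in "$@"; do
        # Skip mbox files that do not exist (e.g. no spam folder yet).
        [ -f "$folder" ] || continue
        "$SA_LEARN" "--$kind" --mbox "$folder"
    done
}

# Example calls (placeholder paths; adjust to your own account):
#   train_folders spam "/usr/boxes/account/example.com/user^/.imap/INBOX.Spam"
#   train_folders ham  "/usr/boxes/account/example.com/user"
```

A missing folder is skipped silently, so a brand-new mailbox with no spam folder yet will not make the daily job fail.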
AUTOMATICALLY DELETE OLD SPAM
At this point SpamAssassin should be doing a great job, but your spam folder is growing by the hour and if left unchecked will eventually overrun your storage quota. Our last step is to automatically delete any messages in the spam folder that are more than a few days old. A reasonable cutoff for me is seven days: that gives me enough time to check the spam folder and rescue any false positives that might have landed there, without being so long that my storage quota is affected.
The automatic deletion of messages is done with archivemail.
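As an illustration, a daily cron job could call something like the following. It is a sketch only: it assumes archivemail’s --delete, --days and --quiet options behave as described in its manual, and the function name purge_old_spam and the ARCHIVEMAIL override are hypothetical.

```shell
# Delete spam-folder messages older than a cutoff, using archivemail.
# ARCHIVEMAIL can be overridden (e.g. ARCHIVEMAIL=echo) to preview the
# command instead of running it.
ARCHIVEMAIL="${ARCHIVEMAIL:-archivemail}"

# purge_old_spam FOLDER [DAYS]   (DAYS defaults to 7)
purge_old_spam() {
    folder=$1
    days=${2:-7}
    # --delete removes old messages rather than archiving them,
    # --days sets the age cutoff, --quiet suits unattended cron runs.
    "$ARCHIVEMAIL" --quiet --delete --days="$days" "$folder"
}

# Example call (placeholder path; adjust to your own account):
#   purge_old_spam "/usr/boxes/account/example.com/user^/.imap/INBOX.Spam"
```

Before trusting it with real mail, try it on a copy of the spam folder, or add archivemail’s --dry-run option to see what it would do.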