A Pythonesque Approach to Spam


... otherwise known as my Spambayes experience, day 7.

[Image: spambayes.png, a screenshot of the SpamBayes administration interface]

In my quest to resolve my spam issues (as described in a previous entry), I decided to try out SpamBayes, a popular open source Bayesian filter implemented in Python. This seemed to fit the bill for me: it works as a POP3 Proxy, providing a non-invasive spam filtering solution for my existing mail client, Outlook Express (OE). It also seemed like a good exercise to give me some exposure to Python, a language and environment I've yet to spend much time exploring.

SpamBayes works by sitting between OE and my email servers. As OE retrieves my email, SpamBayes checks each message for potential spam. If it determines that a message is spam, it prefixes the subject of that message with 'spam,'.
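To make the tagging step concrete, here is a minimal Python sketch. It is not SpamBayes' actual code: the `tag_subject` function, the `SPAM_CUTOFF` value, and the idea of passing in a ready-made `spam_score` are all assumptions made for illustration. It only shows how a proxy could rewrite the subject of a message it has already scored.

```python
from email import message_from_string

# Assumed threshold for illustration only; not SpamBayes' real default.
SPAM_CUTOFF = 0.9


def tag_subject(raw_message: str, spam_score: float, prefix: str = "spam,") -> str:
    """Return the message text, with the subject prefixed if the score is high.

    `spam_score` is a hypothetical probability in [0, 1] produced by some
    classifier; computing it is outside the scope of this sketch.
    """
    msg = message_from_string(raw_message)
    if spam_score >= SPAM_CUTOFF:
        subject = msg.get("Subject", "")
        # Deleting a missing header is a no-op, so this is safe either way.
        del msg["Subject"]
        msg["Subject"] = f"{prefix} {subject}".strip()
    return msg.as_string()


raw = "From: a@example.com\nSubject: Cheap pills\n\nBuy now\n"
tagged = tag_subject(raw, spam_score=0.99)
untouched = tag_subject(raw, spam_score=0.10)
```

The mail client never needs to know a filter is involved; it simply sees a subject line that may or may not start with the prefix.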

Installation was pretty straightforward. After installing the latest release of Python (including obligatory Windows reboot), Python Win32 Extensions (for Windows Service support), and SpamBayes itself, I was up and running in a matter of minutes. I had to make a couple of changes to my Outlook Express account configuration to talk to SpamBayes' POP3 and SMTP proxies instead of my hosting accounts' servers, and then I was ready to start canning the spam. I configured a mail rule within Outlook Express to redirect any message whose subject began with "spam," to a Spam Mail folder. I then proceeded to check for new mail, and immediately messages began to collect within the spam folder.
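The mail rule itself is simple enough to express in a few lines of Python. This is only a sketch of the logic, assuming a hypothetical `choose_folder` helper and the folder names "Spam Mail" and "Inbox"; Outlook Express applies its rules internally and exposes nothing like this.

```python
from email import message_from_string


def choose_folder(raw_message: str, prefix: str = "spam,") -> str:
    """Pick a destination folder based on the subject prefix.

    Mirrors the rule described above: subjects beginning with the prefix
    go to the spam folder, everything else stays in the inbox.
    """
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    # Compare case-insensitively and ignore leading whitespace (an assumption;
    # a real mail client may match more strictly).
    if subject.lstrip().lower().startswith(prefix.lower()):
        return "Spam Mail"
    return "Inbox"


spam_folder = choose_folder("Subject: spam, Cheap pills\n\nBuy now\n")
ham_folder = choose_folder("Subject: Lunch on Friday?\n\nSee you then\n")
```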

So, on to the pros and cons: It took me about 5-10 minutes to get the software installed and up and running. For an Outlook Express user like myself, the installation is a little more involved than that for Outlook users, who have a prepackaged Windows installer. Still, that aside, the process wasn't too complicated. Configuring Spambayes was straightforward, and didn't require too much work to get it going.

I'd certainly like something that has tighter integration into my mail client. Given that I'm still using Outlook Express, I think I'm pretty constrained by its lack of support for plug-ins. If I do move to a different email client, I'll certainly be looking for something that has better integration. Also, there seems to be some form of conflict when updating SpamBayes' configuration: you have to stop and restart SpamBayes before you're able to perform any classifications within its web interface. My other concern - and I guess this is because I'm still in evaluation mode - is that I'd hate for the effort I put into training to be wasted. If there were some standardized way to extract the rule corpus - just as you can extract a list of RSS feeds as an OPML blogroll from your aggregator - I'd feel a bit happier. (If anyone knows whether this is possible, please leave a comment below.)

Anyway, it is still early days. I've been using the software for the past week, and, so far, it has been pretty promising. The above image is from the SpamBayes administration interface, and gives you an idea of how active it has been in detecting spam. It still needs some more training, as there are some misdetections for both Spam and 'Ham'; fortunately, though, I have yet to experience any false positives. The user experience is a little disconnected, but that's really due to the lack of integration possibilities within Outlook Express. However, I hope that once the service has passed a certain level of training, I'll just be able to leave it running in the background.

So, it looks like Spambayes will keep my spam under control for the short term. I'm still thinking about a longer-term switch to another client with built-in spam filtering capability. Thunderbird looks like it is gathering momentum, with an imminent release of version 0.3. An evaluation is definitely on my TODO list. Look out for a write-up of my Thunderbird escapades soon...

2 Comments

André Raymond said:

Regarding your attempt to minimize spam using Spambayes: over the last few months I came across a free, XML-driven spam eliminator from Germany, which offers plug-ins to meet all possible requirements. You can even build your own mail plug-in. It works with Outlook Express and others, and it is software that learns. More info: http://www.spamihilator.com/index2.php?lang=en
Thought you might like to know.

Tania said:

I tried several different anti-spam tools for Outlook Express (including Spambayes) and Spam Bully (http://www.spambully.com) was by far the easiest and the most comprehensive of the bunch. It integrates perfectly with Outlook Express, moreover it hooks Outlook Express rules, making it more flexible.
I was getting well over 100 spam emails per day until I downloaded this product. By the time I had tested the free version for a few weeks my inbox had become completely free of any SPAM (there were some "false positives"), but the program soon learned and quickly improved.
Spam Bully has a lot of great features, such as Friends/Spammers lists, email blocking by country (it has a cool map), language blocking, Allow/Block words and phrases, and statistics showing how effective the filter is.
It also lets you see detailed information about each email you receive: IP address, country, character set, and how Spam Bully ranked it. It tells you why a message was or was not blocked and how to correct this in the future.

About this Entry

This page contains a single entry published on October 12, 2003 8:58 PM.
