… otherwise known as my SpamBayes experience, day 7.
In my quest to resolve my spam issues (as described in a previous entry), I decided to try out SpamBayes, a popular open source Bayesian filter implemented in Python. This seemed to fit the bill for me: it works as a POP3 proxy, providing a non-invasive spam filtering solution for my existing mail client, Outlook Express (OE). It also seemed like a good exercise to give me some exposure to Python, a language and environment I’ve yet to spend much time exploring.
SpamBayes works by sitting between OE and my email servers. As OE retrieves my email, SpamBayes monitors it for potential spam. If it determines that a message is spam, it prefixes the subject of the message with ‘spam,’.
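To make the tagging step concrete, here is a minimal Python sketch of what a filtering proxy might do to each message as it passes through. This is not SpamBayes’ actual code: the `tag_subject` function, the 0.9 threshold, and the score passed in are all assumptions for illustration. Only the ‘spam,’ subject prefix comes from my setup.

```python
from email import message_from_string

SPAM_PREFIX = "spam,"    # the subject prefix described above
SPAM_THRESHOLD = 0.9     # hypothetical cut-off, not a SpamBayes default


def tag_subject(raw_message: str, spam_score: float) -> str:
    """Return the message text, prefixing the Subject if the score is high.

    `spam_score` stands in for whatever probability the filter computes.
    """
    msg = message_from_string(raw_message)
    if spam_score >= SPAM_THRESHOLD:
        subject = msg.get("Subject", "")
        del msg["Subject"]  # remove the old header before adding the new one
        msg["Subject"] = f"{SPAM_PREFIX}{subject}"
    return msg.as_string()


raw = "From: someone@example.com\nSubject: Cheap pills\n\nBuy now!\n"
tagged = tag_subject(raw, spam_score=0.99)     # subject gets the prefix
untouched = tag_subject(raw, spam_score=0.05)  # subject is left alone
```

The mail client never needs to know a filter exists; it just sees a slightly different subject line on the messages the proxy flagged.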
Installation was pretty straightforward. After installing the latest release of Python (including the obligatory Windows reboot), the Python Win32 Extensions (for Windows Service support), and SpamBayes itself, I was up and running in a matter of minutes. I had to make a couple of changes to my Outlook Express account configuration so that it talks to SpamBayes’ POP3 and SMTP proxies instead of my hosting accounts’ servers, and then I was ready to start canning the spam. I configured a mail rule within Outlook Express to redirect any message whose subject begins with “spam,” to a Spam Mail folder. I then checked for new mail, and messages immediately began to collect in the spam folder.
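The mail rule itself is just a prefix check on the subject line. As a rough Python equivalent (the `choose_folder` function and the “Inbox” default are my own names for illustration, not anything Outlook Express exposes):

```python
from email import message_from_string


def choose_folder(raw_message: str) -> str:
    """Mimic the mail rule: subjects starting with 'spam,' go to Spam Mail."""
    subject = message_from_string(raw_message).get("Subject", "")
    if subject.startswith("spam,"):
        return "Spam Mail"
    return "Inbox"


tagged = "Subject: spam,Cheap pills\n\nBuy now!\n"
normal = "Subject: Lunch on Friday?\n\nSee you there.\n"
folders = [choose_folder(tagged), choose_folder(normal)]
```

Because the rule only looks at the subject, all of the actual spam detection stays inside the proxy.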
So, on to the pros and cons. First, the pros: it took me about 5-10 minutes to get the software installed and running. For an Outlook Express user like myself, the installation is a little more involved than it is for Outlook users, who have a prepackaged Windows installer. Still, that aside, the process wasn’t too complicated. Configuring SpamBayes was straightforward, and didn’t require too much work to get it going.
As for the cons, I’d certainly like something with tighter integration into my mail client. Given that I’m still using Outlook Express, I think I’m pretty constrained by its lack of support for plug-ins. If I do move to a different email client, I’ll certainly be looking for something that integrates better. Also, there seems to be some form of conflict when updating SpamBayes’ configuration: you have to stop and restart SpamBayes before you’re able to perform any classifications within its web interface. My other concern, and I guess this is because I’m still in evaluation mode, is that I’d hate for the effort I put into training to be wasted. If there were some standardized way to extract the rule corpus – just like you can extract a list of RSS feeds as an OPML blogroll from your aggregator – I’d feel a bit happier. (If anyone knows whether this is possible, please leave a comment below.)
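To show the kind of export I have in mind: in general terms, a trained Bayesian filter comes down to counts of how often each word has turned up in spam and in ham, so a portable dump could be as simple as JSON. The sketch below is purely hypothetical. The file layout and the `export_training` / `import_training` functions are my own invention, and I’m not claiming SpamBayes stores or exports its data this way.

```python
import json


def export_training(token_counts, n_spam, n_ham):
    """Serialise hypothetical training data to a JSON string.

    `token_counts` maps each word to a (spam_count, ham_count) pair.
    """
    return json.dumps(
        {
            "messages": {"spam": n_spam, "ham": n_ham},
            "tokens": {
                token: {"spam": spam, "ham": ham}
                for token, (spam, ham) in sorted(token_counts.items())
            },
        },
        indent=2,
    )


def import_training(text):
    """Rebuild (token_counts, n_spam, n_ham) from an exported JSON string."""
    data = json.loads(text)
    counts = {t: (c["spam"], c["ham"]) for t, c in data["tokens"].items()}
    return counts, data["messages"]["spam"], data["messages"]["ham"]


# Made-up numbers, just to show the round trip.
counts = {"viagra": (42, 0), "python": (1, 37)}
exported = export_training(counts, n_spam=50, n_ham=60)
restored = import_training(exported)
```

Something along these lines would let me carry a week of training over to another filter, provided the other filter could read the same format.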
Anyway, it is still early days. I’ve been using the software for the past week and, so far, it has been pretty promising. The above image is from the SpamBayes administration interface, and gives you an idea of how active it has been in detecting spam. It still needs some more training, as there are some misdetections for both spam and ‘ham’; fortunately, though, I have yet to experience any false positives. The user experience is a little disconnected, but that’s really due to the lack of integration possibilities within Outlook Express. I hope that once the service has passed a certain level of training, I’ll be able to just leave it running in the background.
So, it looks like SpamBayes will keep my spam under control for the short term. I’m still thinking about a longer-term switch to another client with built-in spam filtering capability. Thunderbird looks like it is gathering momentum, with an imminent release of version 0.3. An evaluation is definitely on my TODO list. Look out for a write-up of my Thunderbird escapades soon…