[R] Off topic: Spam on R-help increase?

Marc Schwartz marc_schwartz at comcast.net
Sat Mar 10 17:20:38 CET 2007


On Sat, 2007-03-10 at 10:17 -0500, François Pinard wrote:
> [Marc Schwartz]
> 
> >The "Human Spam Filter" (aka Martin) [...]
> 
> The R mailing list has, indeed, been remarkably spam-free and
> well-managed, as far as I can see.  I do hope, however, that Martin
> does not have to do the filtering himself -- it would be just daunting!
> 
> In any case, Martin, a lot of thanks from me!

The comment was somewhat "tongue-in-cheek".

While a major proportion of spam can be filtered using automated tools,
it takes a significant amount of manual effort to configure the tools to
achieve the level of cleansing that we observe here.

On my system (laptop running FC6 Linux), I am using SpamAssassin (SA)
with Bayesian filtering enabled, along with remote spam checks such as
DCC, Razor, Pyzor and some RBLs.

I also recently started using FuzzyOCR (as a plug-in to SA) to improve
the filtering of spam whose content is entirely in images. These e-mails
are, of course, specifically designed to defeat text-based spam
filtering.

However, I still get some that come through despite the above. There are
also 'borderline' e-mails that require manually running the spam/ham
learning scripts.
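
As a rough illustration of what the spam/ham learning step does, here is
a toy naive Bayes filter in Python. It is only a sketch of the general
idea and not SpamAssassin's actual algorithm; the class name, method
names and sample messages are all made up for the example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy two-class naive Bayes filter (illustrative only)."""

    def __init__(self):
        # Number of learned messages containing each token, per class.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of learned messages per class.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Lower-cased set of word-like tokens; each counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, label):
        # label is "spam" or "ham", mirroring a manual learning run.
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        # Sum log-odds over tokens, with add-one smoothing so that
        # unseen tokens never produce a zero probability.
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayes()
f.learn("cheap pills buy now", "spam")
f.learn("buy cheap watches now", "spam")
f.learn("lm model residuals question", "ham")
f.learn("question about plot axis labels", "ham")

print(f.spam_probability("buy cheap pills"))              # close to 1
print(f.spam_probability("question about lm residuals"))  # close to 0
```

A borderline message is one whose score lands near the middle; feeding
it back through learn() with the correct label is what shifts later
scores for similar mail.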

To increase the filtering effectiveness to the level we see here, I
would have to spend a fair amount of time writing custom rules for SA.
I have no doubt that this is where Martin spends a lot of his
list-management time.

HTH,

Marc Schwartz


