[R] R-help "spam" detection; please help the moderators

(Ted Harding) Ted.Harding at manchester.ac.uk
Tue Jun 1 19:22:10 CEST 2010


Hi Joris,
The "matched a filter rule" is the principal reason for holding
messages for moderation. Please don't become anxious about it --
one of the reasons we have become concerned is that people whose
messages get held up do tend to worry. This is unnecessary -- they,
and you, are not really doing anything wrong!

As I understand it, the filter rules are set by the ethz.ch admins,
and even Martin does not seem to know, in detail, what they are.
Also, it is likely that the filters "learn", and they may well
be "learning" from a lot of other emails received at ethz.ch which
have nothing to do with R-help but which are true spam -- then the
headers of such messages could be folded into "Bayes spam scores"
which can trigger "matched a filter rule".

As far as R-help is concerned, the situation seems to be that
gmail.com and nabble.com are important triggers (though there
are plenty of others). Lots of email addresses whose username does
not end in digits are trapped in this way.

In round figures, proportions I have logged myself amongst the
messages held because they "matched a filter rule" are:

non-gmail, non-nabble: 30%
non-gmail, nabble    : 25%
    gmail, non-nabble: 32%
    gmail, nabble    : 13%

gmail : 45%
nabble: 52%
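
As a cross-check on the figures above, here is a minimal sketch in Python (the variable and function names are only illustrative) that recomputes the two totals from the four cells. The gmail total reproduces the 45% quoted; the nabble cells as listed sum to 38%, so either the 52% or one of the cell values may have been mistyped.

```python
# Joint percentages as logged in the message, keyed by (is_gmail, is_nabble).
joint = {
    (False, False): 30,  # non-gmail, non-nabble
    (False, True): 25,   # non-gmail, nabble
    (True, False): 32,   # gmail, non-nabble
    (True, True): 13,    # gmail, nabble
}

def marginal(index):
    """Sum the cells where the given factor (0 = gmail, 1 = nabble) is True."""
    return sum(pct for key, pct in joint.items() if key[index])

total = sum(joint.values())
gmail_total = marginal(0)
nabble_total = marginal(1)

print(f"total : {total}%")
print(f"gmail : {gmail_total}%")
print(f"nabble: {nabble_total}%")
```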

The true nature of this situation is still unclear!

Ted.


On 01-Jun-10 16:00:40, Joris Meys wrote:
> Hi all,
> 
> I also couldn't help but notice that some of my messages are bounced
> for the following reason:
> 
>    The message headers matched a filter rule
> 
> I included the header of one of the messages below, but none of
> these messages was sent through Nabble, nor does any mail address have
> digits in it.
> I also never had that before. Did you change some of the rules somehow?
> 
> Cheers
> Joris
> 
> -----------------------
> 
> MIME-Version: 1.0
> Received: by 10.140.173.9 with HTTP; Fri, 28 May 2010 05:32:32 -0700
> (PDT)
> In-Reply-To:
> <AANLkTim9eTuY2EfynLoH2LYN7M133YTjeNcDJpkGPHJx at mail.gmail.com>
> References:
> <AANLkTikgC7V2ZbSYRWcWBUeeZm8D24qj0VqeB2z1NduD at mail.gmail.com>
>       <AANLkTim9eTuY2EfynLoH2LYN7M133YTjeNcDJpkGPHJx at mail.gmail.com>
> Date: Fri, 28 May 2010 14:32:32 +0200
> Delivered-To: jorismeys at gmail.com
> Message-ID:
> <AANLkTimg4IDyiVhe1ek9mk6_RybjcNuU4msvWRvtSGTS at mail.gmail.com>
> Subject: Re: [R] How to get values out of a string using regular
> expressions?
> From: Joris Meys <jorismeys at gmail.com>
> To: Gabor Grothendieck <ggrothendieck at gmail.com>
> Cc: R mailing list <r-help at r-project.org>
> Content-Type: multipart/alternative;
> boundary=000e0cd2295481515c0487a6b3be
> 
> --000e0cd2295481515c0487a6b3be
> Content-Type: text/plain; charset=ISO-8859-1
> 
> 
> 
> On Tue, Jun 1, 2010 at 3:25 PM, Martin Maechler
> <maechler at stat.math.ethz.ch> wrote:
> 
>> Dear readers of R-help
>>
>> as most of you will *not* be aware, R-help has continued to work the
>> way it does only thanks to a dozen volunteers,
>> see https://stat.ethz.ch/mailman/listinfo/r-help .
>>
>> The volunteers manually moderate e-mails that "look like spam" (and
>> sometimes are and sometimes are not).
>> While much more than 90% of the spam is filtered out long before
>> a human sees it, with the increasing sophistication of spammers,
>> manual intervention has been deemed necessary and has served the
>> community very well.
>>
>> OTOH, in recent weeks, the amount of work for the volunteers has
>> increased, mainly because an increasing number of non-spam postings
>> are erroneously tagged as "possibly spam".
>> We have discussed this and done some analysis and found
>> that most of the messages that produce a considerable amount of
>> extra work share two properties :
>>  1) they are posted via Nabble  {which *always* attaches a small
>>                                 pro-Nabble spam at the end of the
>>                                 message}
>>  2) the e-mail address of the sender is from a freemail
>>    provider, quite often 'at gmail dot com', and often the part
>>    *before* the '@' (at-sign) ends with digits.
>>
>> We hereby ask those among you who use a freemail account to
>> please no longer post via nabble.
>>
>> Thank you for your support of R-help, *the* "community mailing
>> list" of the R project since even before that project existed
>> "formally", namely since 1997-04-01,
>> today 13 years and two months.
>>
>> Martin Maechler, ETH Zurich
>> (and R-help creator and principal manager)
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
> 
> -- 
> Joris Meys
> Statistical Consultant
> 
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> 
> Coupure Links 653
> B-9000 Gent
> 
> tel : +32 9 264 59 87
> Joris.Meys at Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> 
>       [[alternative HTML version deleted]]
> 

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Jun-10                                       Time: 18:22:08
------------------------------ XFMail ------------------------------


