[RsR] Robust regression

Thu Dec 22 12:08:55 CET 2005

>>>>> "Matias" == Matias Salibian-Barrera <matias using stat.ubc.ca>
>>>>>     on Wed, 21 Dec 2005 14:07:55 -0800 writes:

    Matias> Dear List,

    Matias> A few days ago I uploaded the package "roblm" to CRAN. It implements 
    Matias> MM-regression estimators and currently has some diagnostic plots as 
    Matias> well. The documentation needs quite a bit of work, but the main 
    Matias> information is there.

Thank you, Matias!

    Matias> Note that the name of the package (and the corresponding class) is 
    Matias> "roblm", but this does not mean that I necessarily prefer this name over 
    Matias> others. I've been working on this for some time now, and for the reasons 
    Matias> that I mentioned in Treviso ("rlm" and "lmRob" are already taken) I 
    Matias> settled for roblm.

(I'm replying to this in a different e-mail on  
 "function naming ... " )

    Matias> I took the liberty to re-organize the regression
    Matias> workgroup minutes relating to linear regression in a
    Matias> format closer to a "to-do" list. I would very much
    Matias> appreciate your feedback. Names within parentheses
    Matias> indicate "volunteers" for a particular task. Please
    Matias> feel free to correct all my mistakes (and to add
    Matias> your name to the list of volunteers!)

    Matias> -- An initial (partial) list of things that remain
    Matias>    to be done for robust linear regression

>>     - expose the "initial estimator" as a separate function.
>> 	    This should include the fast-S (already built-in), the
>> 	    alternate M-S estimator for the case of many factor
>> 	    variables (Victor can provide code), and the heuristic
>> 	    initial estimator of Yohai-Pena (Victor has code?)

>>     - explore how to use score ("psi") functions declared in
>> 	    R (as S4 objects) in the C code

Good point (you've raised it on this list earlier, but didn't
get an answer)!
I'm not yet sure what the optimal approach would be here.
There is some overhead calling the interpreted R code from the C
code.  In the meantime, I had an idea which might be more
flexible: The R class could have a slot which is (empty or) a
pointer ("externalptr") to a C function if that is available.

Hence, for some score/psi/rho/... functions, R would call fast
built-in C code, while one would still have the full flexibility
of "playing with" new *-functions.
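The dispatch idea can be sketched roughly as follows. This is only an
illustration in Python, not R/C code: the class PsiFunction, its field
"native" (standing in for the "externalptr" slot) and the helper
huber_native (standing in for a compiled C routine) are all hypothetical
names, and the Huber constant 1.345 is just the usual default.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence
import math


@dataclass
class PsiFunction:
    """Hypothetical psi-function object with an optional fast path."""
    name: str
    psi: Callable[[float], float]  # interpreted definition, always present
    # Stand-in for the externalptr slot: None means "no compiled version".
    native: Optional[Callable[[Sequence[float]], List[float]]] = None

    def evaluate(self, r: Sequence[float]) -> List[float]:
        # Prefer the fast routine when the slot is filled ...
        if self.native is not None:
            return self.native(r)
        # ... otherwise fall back to the slower, fully flexible definition.
        return [self.psi(x) for x in r]


def huber_psi(x: float, k: float = 1.345) -> float:
    """Huber psi: identity on [-k, k], clipped outside."""
    return max(-k, min(k, x))


def huber_native(r: Sequence[float], k: float = 1.345) -> List[float]:
    """Pretend compiled routine (assumption: same result as huber_psi)."""
    return [max(-k, min(k, x)) for x in r]


# A psi function that ships with a fast path.
huber = PsiFunction("huber", huber_psi, native=huber_native)
# An experimental psi function someone is "playing with": no fast path.
custom = PsiFunction("tanh", math.tanh)
```

Both huber.evaluate(r) and custom.evaluate(r) are called in the same
way; only the first one would avoid the interpreter overhead.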

>>     - decide which options should stay in the "control" function
>>     - add to summary.roblm(), print.roblm() and
>> 	    print.summary.roblm() information on which estimation
>> 	    method was used (MM, etc)
>>     - write a model selection function (Victor has code for a robust
>> 	    backward stepwise method (RFPE?); Elvezio may know of / have
>> 	    code for the robust Cp; Eva can provide this part?)
>>     - write an "anova" function using robust F-, Wald tests	(Matias)

yes, in my view; I had argued in Treviso that 'anova' should be
used as function for comparing nested models also in situations
where it's really not an '[an]alysis [o]f [va]riance' anymore.

>>     - add the robust weights to the returned object (Matias)

I agree.  This applies to almost all robust procedures.
Here we should also try to find a common name
('wts', 'weights', ...) and also standardize the weights, I think,
by requiring either
max_i w_i = 1 (natural for  w(r_i) = psi(r_i) / r_i)  or
sum_i w_i = 1 (natural for  formulae using the weights).
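To make the two conventions concrete, here is a small sketch (in
Python, purely for illustration). It assumes Huber's psi with the usual
constant k = 1.345, so that w(r) = psi(r) / r = min(1, k / |r|); the
function names and the residuals are made up.

```python
from typing import List, Sequence


def huber_weight(r: float, k: float = 1.345) -> float:
    """w(r) = psi(r) / r for Huber's psi, i.e. min(1, k / |r|)."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def scale_max_one(w: Sequence[float]) -> List[float]:
    """Rescale so that max_i w_i = 1."""
    m = max(w)
    return [x / m for x in w]


def scale_sum_one(w: Sequence[float]) -> List[float]:
    """Rescale so that sum_i w_i = 1."""
    s = sum(w)
    return [x / s for x in w]


# Made-up standardized residuals; the last two are outlying.
residuals = [0.2, -0.5, 1.0, 2.69, -5.38]
w = [huber_weight(r) for r in residuals]

w_max = scale_max_one(w)  # largest weight is exactly 1
w_sum = scale_sum_one(w)  # weights add up to 1
```

The two versions differ only by a constant factor, so either one could
be stored in the returned object as long as the choice is documented.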

>>     - improve / complete roblm documentation (Matias)

>>     - incorporate more data sets (with documentation) (Matias)

I've also asked for this a few days ago, and Valentin has
promised to provide quite a few from the Rousseeuw-Leroy book.
Hopefully, those from the MMY book (Maronna-Martin-Yohai) will
also be provided relatively soon.

Martin