[RsR] Robust regression
Martin Maechler
maechler at stat.math.ethz.ch
Thu Dec 22 12:08:55 CET 2005
>>>>> "Matias" == Matias Salibian-Barrera <matias at stat.ubc.ca>
>>>>> on Wed, 21 Dec 2005 14:07:55 -0800 writes:
Matias> Dear List,
Matias> A few days ago I uploaded the package "roblm" to CRAN. It implements
Matias> MM-regression estimators and currently has some diagnostic plots as
Matias> well. The documentation needs quite a bit of work, but the main
Matias> information is there.
Thank you, Matias!
Matias> Note that the name of the package (and the corresponding class) is
Matias> "roblm", but this does not mean that I necessarily prefer this name over
Matias> others. I've been working on this for some time now, and for the reasons
Matias> that I mentioned in Treviso ("rlm" and "lmRob" are already taken) I
Matias> settled for roblm.
(I'm replying to this in a different e-mail on
"function naming ... " )
Matias> I took the liberty to re-organize the regression
Matias> workgroup minutes relating to linear regression in a
Matias> format closer to a "to-do" list. I would very much
Matias> appreciate your feedback. Names within parentheses
Matias> indicate "volunteers" for a particular task. Please
Matias> feel free to correct all my mistakes (and to add
Matias> your name to the list of volunteers!)
Matias> -- An initial (partial) list of things that remain
Matias> to be done for robust linear regression
>> - expose the "initial estimator" as a separate function.
>> This should include the fast-S (already built-in), the
>> alternate M-S estimator for the case of many factor
>> variables (Victor can provide code), and the heuristic
>> initial estimator of Yohai-Pena (Victor has code?)
>> - explore how to use score ("psi") functions declared in
>> R (as S4 objects) in the C code
good point (which you've raised on this list earlier, but didn't
get an answer)!
I'm not yet sure what the optimal approach would be here.
There is some overhead in calling interpreted R code from the C
code. In the meantime I had an idea which might be more
flexible: The R class could have a slot which is (empty or) a
pointer ("externalptr") to a C function, if one is available.
Hence, for some score/psi/rho/... functions, R would call fast
built-in C code, while one would still have the full flexibility
of "playing with" new *-functions.
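
To make the idea concrete, here is a minimal C-side sketch of the
dispatch only. It is not roblm code: the struct psi_object and the
names psi_eval, psi_huber and psi_bisquare are hypothetical, and the
"fallback" pointer merely stands in for the slower call back into
interpreted R (which in a real package would go through the R API).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Common signature for psi (score) functions: residual r, tuning constant c. */
typedef double (*psi_fn)(double r, double c);

/* Hypothetical stand-in for the R-level psi object.
 * 'native' mirrors the proposed slot: NULL (empty) or a pointer to
 * compiled code.  'fallback' stands in for the interpreted R function. */
typedef struct {
    psi_fn native;
    psi_fn fallback;
    double c;          /* tuning constant */
} psi_object;

/* Huber's psi: r for |r| <= c, otherwise c * sign(r). */
double psi_huber(double r, double c)
{
    if (fabs(r) <= c)
        return r;
    return (r > 0.0) ? c : -c;
}

/* Tukey's bisquare psi: r * (1 - (r/c)^2)^2 for |r| <= c, otherwise 0. */
double psi_bisquare(double r, double c)
{
    if (fabs(r) > c)
        return 0.0;
    double u = r / c;
    double t = 1.0 - u * u;
    return r * t * t;
}

/* Use the compiled function when the slot is filled, else the fallback. */
double psi_eval(const psi_object *p, double r)
{
    assert(p != NULL && (p->native != NULL || p->fallback != NULL));
    psi_fn f = (p->native != NULL) ? p->native : p->fallback;
    return f(r, p->c);
}
```

The fitting loop would only ever call psi_eval(), so a newly
defined psi function works at once via the fallback and can get a
compiled version later without touching the loop.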
>> - decide which options should stay in the "control" function
>> - add to summary.roblm(), print.roblm() and
>> print.summary.roblm() information on which estimation
>> method was used (MM, etc)
>> - write a model selection function (Victor has code for a robust
>> backward stepwise method (RFPE?); Elvezio may know of / have
>> code for the robust Cp; Eva can provide this part?)
>> - write an "anova" function using robust F-, Wald tests (Matias)
yes, in my view; I had argued in Treviso that 'anova' should be
used as the function for comparing nested models, also in situations
where it's really not an '[an]alysis [o]f [va]riance' anymore.
>> - add the robust weights to the returned object (Matias)
I agree. This applies to almost all robust procedures.
Here we should also try to find a common name
('wts', 'weights', ...), and also 'standardize' them, I think, by
requiring either
max_i w_i = 1 (natural for w(r_i) = psi(r_i) / r_i) or
sum_i w_i = 1 (natural for formulae using the weights)
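
The two standardizations could be sketched in C as follows. This is
an illustration only, assuming Huber's psi, for which
w(r) = psi(r) / r = min(1, c / |r|); the function names
(huber_weight, robust_weights, standardize_max, standardize_sum) are
made up here and do not come from any existing package.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Huber weight w(r) = psi(r) / r = min(1, c / |r|); w(0) = 1 by continuity. */
double huber_weight(double r, double c)
{
    double a = fabs(r);
    return (a <= c) ? 1.0 : c / a;
}

/* Fill w[0..n-1] with the weights of the (scaled) residuals r[0..n-1]. */
void robust_weights(const double *r, size_t n, double c, double *w)
{
    assert(r != NULL && w != NULL);
    for (size_t i = 0; i < n; i++)
        w[i] = huber_weight(r[i], c);
}

/* Rescale in place so that max_i w_i = 1.
 * Returns 0 on success, -1 if no weight is positive. */
int standardize_max(double *w, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++)
        if (w[i] > m)
            m = w[i];
    if (m <= 0.0)
        return -1;
    for (size_t i = 0; i < n; i++)
        w[i] /= m;
    return 0;
}

/* Rescale in place so that sum_i w_i = 1.
 * Returns 0 on success, -1 if the weights do not sum to a positive value. */
int standardize_sum(double *w, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += w[i];
    if (s <= 0.0)
        return -1;
    for (size_t i = 0; i < n; i++)
        w[i] /= s;
    return 0;
}
```

With Huber weights the first convention already holds whenever at
least one residual lies within [-c, c], so standardize_max() is then
a no-op; the second convention changes every weight but makes
weighted means directly usable.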
>> - improve / complete roblm documentation (Matias)
>> - incorporate more data sets (with documentation) (Matias)
I also asked for this a few days ago, and Valentin has
promised to provide quite a few from the Rousseeuw-Leroy book.
Hopefully, those from the MMY book (Maronna-Martin-Yohai) will
also be provided relatively soon.
Martin