[RsR] Questions about interpreting lmRob output

Kjell Konis kon|@ @end|ng |rom @t@t@@ox@@c@uk
Wed Nov 14 12:03:19 CET 2007


Here are some quick answers to your questions.  Please let me know if  
you would like more details.

Kjell


On 13 Nov 2007, at 20:59, Jenifer Larson-Hall wrote:

> Dear Robust R gurus,
> There are a variety of functions I can use to perform robust  
> regression
> in R and I am trying to figure out what would be the best package in R
> to recommend to do robust regression. From reading through the  
> archives
> and looking at Maronna, Martin and Yohai’s book, my sense is that most
> people on this list would recommend the robust library. That’s the  
> one I
> like the best too, because its output parallels the lm functions.
> However, I have a few questions.
>
> First, several books that discuss R and have mentioned robust methods
> (Crawley, 2007; Faraway, 2005; Fox, 2002; Jureckova & Picek, 2006)  
> have
> used the rlm function in the MASS library. Do you have any idea why
> these books would recommend rlm over lmRob? I don’t find the output to
> be as helpful as lmRob’s output. Well, this is just curiosity.

The robust library was originally developed for S-PLUS and has only  
recently (sometime in the last 1-2 years) been made available for R,  
which is most likely why those books use rlm from MASS instead.

> Second, I have some questions about understanding the output of lmRob:
> 1) How can I interpret the test for bias in the summary output?
> Test for Bias:
>                   statistic   p-value
> M-estimate  0.5652855 0.9997877
> LS-estimate 3.6125604 0.8902805
>
> Does the fact that the p-values are above .05 mean there was no bias  
> in
> my original data set?

The test for bias is a diagnostic for the fitting procedure and is not  
directly applicable to the fitted model.  It is computed by the  
function test.lmRob; if you're curious, take a look at the reference  
in its help file.
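If you want to run the test on its own, here is a minimal sketch (it  
assumes the robust package is loaded and refits the model from your  
message; see ?test.lmRob in your installed version for the exact  
interface):

```r
library(robust)

## Refit the robust regression, then compute the bias test directly;
## this should print the same "Test for Bias" table that summary() shows.
m1.robust <- lmRob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
test.lmRob(m1.robust)
```

Roughly speaking, large p-values mean there is no evidence that the  
final estimate is biased relative to the initial high-breakdown  
estimate, i.e. the fitting procedure behaved well; they say nothing  
about bias in your data.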


> 2) I was excited to see that the robust library had functions like
> update.lmRob and step.lmRob which again paralleled what I was doing  
> with
> the lm function. However, when I tried to use update.lmRob, here is  
> the
> error message I got (data set reproduced below):
> m1.robust=lmRob(G1L1WR~ PAL2*KL2WR*NS, data=lafrance.na)
> update.lmRob(m1.robust,~.-PAL2:KL2WR:NS)
> Error in NextMethod() : generic function not specified

S-PLUS and R handle generic functions slightly differently.  I thought  
I had this working but it looks like I missed something.  Sorry.
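In the meantime you can work around the broken method by refitting  
with the reduced formula spelled out explicitly (a sketch using the  
model from your message; it only helps if lmRob can fit the reduced  
model, which may not be the case here, see below):

```r
## Workaround for the broken update.lmRob(): call lmRob() again
## with the term removed from the formula by hand.
m2.robust <- lmRob(G1L1WR ~ PAL2 * KL2WR * NS - PAL2:KL2WR:NS,
                   data = lafrance.na)
summary(m2.robust)
```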


> And whenever I use step.lmRob, I get back exactly the same model  
> that I
> put into it, so that doesn’t seem to be working either!
>
>> step.lmRob(m1.robust)
> Start:  RFPE= 25.0857
> G1L1WR ~ PAL2 * KL2WR * NS
>
> Warning in lmRob.fit.compute(x2, y, x1 = x1, x1.idx = x1.idx, nrep =
> nrep,  :
>  Max iteration for final coefficients reached.
>
> Single term deletions
>
> Model:
> G1L1WR ~ PAL2 * KL2WR * NS
>
> scale:  0.2330676
>
>              Df   RFPE
> <none>           25.086
> PAL2:KL2WR:NS  1 26.658
> Call:
> lmRob(formula = G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
>
> Coefficients:
>  (Intercept)          PAL2         KL2WR            NS    PAL2:KL2WR
>   PAL2:NS
>    -4.055888      4.205368     -8.077648      1.483424      4.342931
> -1.264462
>     KL2WR:NS PAL2:KL2WR:NS
>     4.545588     -2.474043
>
> Degrees of freedom: 37 total; 29 residual
> Residual standard error: 0.2330676

For some reason lmRob is having trouble fitting the model G1L1WR ~  
PAL2 * KL2WR * NS - PAL2:KL2WR:NS.  This usually happens because more  
than half of the data fits the model perfectly (although I'm not sure  
that's what's happening in this case).  When the fitting fails,  
step.lmRob and drop1.lmRob return the most recently selected model.   
Since this example fails in the first iteration, it just returns the  
input model.


> 3) I had better luck with the plots.lmRob( ) function but I was
> wondering what plots 9 and 10 were. What does it mean to have  
> ‘overlaid’
> plots? I could see an overlaid Q-Q plot of residuals was different  
> than
> the Normal Q-Q plot of residuals, but I don’t understand what the
> difference is.

The plots are designed to compare robust and least squares fits of the  
same model: the "overlaid" versions draw both fits in a single panel  
so you can see where the two disagree.  Try this:

 > m1.comp <- fit.models(list(Robust = "lmRob", "Least Squares" = "lm"),
 +                       formula = G1L1WR ~ PAL2*KL2WR*NS, data = lafrance.na)
 > plot(m1.comp)

then choices 9 and 10 should make more sense.


> 4) As I understand the documentation, the lmRob function will use
> M-estimators or S-estimators and will search for the best one for the
> data. Reading over Wilcox (2005), he notes that no one robust method
> will work best all of the time (as do Maronna, Martin and Yohai),  
> but he
> seems to like the Theil-Sen estimator, M-estimators, the MGV  
> estimator,
> the STS estimator, least trimmed squares the least absolute value
> estimator, and the deepest regression line method. Not being well- 
> versed
> in the theory, I am going to venture the guess that the estimators  
> that
> lmRob is using would cover the M-estimators and STS estimator, but  
> none
> of the others. Does anyone have a comment on why only M-estimators and
> S-estimators are used in lmRob? Along the same lines, Wilcox (2005)
> recommends running a robust regression with an estimator with a high
> breakdown point and a lower (.2-.3) breakdown point and comparing
> estimates and CIs. From what I can see, M-estimators have a high
> breakdown point (.5). I don’t know about S-estimators, but does anyone
> else see value in what Wilcox recommends, and have an idea for some
> other parameter I should use with lmRob to run a robust regression
> using an estimator with a lower breakdown point?
> 6) OK, and last of all, a dumb question. Is this dumb? Does lmRob
> bootstrap the data? From the description of sampling it seems like it
> does, but I am not sure exactly from the terminology.

The goal of the robust library is ease of use, so, by default, it  
tries to abstract away the specific choice of estimator as much as  
possible.  lmRob tries to choose a combination of estimators that is  
sensible (although probably not the best) for the given model.  You  
may also want to look at the R package robustbase, which takes a more  
technical approach to robust fitting.
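For example, robustbase's lmrob() computes an MM-estimate (a  
high-breakdown initial S-estimate refined by an efficient M-step) and  
exposes its tuning constants through a control object, which gets  
closer to the kind of comparison Wilcox recommends.  A sketch  
(argument names per the robustbase documentation; check ?lmrob and  
?lmrob.control in your installed version):

```r
library(robustbase)

## MM-estimate: 50% breakdown S initial estimate + efficient M-step.
fm <- lmrob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
summary(fm)
```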


> sessionInfo()
> R version 2.6.0 (2007-10-03)
> i386-pc-mingw32
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> attached base packages:
> [1] tcltk     stats     graphics  grDevices utils     datasets   
> methods
> base
> other attached packages:
> [1] robustbase_0.2-8 Rcmdr_1.3-5      car_1.2-5        robust_0.3-0
> lattice_0.17-2
> [6] MASS_7.2-36
>
> loaded via a namespace (and not attached):
> [1] grid_2.6.0  tools_2.6.0
>
>> lafrance.na
>    NR WM       NS     PAL2      PAL1    G1L1WR     KL2WR    G1L2WR     KL1WR
> 1  47  2 2.406489 1.653213 1.5910646 1.7634280 1.1139434 1.7075702 1.4913617
> 2  45  1 2.568917 1.041393 1.2552725 0.7781513 0.0000000 0.0000000 0.3010300
> 4  48  2 2.177161 1.491362 1.3979400 1.5051500 0.0000000 0.3010300 0.0000000
> 5  46  4 2.010342 1.633468 1.3979400 1.5051500 0.0000000 0.3010300 0.7781513
> 6  52  4 2.201861 1.591065 1.5314789 1.6901961 0.9500000 1.1461280 0.9030900
> 7  59  2 2.247556 1.447158 1.1461280 0.4771213 0.0000000 0.0000000 0.0000000
> 8  44  2 2.319169 1.230449 0.7781513 0.6989700 0.0000000 0.0000000 0.0000000
> 9  31  2 2.443310 1.176091 1.4471580 0.8450980 0.0000000 0.0000000 0.0000000
> 10 56  3 2.119850 1.623249 1.6020600 1.7160033 1.2041200 1.5314789 1.2787536
> 11 45  0 2.451495 1.414973 1.0791812 1.1461280 0.0000000 0.0000000 0.3010300
> 12 43  4 2.239650 1.544068 1.4771213 1.4149733 0.9542425 1.1139434 1.1139434
> 13 42  4 2.279325 1.176091 0.0000000 0.4771213 0.0000000 0.0000000 0.3010300
> 14 42  4 2.123067 1.414973 1.1139434 1.7853298 0.0000000 1.3802112 0.6989700
> 15 44  3 2.377215 1.230449 0.9542425 1.6901961 0.0000000 0.7781513 0.0000000
> 16 46  4 2.161937 1.633468 1.6334685 1.6720979 1.2041200 1.2041200 1.4471580
> 17 52  2 2.316159 1.643453 1.5910646 1.7481880 1.0000000 1.1139434 1.3010300
> 19 56  2 2.200440 1.518514 1.2787536 1.4471580 0.6989700 0.6989700 0.6989700
> 21 46  1 2.404166 1.113943 1.1760913 1.1139434 0.0000000 0.9030900 0.0000000
> 22 49  0 2.446211 1.041393 0.7781513 0.0000000 0.0000000 0.0000000 0.0000000
> 23 44  0 2.264180 1.204120 0.9030900 0.9030900 0.0000000 0.0000000 0.0000000
> 24 48  3 2.354953 1.531479 1.3424227 1.2552725 0.0000000 0.0000000 0.0000000
> 25 54  4 2.110118 1.698970 1.5563025 1.5910646 1.4313638 1.0791812 1.0000000
> 26 62  3 2.292012 1.462398 0.9542425 1.7403627 0.6020600 0.9030900 0.9030900
> 27 52  3 2.163728 1.633468 1.3802112 1.5185139 0.0000000 0.0000000 0.0000000
> 28 55  3 2.209944 1.591065 1.4313638 1.5185139 1.0413927 0.7781513 0.0000000
> 29 44  3 2.258014 1.681241 1.6812412 1.6989700 1.2787536 1.2787536 1.3617278
> 30 50  3 2.163638 1.755875 1.6232493 1.5185139 1.3617278 1.3010300 1.0413927
> 31 58  3 2.154059 1.518514 1.4149733 1.4149733 0.0000000 0.3010300 0.4771213
> 32 36  3 2.317353 1.447158 1.1760913 0.7781513 0.4771213 0.7781513 0.6020600
> 33 43  2 2.147151 1.568202 1.4149733 1.5314789 0.0000000 0.0000000 0.6020600
> 34 40  2 2.273395 1.477121 1.4623980 1.4149733 0.3010300 0.7781513 0.4771213
> 35 55  3 2.106837 1.518514 1.5440680 1.6901961 1.0000000 1.5440680 0.6989700
> 36 63  4 2.070555 1.799341 1.3979400 1.7923917 0.7781513 1.6434527 0.8450980
> 37 49  3 2.029506 1.612784 1.6627578 1.8195439 1.3617278 1.8864907 1.1139434
> 38 61  6 2.143296 1.832509 1.7853298 1.8260748 2.0253059 1.9822712 1.7403627
> 39 58  2 2.217036 1.653213 1.5314789 1.3979400 0.9542425 0.9030900 0.9030900
> 40 46  2 2.160889 1.623249 1.5910646 1.6901961 0.4771213 1.3222193 0.0000000
>
>
>
> Dr. Jenifer Larson-Hall
> Assistant Professor of Linguistics
> University of North Texas
> (940)369-8950
>
> _______________________________________________
> R-SIG-Robust using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-robust
>



