[RsR] Questions about interpreting lmRob output

Jenifer Larson-Hall jen||er @end|ng |rom unt@edu
Tue Nov 13 21:59:21 CET 2007


Dear Robust R gurus,
There are a variety of functions that perform robust regression in R,
and I am trying to figure out which package would be the best one to
recommend. From reading through the archives and looking at Maronna,
Martin and Yohai's book, my sense is that most people on this list
would recommend the robust library. That's the one I like best too,
because its output parallels that of the lm function. However, I have
a few questions.

First, several books that discuss R and mention robust methods
(Crawley, 2007; Faraway, 2005; Fox, 2002; Jureckova & Picek, 2006) use
the rlm function in the MASS library. Do you have any idea why these
books would recommend rlm over lmRob? I don't find rlm's output to be
as helpful as lmRob's. Well, this is just curiosity.
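
For concreteness, here is roughly how I have been fitting the same model
both ways (my assumption is that rlm's method = "MM" option is the
closest analogue to lmRob's default MM-estimate; please correct me if
that comparison is off):

```r
library(MASS)    # provides rlm()
library(robust)  # provides lmRob()

## Same model fit both ways; method = "MM" is (I assume) the closest
## rlm analogue to lmRob's default MM-estimate.
m1.rlm <- rlm(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na, method = "MM")
m1.rob <- lmRob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)

summary(m1.rlm)  # coefficients and t-values, but no p-values
summary(m1.rob)  # p-values plus the test for bias
```

It is that difference in the two summaries that makes me prefer lmRob.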

Second, I have some questions about understanding the output of lmRob:
1) How can I interpret the test for bias in the summary output? 
 Test for Bias:
             statistic   p-value
M-estimate   0.5652855 0.9997877
LS-estimate  3.6125604 0.8902805

Does the fact that the p-values are above .05 mean there was no bias in
my original data set?

2) I was excited to see that the robust library has functions like
update.lmRob and step.lmRob, which again parallel what I was doing with
the lm function. However, when I tried to use update.lmRob, here is the
error message I got (data set reproduced below):
> m1.robust <- lmRob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
> update.lmRob(m1.robust, ~ . - PAL2:KL2WR:NS)
Error in NextMethod() : generic function not specified
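
My guess (and it is only a guess) is that the error comes from calling
the method update.lmRob directly instead of the generic update, since
NextMethod() only works inside normal S3 dispatch. Either calling the
generic, or simply refitting with the reduced formula, sidesteps the
error for me:

```r
## Workaround 1 (my guess at the cause): call the generic update(),
## not the method update.lmRob() directly, so that S3 dispatch sets
## up NextMethod() properly.
m2.robust <- update(m1.robust, ~ . - PAL2:KL2WR:NS)

## Workaround 2: refit with the reduced formula explicitly;
## (PAL2 + KL2WR + NS)^2 keeps all main effects and two-way
## interactions but drops the three-way term.
m2.robust <- lmRob(G1L1WR ~ (PAL2 + KL2WR + NS)^2, data = lafrance.na)
```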

And whenever I use step.lmRob, I get back exactly the same model that I
put into it, so that doesn’t seem to be working either!
 
> step.lmRob(m1.robust)
Start:  RFPE= 25.0857 
 G1L1WR ~ PAL2 * KL2WR * NS 

Warning in lmRob.fit.compute(x2, y, x1 = x1, x1.idx = x1.idx, nrep = nrep,  :
  Max iteration for final coefficients reached.

Single term deletions

Model:
G1L1WR ~ PAL2 * KL2WR * NS

scale:  0.2330676 

              Df   RFPE
<none>           25.086
PAL2:KL2WR:NS  1 26.658
Call:
lmRob(formula = G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)

Coefficients:
  (Intercept)           PAL2          KL2WR             NS
    -4.055888       4.205368      -8.077648       1.483424
   PAL2:KL2WR        PAL2:NS       KL2WR:NS  PAL2:KL2WR:NS
     4.342931      -1.264462       4.545588      -2.474043

Degrees of freedom: 37 total; 29 residual
Residual standard error: 0.2330676 

3) I had better luck with the plot.lmRob() function, but I was
wondering what plots 9 and 10 are. What does it mean to have 'overlaid'
plots? I could see that an overlaid Q-Q plot of residuals was different
from the Normal Q-Q plot of residuals, but I don't understand what the
difference is.

4) As I understand the documentation, the lmRob function will use
M-estimators or S-estimators and will search for the best one for the
data. Reading over Wilcox (2005), he notes that no one robust method
will work best all of the time (as do Maronna, Martin and Yohai), but
he seems to like the Theil-Sen estimator, M-estimators, the MGV
estimator, the STS estimator, least trimmed squares, the least absolute
value estimator, and the deepest regression line method. Not being
well-versed in the theory, I will venture the guess that the estimators
lmRob uses cover the M-estimators and the STS estimator, but none of
the others. Does anyone have a comment on why only M-estimators and
S-estimators are used in lmRob? Along the same lines, Wilcox (2005)
recommends running a robust regression with both a high-breakdown
estimator and one with a lower (.2-.3) breakdown point, and comparing
the estimates and CIs. From what I can see, M-estimators have a high
breakdown point (.5). I don't know about S-estimators, but does anyone
else see value in what Wilcox recommends, and does anyone have an idea
for some parameter I could use with lmRob to run a robust regression
with a lower-breakdown estimator?

5) OK, and last of all, a dumb question. Is this dumb? Does lmRob
bootstrap the data? From the description of sampling, it seems like it
does, but I am not sure exactly from the terminology.
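
On the breakdown-point question, the closest thing I have found is
lmRob.control(); purely as a guess from reading its help page, something
like the following might change the tuning of the fit, though I do not
think the efficiency argument is the same thing as a lower breakdown
point, so I would welcome correction here:

```r
## Purely a guess from reading ?lmRob.control: lowering `efficiency`
## changes the tuning of the final M-step, but I do not believe it
## lowers the 50% breakdown point of the initial estimate. Is there
## a control argument that does?
ctrl <- lmRob.control(efficiency = 0.85)
m1.low. <- lmRob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na,
                 control = ctrl)
```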

sessionInfo()
R version 2.6.0 (2007-10-03) 
i386-pc-mingw32 
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] robustbase_0.2-8 Rcmdr_1.3-5      car_1.2-5        robust_0.3-0     lattice_0.17-2
[6] MASS_7.2-36

loaded via a namespace (and not attached):
[1] grid_2.6.0  tools_2.6.0

> lafrance.na
   NR WM       NS     PAL2      PAL1    G1L1WR     KL2WR    G1L2WR     KL1WR
1  47  2 2.406489 1.653213 1.5910646 1.7634280 1.1139434 1.7075702 1.4913617
2  45  1 2.568917 1.041393 1.2552725 0.7781513 0.0000000 0.0000000 0.3010300
4  48  2 2.177161 1.491362 1.3979400 1.5051500 0.0000000 0.3010300 0.0000000
5  46  4 2.010342 1.633468 1.3979400 1.5051500 0.0000000 0.3010300 0.7781513
6  52  4 2.201861 1.591065 1.5314789 1.6901961 0.9500000 1.1461280 0.9030900
7  59  2 2.247556 1.447158 1.1461280 0.4771213 0.0000000 0.0000000 0.0000000
8  44  2 2.319169 1.230449 0.7781513 0.6989700 0.0000000 0.0000000 0.0000000
9  31  2 2.443310 1.176091 1.4471580 0.8450980 0.0000000 0.0000000 0.0000000
10 56  3 2.119850 1.623249 1.6020600 1.7160033 1.2041200 1.5314789 1.2787536
11 45  0 2.451495 1.414973 1.0791812 1.1461280 0.0000000 0.0000000 0.3010300
12 43  4 2.239650 1.544068 1.4771213 1.4149733 0.9542425 1.1139434 1.1139434
13 42  4 2.279325 1.176091 0.0000000 0.4771213 0.0000000 0.0000000 0.3010300
14 42  4 2.123067 1.414973 1.1139434 1.7853298 0.0000000 1.3802112 0.6989700
15 44  3 2.377215 1.230449 0.9542425 1.6901961 0.0000000 0.7781513 0.0000000
16 46  4 2.161937 1.633468 1.6334685 1.6720979 1.2041200 1.2041200 1.4471580
17 52  2 2.316159 1.643453 1.5910646 1.7481880 1.0000000 1.1139434 1.3010300
19 56  2 2.200440 1.518514 1.2787536 1.4471580 0.6989700 0.6989700 0.6989700
21 46  1 2.404166 1.113943 1.1760913 1.1139434 0.0000000 0.9030900 0.0000000
22 49  0 2.446211 1.041393 0.7781513 0.0000000 0.0000000 0.0000000 0.0000000
23 44  0 2.264180 1.204120 0.9030900 0.9030900 0.0000000 0.0000000 0.0000000
24 48  3 2.354953 1.531479 1.3424227 1.2552725 0.0000000 0.0000000 0.0000000
25 54  4 2.110118 1.698970 1.5563025 1.5910646 1.4313638 1.0791812 1.0000000
26 62  3 2.292012 1.462398 0.9542425 1.7403627 0.6020600 0.9030900 0.9030900
27 52  3 2.163728 1.633468 1.3802112 1.5185139 0.0000000 0.0000000 0.0000000
28 55  3 2.209944 1.591065 1.4313638 1.5185139 1.0413927 0.7781513 0.0000000
29 44  3 2.258014 1.681241 1.6812412 1.6989700 1.2787536 1.2787536 1.3617278
30 50  3 2.163638 1.755875 1.6232493 1.5185139 1.3617278 1.3010300 1.0413927
31 58  3 2.154059 1.518514 1.4149733 1.4149733 0.0000000 0.3010300 0.4771213
32 36  3 2.317353 1.447158 1.1760913 0.7781513 0.4771213 0.7781513 0.6020600
33 43  2 2.147151 1.568202 1.4149733 1.5314789 0.0000000 0.0000000 0.6020600
34 40  2 2.273395 1.477121 1.4623980 1.4149733 0.3010300 0.7781513 0.4771213
35 55  3 2.106837 1.518514 1.5440680 1.6901961 1.0000000 1.5440680 0.6989700
36 63  4 2.070555 1.799341 1.3979400 1.7923917 0.7781513 1.6434527 0.8450980
37 49  3 2.029506 1.612784 1.6627578 1.8195439 1.3617278 1.8864907 1.1139434
38 61  6 2.143296 1.832509 1.7853298 1.8260748 2.0253059 1.9822712 1.7403627
39 58  2 2.217036 1.653213 1.5314789 1.3979400 0.9542425 0.9030900 0.9030900
40 46  2 2.160889 1.623249 1.5910646 1.6901961 0.4771213 1.3222193 0.0000000



Dr. Jenifer Larson-Hall
Assistant Professor of Linguistics
University of North Texas
(940)369-8950




More information about the R-SIG-Robust mailing list