[R] inconsistent behaviour of add1 and drop1 with a weighted linear model

Jenny Hodgson jh69 at york.ac.uk
Tue Mar 13 16:43:26 CET 2007


Dear R Help,
I have noticed some inconsistent behaviour of add1 and drop1 with a 
weighted linear model, which affects the interpretation of the results.
I have these data to fit with a linear model, I want to weight them by 
the relative size of the geographical areas they represent.
_________________________________________________________________________________________
 > example
           y        x1       x2   weights
1  -4.546477 0.1859556 50.00000 0.9466022
2   1.484246 0.4740497 29.88000 1.3252430
3   2.203681 0.8594264 16.93333 0.9466022
4   1.943163 0.8713360 42.11765 2.7766997
5   1.886473 0.9006082 19.00000 0.9466022
6   1.795393 0.8455183 23.68421 1.1674760
7   1.878077 0.5641396 35.00000 0.8203885
8  -4.215484 0.4356858 58.75000 0.4417477
9   1.993339 0.5440061 19.28571 0.8519420
10  1.560869 0.6285066 19.54545 0.8203885
11  2.761535 0.7017720 15.83333 0.1262136
12  0.995959 0.4638751  0.00000 0.9466022
13 -4.516514 0.2794199 77.85714 0.8834954
 > sum(example$weights)
[1] 13.00000

 > model<-lm(y~1,data=example,weights=weights)
 > add1(model,.~.+x1+x2)
Single term additions

Model:
y ~ 1
       Df Sum of Sq    RSS    AIC
<none>              94.000 27.719
x1      1    55.290 38.710 18.185
x2      1    58.630 35.371 17.012
 > addterm(model,.~.+x1+x2)
Single term additions

Model:
y ~ 1
       Df Sum of Sq    RSS    AIC
<none>              94.000 27.719
x1      1    55.290 38.710 18.185
x2      1    58.630 35.371 17.012

#so add1 and addterm (MASS package) give the same answer, nothing 
obviously untoward, but
#both SS and RSS change when you do...
 > model<-lm(y~x1,data=example,weights=weights)
 > drop1(model)
Single term deletions

Model:
y ~ x1
       Df Sum of Sq    RSS    AIC
<none>              30.164 14.942############I would expect RSS to be 38.710
x1      1    44.377 74.541 24.703############I would expect SS to be 
55.290 and RSS 94.000 #{3}#
 > model<-lm(y~x2,data=example,weights=weights)
 > drop1(model)
Single term deletions

Model:
y ~ x2
       Df Sum of Sq    RSS    AIC
<none>              37.922 17.918
x2      1    36.619 74.541 24.703 #{1}#

#not only are SS and RSS different, the relative importance of x1 and x2 
has swapped! I have checked that this
#does not happen if weights are not used (everything adds up as 
expected, but the null RSS and other SSs are not the same as any quoted 
here).
#The inconsistency is still there if you are not adding to a null model:
#NB I realise that x1 and x2 are correlated so whichever is last to be 
added appears to have a lower SS - this is not the issue
 > add1(model,.~.+x1+x2)#This or...
Single term additions

Model:
y ~ x2
       Df Sum of Sq    RSS    AIC
<none>              35.371 17.012
x1      1    18.576 16.794  9.329

 > addterm(model,.~x1+x2)# ...this is inconsistent with:
Single term additions

Model:
y ~ x2
       Df Sum of Sq    RSS    AIC
<none>              35.371 17.012
x1      1    18.576 16.794  9.329

 > drop1(update(model,.~.+x1))#this or...
Single term deletions

Model:
y ~ x2 + x1
       Df Sum of Sq    RSS    AIC
<none>              14.456  7.380
x2      1    15.708 30.164 14.942 #{4}#
x1      1    23.466 37.922 17.918 #{2}#

 > dropterm(update(model,.~.+x1))#...this
Single term deletions

Model:
y ~ x2 + x1
       Df Sum of Sq    RSS    AIC
<none>              14.456  7.380
x2      1    15.708 30.164 14.942
x1      1    23.466 37.922 17.918

#and the same thing happens with the x2 variable - compare below with above
 > add1(update(model,.~x1),.~x1+x2)
Single term additions

Model:
y ~ x1
       Df Sum of Sq    RSS    AIC
<none>              38.710 18.185
x2      1    21.916 16.794  9.329

 > addterm(update(model,.~x1),.~x1+x2)
Single term additions

Model:
y ~ x1
       Df Sum of Sq    RSS    AIC
<none>              38.710 18.185
x2      1    21.916 16.794  9.329

#Why do I think add1/addterm are the problem rather than drop1/dropterm?
#Because drop1/dropterm agree with the anova:

 > model<-lm(y~x2+x1,data=example,weights=weights)
 > anova(model)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)   
x2         1 36.619  36.619  25.331 0.0005119 *** ####this agrees with 
#{1}# above
x1         1 23.466  23.466  16.233 0.0024035 ** ####this agrees with 
#{2}# above
Residuals 10 14.456   1.446                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 > model<-lm(y~x1+x2,data=example,weights=weights)
 > anova(model)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)   
x1         1 44.377  44.377  30.698 0.0002474 *** ####this agrees with 
#{3}# above
x2         1 15.708  15.708  10.866 0.0080633 ** ####this agrees with 
#{4}# above
Residuals 10 14.456   1.446                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


___________________________________________________________________________________________

My question is, why does this inconsistency happen? And is it safe to 
assume that anova and drop1/dropterm are giving me the answers I want, 
therefore to explore my model with these functions and avoid using 
add1/addterm?

HUGE thanks for reading to the end! I would be extremely grateful if 
someone could help me with this problem. I wasn't able to find any clues 
in the docs or the r-help archives (but perhaps as it's a complex 
problem I wasn't using the right search terms, if so, apologies)

Best wishes
Jenny

_____________
Jenny Hodgson
Department of Biology - area 18
PO box 373
University of York
YO10 5YW
UK
Tel: 01904 328623



More information about the R-help mailing list