[R] proper order of calls when estimating nested models

Sat Mar 24 20:36:30 CET 2012

Dear all,
How should one proceed when estimating nested models on data that
contain missing values? What I would like to do is to first estimate
the model with the control variables only, and then estimate the model
that also contains the variables of interest. For example,
> summary(reg.a <- lm(IMC ~ STYLE + SEXE + AGE, imc))

Call:
lm(formula = IMC ~ STYLE + SEXE + AGE, data = imc)

Residuals:
    Min      1Q  Median      3Q     Max
-10.720  -2.428  -0.550   1.712  32.206

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.85878    0.23407 106.202   <2e-16 ***
STYLE        0.09544    0.12057   0.792    0.429
SEXE        -1.82998    0.11490 -15.926   <2e-16 ***
AGE          0.45665    0.02839  16.087   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.746 on 5969 degrees of freedom
  (576 observations deleted due to missingness)
Multiple R-squared: 0.07776,	Adjusted R-squared: 0.07729
F-statistic: 167.8 on 3 and 5969 DF,  p-value: < 2.2e-16

> summary(reg.b <- update(reg.a, . ~ . + DEMANDEC + LATITUDEC + SUPSUPC))

Call:
lm(formula = IMC ~ STYLE + SEXE + AGE + DEMANDEC + LATITUDEC +
    SUPSUPC, data = imc)

Residuals:
    Min      1Q  Median      3Q     Max
-10.687  -2.424  -0.552   1.677  32.258

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.91600    0.24010 103.773   <2e-16 ***
STYLE        0.10165    0.12455   0.816   0.4144
SEXE        -1.87195    0.11654 -16.063   <2e-16 ***
AGE          0.45543    0.02932  15.530   <2e-16 ***
DEMANDEC     0.22041    0.10960   2.011   0.0444 *
LATITUDEC   -0.27378    0.11106  -2.465   0.0137 *
SUPSUPC     -0.14235    0.08169  -1.743   0.0815 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.739 on 5785 degrees of freedom
  (757 observations deleted due to missingness)
Multiple R-squared: 0.08293,	Adjusted R-squared: 0.08197
F-statistic: 87.18 on 6 and 5785 DF,  p-value: < 2.2e-16

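Note that the two models have been estimated on different samples:
reg.a uses 5973 complete cases and reg.b only 5792, because the added
variables have missing values of their own. If I try the following I
get an error: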
> anova(reg.a, reg.b)
Error in anova.lmlist(object, ...) :
  models were not all fitted to the same size of dataset

Should I reverse the order of calls? That is, should I always estimate
the complete model first, and then the model containing only the
control variables?
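
To make the second option concrete, here is a minimal sketch of what I
have in mind. It uses simulated data rather than imc, and the names d,
y, ctrl, x, full and reduced are made up for the illustration. The
idea is to fit the full model first and then refit the control-only
model on exactly the rows that the full model kept. I am not sure this
is the recommended way.

```r
# Simulated stand-in for the real data; all names here are hypothetical.
set.seed(1)
n <- 200
d <- data.frame(y = rnorm(n), ctrl = rnorm(n), x = rnorm(n))
d$x[sample(n, 20)] <- NA   # missing values only in the variable of interest

# Fit the full model first; lm() silently drops incomplete rows (na.omit).
full <- lm(y ~ ctrl + x, data = d)

# Refit the control-only model on the rows the full model actually used.
# model.frame(full) contains only those complete cases.
reduced <- lm(y ~ ctrl, data = model.frame(full))

# Both fits now share the same sample, so anova() accepts them.
stopifnot(nobs(reduced) == nobs(full))
anova(reduced, full)
```

With the real data the equivalent would presumably be something like
subsetting imc with na.omit(imc[, all.vars(formula(reg.b))]) before
fitting both models, but I have not checked this.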

Regards
Liviu
