[R] proper order of calls when estimating nested models
Liviu Andronic
landronimirc at gmail.com
Sat Mar 24 20:36:30 CET 2012
Dear all
How should one proceed when estimating nested models on data that contain
missing values? What I would like to do is first estimate the model
with the control variables only, and then estimate the model that also
contains the variables of interest. For example:
> summary(reg.a <- lm(IMC ~ STYLE + SEXE + AGE, imc))

Call:
lm(formula = IMC ~ STYLE + SEXE + AGE, data = imc)

Residuals:
    Min      1Q  Median      3Q     Max
-10.720  -2.428  -0.550   1.712  32.206

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.85878    0.23407 106.202   <2e-16 ***
STYLE        0.09544    0.12057   0.792    0.429
SEXE        -1.82998    0.11490 -15.926   <2e-16 ***
AGE          0.45665    0.02839  16.087   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.746 on 5969 degrees of freedom
  (576 observations deleted due to missingness)
Multiple R-squared: 0.07776,  Adjusted R-squared: 0.07729
F-statistic: 167.8 on 3 and 5969 DF, p-value: < 2.2e-16
> summary(reg.b <- update(reg.a, . ~ . + DEMANDEC + LATITUDEC + SUPSUPC))

Call:
lm(formula = IMC ~ STYLE + SEXE + AGE + DEMANDEC + LATITUDEC +
    SUPSUPC, data = imc)

Residuals:
    Min      1Q  Median      3Q     Max
-10.687  -2.424  -0.552   1.677  32.258

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.91600    0.24010 103.773   <2e-16 ***
STYLE        0.10165    0.12455   0.816   0.4144
SEXE        -1.87195    0.11654 -16.063   <2e-16 ***
AGE          0.45543    0.02932  15.530   <2e-16 ***
DEMANDEC     0.22041    0.10960   2.011   0.0444 *
LATITUDEC   -0.27378    0.11106  -2.465   0.0137 *
SUPSUPC     -0.14235    0.08169  -1.743   0.0815 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.739 on 5785 degrees of freedom
  (757 observations deleted due to missingness)
Multiple R-squared: 0.08293,  Adjusted R-squared: 0.08197
F-statistic: 87.18 on 6 and 5785 DF, p-value: < 2.2e-16
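To see why the two fits end up on different samples, here is a minimal simulated sketch. The data frame `sim` and the variables `y`, `x1` and `z` are hypothetical stand-ins for `imc`, a control variable and a variable of interest. By default `lm()` applies `na.omit` to the variables in each formula separately, so adding a variable with its own missing values shrinks the sample, and `anova()` then refuses to compare the fits.

```r
# Hypothetical simulated stand-in for `imc` (not the poster's data):
# y = outcome, x1 = control variable, z = variable of interest.
set.seed(1)
n <- 100
sim <- data.frame(y = rnorm(n), x1 = rnorm(n), z = rnorm(n))
sim$x1[1:5] <- NA   # missing values in the control variable
sim$z[6:15] <- NA   # extra missing values only in the variable of interest

fit.a <- lm(y ~ x1, data = sim)     # na.omit drops rows 1-5
fit.b <- update(fit.a, . ~ . + z)   # na.omit drops rows 1-15

nobs(fit.a)  # 95
nobs(fit.b)  # 85

# Reproduces the anova() error; the message is captured as text.
err <- tryCatch(anova(fit.a, fit.b), error = conditionMessage)
err
```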
Note that the two models have been estimated on different samples: 576
observations were dropped from the first fit and 757 from the second. If I
try the following, I get an error:
> anova(reg.a, reg.b)
Error in anova.lmlist(object, ...) :
models were not all fitted to the same size of dataset
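The error is raised because `anova()` (here via `anova.lmlist`) requires every model to have been fitted to the same number of observations. One common workaround, sketched below on hypothetical simulated data rather than on `imc`, is a complete-case analysis: subset the data once to the rows that are complete for every variable in the largest model, then fit both models to that subset.

```r
# Hypothetical simulated stand-in for `imc` (not the poster's data).
set.seed(1)
n <- 100
sim <- data.frame(y = rnorm(n), x1 = rnorm(n), z = rnorm(n))
sim$x1[1:5] <- NA
sim$z[6:15] <- NA

# Keep only rows complete for every variable used by the largest model,
# so both fits use exactly the same observations.
vars <- c("y", "x1", "z")
sim.cc <- sim[complete.cases(sim[, vars]), ]

fit.a <- lm(y ~ x1, data = sim.cc)
fit.b <- update(fit.a, . ~ . + z)

nobs(fit.a) == nobs(fit.b)  # TRUE: both use the 85 complete rows
anova(fit.a, fit.b)         # F-test for adding z now runs
```

For the data above, `vars` would presumably be `c("IMC", "STYLE", "SEXE", "AGE", "DEMANDEC", "LATITUDEC", "SUPSUPC")`. The cost is that rows missing only `DEMANDEC`, `LATITUDEC` or `SUPSUPC` are also dropped from the control-only model, so its estimates will differ slightly from those of `reg.a`.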
Should I reverse the order of the calls? That is, should I always estimate
the complete model first, and then the model containing only the control
variables?
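On the order of calls: reversing the order of the `lm()` calls does not by itself help, because each fit still drops the rows that are incomplete for its own variables, and `anova(reg.b, reg.a)` would hit the same check. What fitting the complete model first does make easy is reusing its model frame, which holds exactly the rows it used. A minimal sketch, again on hypothetical simulated data; it assumes every term in the formula is a plain variable name (with transformations such as `log(x)` the model-frame columns are named after the expression, so the refit would not work reliably).

```r
# Hypothetical simulated stand-in for `imc` (not the poster's data).
set.seed(1)
n <- 100
sim <- data.frame(y = rnorm(n), x1 = rnorm(n), z = rnorm(n))
sim$x1[1:5] <- NA
sim$z[6:15] <- NA

# Fit the larger model first; incomplete rows are dropped (na.omit).
fit.b <- lm(y ~ x1 + z, data = sim)

# model.frame(fit.b) contains only the 85 rows fit.b actually used,
# so the reduced model refitted on it shares exactly the same sample.
fit.a <- update(fit.b, . ~ . - z, data = model.frame(fit.b))

anova(fit.a, fit.b)   # runs without the "same size of dataset" error
```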
Regards
Liviu