[Rd] lm considers removed predictors when finding complete cases
David Winsemius
dwinsemius at comcast.net
Wed Dec 20 01:22:57 CET 2017
> On Dec 19, 2017, at 11:12 AM, EDUARDO GARCIA PORTUGUES <edgarcia at est-econ.uc3m.es> wrote:
>
> Dear R-devel list,
>
> I realized that removing a predictor in lm through the "-" operator in
> formula() does not affect the complete cases that are considered. A minimal
> example is:
>
> summary(lm(Wind ~ ., data = airquality))
> # 42 observations deleted due to missingness
>
> summary(lm(Wind ~ . - Ozone, data = airquality))
> # still 42 observations deleted due to missingness, even if only 7 are
> # missing for the response and the rest of the predictors
>
> summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
> # 7 observations deleted due to missingness
>
> I find this behaviour somehow striking and I was wondering whether it is
> intended, or whether it would be appropriate to document it in lm's help.
The behavior in the second instance seems consistent with a desire to compare models (full versus reduced) based on the same data. Your expectation appears to be something else, but you have not really explained your rationale for a different expectation other than to call the behavior "striking". If by "striking" you mean hitting your head and saying "Of course, I should have thought of that", then we would be in agreement.
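
A minimal sketch of that point, using only base R and the airquality data from the original example: nobs() reports how many rows each fit actually used. The object names (full, reduced, dropped, explicit) are only illustrative, and the last fit is one possible way to get the row count the original poster expected, not a recommendation from the lm documentation.

```r
# airquality ships with base R (datasets package): 153 rows in total.
full     <- lm(Wind ~ ., data = airquality)
reduced  <- lm(Wind ~ . - Ozone, data = airquality)
dropped  <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))

# Naming the predictors explicitly keeps Ozone out of the formula entirely.
explicit <- lm(Wind ~ Solar.R + Temp + Month + Day, data = airquality)

nobs(full)      # 111 (42 rows deleted)
nobs(reduced)   # 111 (same rows as the full model)
nobs(dropped)   # 146 (7 rows deleted)
nobs(explicit)  # 146

# Because full and reduced use the same rows, they can be compared directly.
anova(reduced, full)
```

Note that anova() would refuse to compare `dropped` with `full`, since the two fits are based on different numbers of observations.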
--
David.
>
> Any insight on this issue is appreciated.
>
> Best regards,
> --
> Eduardo García Portugués
> Assistant professor
> Department of Statistics
> Carlos III University of Madrid
>
> Office: 7.3.J21 (Leganés)
> Phone: (+34) 91624 8836
>
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law