[R] Predictably puzzled.
Rolf Turner
Sat Nov 20 03:12:14 CET 2021
Consider the following toy example:
set.seed(42)
y <- rnorm(20)
x <- rnorm(20)
y[c(3,5,14,15)] <- NA
fit <- lm(y~x)
predict(fit)
For some reason that escapes me, this does not return predicted
values for the observations whose response/dependent variable is
missing, even though the predictor/independent variable has no
missing values.
I can get predicted values for all values of x if I set
ddd <- data.frame(y=y,x=x)
and execute
predict(fit,newdata=ddd)
Note that y is (unnecessarily) included in ddd. I thought that
predict() might omit any rows of the data in which there are missing
values, but not so.
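To make the difference concrete, here is a minimal sketch comparing the two calls. It assumes the default setting options(na.action = "na.omit"); the names p_default and p_all are only illustrative.

```r
# Same toy data as above; assumes the default options(na.action = "na.omit").
set.seed(42)
y <- rnorm(20)
x <- rnorm(20)
y[c(3, 5, 14, 15)] <- NA

fit <- lm(y ~ x)
ddd <- data.frame(y = y, x = x)

# Without newdata: only the 16 rows actually used in the fit.
p_default <- predict(fit)
length(p_default)   # 16

# With newdata: one prediction per row of ddd, NA in y notwithstanding.
p_all <- predict(fit, newdata = ddd)
length(p_all)       # 20

# Row labels that are absent from the default predictions.
setdiff(names(p_all), names(p_default))
```

The labels reported by setdiff() should be exactly the rows in which y was set to NA.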
OK. I have a workaround which gives me the predicted values that I
want, but:
(a) Why does predict() behave in this way? It makes no sense to me,
but I figure there *must* be a rationale.
(b) Is there a way of getting predict() to behave as I would like, by
specifying an appropriate value for na.action? I could not find such
an appropriate value.
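As an illustration of what I mean, here is a sketch of two candidates, neither of which does what I want. It assumes the standard behaviour of na.exclude in lm() and of the na.action argument of predict.lm(); the names fit_excl, p_excl and p_pass are only illustrative.

```r
# Same toy data as above.
set.seed(42)
y <- rnorm(20)
x <- rnorm(20)
y[c(3, 5, 14, 15)] <- NA

# Candidate 1: na.exclude at the fitting step.
# predict() then returns 20 values, but pads the dropped rows with NA
# rather than computing predictions for them.
fit_excl <- lm(y ~ x, na.action = na.exclude)
p_excl <- predict(fit_excl)
length(p_excl)          # 20
which(is.na(p_excl))    # rows 3, 5, 14, 15

# Candidate 2: na.action passed to predict() itself.
# This argument applies to newdata only, so without newdata
# the result is still restricted to the 16 fitted rows.
fit <- lm(y ~ x)
p_pass <- predict(fit, na.action = na.pass)
length(p_pass)          # 16
```

So far the only route I have found to genuine predicted values for every x is the newdata workaround.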
Thanks for any enlightenment.
cheers,
Rolf Turner
--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276