[R] Linear Model and Missing Data in Predictors

Lorenzo Isella lorenzo.isella at gmail.com
Tue Mar 15 16:14:42 CET 2016


Dear All,
Here is a situation that surely comes up very often. Consider the
following example:

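set.seed(1235)
x1 <- seq(30)
# x2 is observed only for rows 10 to 28
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
# x3 is observed only for rows 1 to 17
x3 <- c(rnorm(17) - 2, rep(NA, 13))

y <- exp(seq(1, 5, length = 30))

# only rows 10 to 17 are complete in all three regressors
mm <- lm(y ~ x1 + x2 + x3)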

i.e. you fit a linear regression with several regressors, some of
which have missing values.
This is what happens to me when I work with time series that I use as
regressors and whose missing values are padded with NAs.
By default, lm discards every incomplete observation and therefore
drops quite a lot of data.
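
To make the loss concrete, here is a small check on the example above.
It is only a sketch: it rebuilds the same data and counts the rows that
lm keeps under the default NA handling. The names n_complete, n_used
and n_dropped are just illustrative.

```r
set.seed(1235)
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))
y  <- exp(seq(1, 5, length.out = 30))

mm <- lm(y ~ x1 + x2 + x3)

# Rows with no NA in any of the three regressors
n_complete <- sum(complete.cases(x1, x2, x3))

# Rows actually used in the fit
n_used <- nobs(mm)

# Rows dropped because at least one regressor is NA
n_dropped <- length(y) - n_used

c(complete = n_complete, used = n_used, dropped = n_dropped)
```

Only the 8 rows where x2 and x3 overlap (rows 10 to 17) enter the fit;
the other 22 are dropped.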
Is there any way to circumvent this? I mean, is there a way to come up
with a piecewise linear regression that uses all three regressors
whenever possible, but switches to one or two of them where data are
missing?
I ask because it is infeasible to estimate the missing values in my
regressors, but at the same time I cannot restrict my model to the
intersection of the non-NA values in the three regressors. If this
makes sense, do I have to code it myself, or is there a package that
already implements this?
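
If I had to code it myself, a naive sketch of what I have in mind might
look like the following. It is purely illustrative: the names dat,
pattern and fits are my own, and fitting each segment independently is
an assumption about how the "switching" would work, not an established
method.

```r
set.seed(1235)
dat <- data.frame(
  y  = exp(seq(1, 5, length.out = 30)),
  x1 = seq(30),
  x2 = c(rep(NA, 9), rnorm(19) + 9, NA, NA),
  x3 = c(rnorm(17) - 2, rep(NA, 13))
)

regressors <- c("x1", "x2", "x3")

# Label each row by the regressors that are observed there, e.g. "x1+x3"
avail   <- !is.na(dat[regressors])
pattern <- apply(avail, 1, function(a) paste(regressors[a], collapse = "+"))

# Fit one lm per availability pattern, using only the rows of that
# pattern and only the regressors that are fully observed in them
fits <- lapply(split(dat, pattern), function(d) {
  rhs <- regressors[colSums(is.na(d[regressors])) == 0]
  lm(reformulate(rhs, response = "y"), data = d)
})

# Number of observations used by each piecewise fit
sapply(fits, nobs)
```

Every row is used by exactly one of the fits, but the coefficients are
not shared across segments and a short segment (here the last two rows,
where only x1 is available) leaves no residual degrees of freedom, so I
am not sure this is statistically sensible.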
Any suggestion is appreciated.
Cheers

Lorenzo


