[R] Predicting response from fitted linear model with incomplete new sample data

Wed Dec 18 19:18:08 CET 2013

I would like to predict a new response from a fitted linear model where the
new data is a single case with a missing value. My reading of the help on
predict() is inconclusive on whether this is possible.

Leaving out the missing value and setting it to NA both fail, but in
different ways; see the example code below.

> y <- runif(50)
> x1 <- rnorm(50)
> x2 <- rnorm(50)
> dat <- data.frame(y, x1, x2)
> mod <- lm(y~.,data=dat)
> summary(mod)

Call:
lm(formula = y ~ ., data = dat)
Residuals:
     Min       1Q   Median       3Q      Max 
-0.50467 -0.28997  0.01457  0.27970  0.47791 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.50098    0.04577  10.945  1.6e-14 ***
x1          -0.01762    0.04172  -0.422    0.675    
x2          -0.02753    0.04920  -0.560    0.578    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3177 on 47 degrees of freedom
Multiple R-squared:  0.009301,  Adjusted R-squared:  -0.03286 
F-statistic: 0.2206 on 2 and 47 DF,  p-value: 0.8028

> predict(mod, newdata=data.frame(x1=0.1, x2=0.3))   #OK as expected
        1 
0.4909624 

> predict(mod, newdata=data.frame(x1=0.1))  # x2 missing
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  variable lengths differ (found for 'x2')
In addition: Warning message:
'newdata' had 1 row but variables found have 50 rows 
> predict(mod, newdata=data.frame(x1=0.1, x2=NA))   #x2=NA
Error: variable 'x2' was fitted with type "numeric" but type "logical" was supplied
>
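
A note on the last error: a bare NA in R has type logical, which would
explain the type mismatch reported for x2. The sketch below is a minimal
illustration under stated assumptions, not a definitive answer. It rebuilds
the same kind of model (set.seed() is added only for reproducibility), passes
a numeric missing value NA_real_ instead of NA, and then shows one possible
fallback: refitting on the predictor that is available. The names new_case,
p_na, mod_x1 and p_x1 are illustrative and do not appear in the code above.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(y = runif(50), x1 = rnorm(50), x2 = rnorm(50))
mod <- lm(y ~ ., data = dat)

# NA_real_ is a numeric NA, so it matches the numeric type x2 was fitted with
new_case <- data.frame(x1 = 0.1, x2 = NA_real_)

# predict.lm uses na.action = na.pass by default, so the incomplete case
# is kept and its prediction comes back as NA instead of raising an error
p_na <- predict(mod, newdata = new_case)
is.na(p_na)

# One possible fallback (an assumption, not the original model):
# refit with only the predictor that is observed for the new case
mod_x1 <- lm(y ~ x1, data = dat)
p_x1 <- predict(mod_x1, newdata = data.frame(x1 = 0.1))
p_x1
```

With NA_real_ the call runs, but the result is NA: the full model cannot
produce a number for a case whose x2 is unknown. The reduced model gives a
numeric prediction, although it is a different model from mod.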

Thanks
Chris