[R] Predictions with missing inputs

Bill.Venables at csiro.au Bill.Venables at csiro.au
Sat Feb 12 05:40:40 CET 2011


With R it is always possible to shoot yourself squarely in the foot, as you seem keen to do, but R does at least often make it difficult.

When you predict, you need to have values for ALL variables used in the model.  Just leaving out the coefficients corresponding to absent predictors is equivalent to assuming that those coefficients are zero, and there is no basis whatever for so assuming.  (In this constructed example things are different because the missing variable is a nonsense variable and the coefficient should be roughly zero, as it is, but in general that is not going to be the case.)
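
To make the point above concrete, here is a minimal sketch, not taken from the original exchange. It rebuilds the same kind of random-data model and compares predictions with X3 set to zero (which is what "ignoring the coefficient" amounts to) against predictions with X3 set to its training mean. The seed and the names newx, pred_zero, pred_mean and shift are illustrative assumptions only.

```r
library(splines)

set.seed(1)                                  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100 * 3), 100, 3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)                   # columns: y, X1, X2, X3
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)

## new data holding only X1 and X2 (illustrative name)
newx <- data.frame(matrix(rnorm(10 * 2), 10, 2))

## "ignoring" the X3 coefficient is the same as predicting with X3 = 0
pred_zero <- predict(mymodel, transform(newx, X3 = 0))

## plugging in the training mean of X3 instead
pred_mean <- predict(mymodel, transform(newx, X3 = mean(mydata$X3)))

## the two sets of predictions differ by the constant coef(X3) * mean(X3)
shift <- unname(coef(mymodel)["X3"]) * mean(mydata$X3)
all.equal(unname(pred_mean - pred_zero), rep(shift, nrow(newx)))
```

With a nonsense X3 the shift is small, but for a real predictor with a non-zero mean and a non-zero coefficient it can be large.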

So you need to supply some value for each of the missing predictors if you are going to use the standard prediction tools.  An obvious plug is the mean of that variable in the training data, though more sophisticated alternatives would often be available.

Here is a suggestion for your case.

## fit some linear model to random data

x <- matrix(rnorm(100*3),100,3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)
library(splines)                            ## missing from your code.
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)
summary(mymodel)

## create new data with 1 missing input, then add in an X3
## set to the mean of X3 in the training data

mynewdata <- within(data.frame(matrix(rnorm(100*2), 100, 2)),
                    X3 <- mean(mydata$X3))
mypred <- predict(mymodel, mynewdata)
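
As one possible example of the "more sophisticated alternatives" mentioned above, the following sketch fills in X3 with its conditional mean given X1 and X2, estimated from the training data, rather than with the overall mean. This is an assumption about what such an alternative might look like, not a method proposed in the original message; the auxiliary model name x3model and the seed are hypothetical.

```r
library(splines)

set.seed(2)                                  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100 * 3), 100, 3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)                   # columns: y, X1, X2, X3
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)

## new data with X1 and X2 only
mynewdata <- data.frame(matrix(rnorm(100 * 2), 100, 2))

## auxiliary model (hypothetical): predict X3 from the inputs that are available
x3model <- lm(X3 ~ X1 + X2, data = mydata)

## fill in X3 with its fitted conditional mean, then predict as usual
mynewdata$X3 <- predict(x3model, mynewdata)
mypred <- predict(mymodel, mynewdata)
```

With independent random inputs, as here, this gives almost the same answer as plugging in the mean; it only helps when X3 really is related to the predictors that are present.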

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Axel Urbiz [axel.urbiz at gmail.com]
Sent: 12 February 2011 11:51
To: R-help at r-project.org
Subject: [R] Predictions with missing inputs

Dear users,

I would appreciate your help with this (hopefully) simple problem.

I have a model object which was fitted to inputs X1, X2, X3. Now, I'd like
to use this object to make predictions on a new data set where only X1 and
X2 are available (that is, use only the estimated coefficients for these
variables and ignore the coefficient on X3). Here's my attempt which,
of course, didn't work.

#fit some linear model to random data

x=matrix(rnorm(100*3),100,3)
y=sample(1:2,100,replace=TRUE)
mydata <- data.frame(y,x)
mymodel <- lm(y ~ ns(X1, df=3) + X2 + X3, data=mydata)
summary(mymodel)

#create new data with 1 missing input

mynewdata <- data.frame(matrix(rnorm(100*2),100,2))
mypred <- predict(mymodel, mynewdata)

Thanks in advance for your help!

Axel.
