[R] Issue with predict() for glm models

jrausch@nd.edu jrausch at nd.edu
Wed Sep 22 19:52:37 CEST 2004


Hello everyone, 

I am having a problem using the predict (or the predict.glm) function in R.
Basically, I run the glm model on a "training" data set and try to obtain
predictions for a set of new predictors from a "test" data set (i.e., not the
predictors that were utilized to obtain the glm parameter estimates).
Unfortunately, every time that I attempt this, I obtain the predictions for the
predictors that were used to fit the glm model. I have looked at the R mailing
list archives and don't believe I am making the same mistakes that have been
made in the past and also have tried to closely follow the predict.glm example
in the help file. Here is an example of what I am trying to do: 

########################################################
set.seed(545345)

################
# Necessary Variables # 
################

p <- 2
train.n <- 20
test.n <- 25 
mean.vec.1 <- c(1,1)
mean.vec.2 <- c(0,0)

Sigma.1 <- matrix(c(1,.5,.5,1),p,p)
Sigma.2 <- matrix(c(1,.5,.5,1),p,p)

###############
# Load MASS Library #
###############

library(MASS)

###################################
# Data to Parameters for Logistic Regression Model #
###################################

train.data.1 <- mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1)
train.data.2 <- mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2)
train.class.var <- as.factor(c(rep(1,train.n),rep(2,train.n)))
predictors.train <- rbind(train.data.1,train.data.2)

##############################################
# Test Data Where Predictions for Probabilities Using Logistic Reg.  #
# From Training Data are of Interest                                          #
############################################## 

test.data.1 <- mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1)
test.data.2 <- mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2)
predictors.test <- rbind(test.data.1,test.data.2)

##############################
# Run Logistic Regression on Training Data #
##############################

log.reg <- glm(train.class.var~predictors.train,
family=binomial(link="logit"))
log.reg

#> log.reg

#Call:  glm(formula = train.class.var ~ predictors.train, family =
#binomial(link = "logit")) 
#
#Coefficients:
#      (Intercept)  predictors.train1  predictors.train2  
#           0.5105            -0.2945            -1.0811  
#
#Degrees of Freedom: 39 Total (i.e. Null);  37 Residual
#Null Deviance:      55.45 
#Residual Deviance: 41.67        AIC: 47.67 

###########################
# Predicted Probabilities for Test Data #
###########################

New.Data <- data.frame(predictors.train1=predictors.test[,1],
predictors.train2=predictors.test[,2])

logreg.pred.prob.test <- predict.glm(log.reg,New.Data,type="response")
logreg.pred.prob.test

#logreg.pred.prob.test
# [1] 0.51106406 0.15597423 0.04948404 0.03863875 0.35587589 0.71331091
# [7] 0.17320087 0.14176632 0.30966718 0.61878952 0.12525988 0.21271139
#[13] 0.70068113 0.18340723 0.10295501 0.44591568 0.72285161 0.31499339
#[19] 0.65789420 0.42750139 0.14435889 0.93008117 0.70798465 0.80109005
#[25] 0.89161472 0.47480625 0.56520952 0.63981834 0.57595189 0.60075882
#[31] 0.96493393 0.77015507 0.87643986 0.62973986 0.63043351 0.45398955
#[37] 0.80855782 0.90835588 0.54809117 0.11568637
########################################################

Of course, notice that the vector for the predicted probabilities has only 40
elements, while the "New.Data" has 50 elements (since n.test has 25 per group
for 2 groups) and thus should have 50 predicted probabilities. As it turns out,
the output is for the training data predictors and not for the "New.Data" as I
would like it to be. I should also note that I have made sure that the names
for the predictors in the "New.Data" are the same as the names for the
predictors within the glm object (i.e., within "log.reg") as this is what is
done in the example for predict.glm() within the help files. 

Could some one help me understand either what I am doing incorrectly or what
problems there might be within the predict() function? I should mention that I
tried the same program using predict.glm() and obtained the same problematic
results. 

Thanks and take care, 

Joe 


Joe Rausch, M.A. 
Psychology Liaison 
Lab for Social Research 
917 Flanner Hall 
University of Notre Dame 
Notre Dame, IN 46556
(574) 631-3910

"If we knew what it was we were doing, it would not be called research, would
it?"
- Albert Einstein




More information about the R-help mailing list