[R] formatting data for predict()
Andrew Miles
rstuff.miles at gmail.com
Sun Sep 26 06:38:45 CEST 2010
I'm trying to get predicted probabilities out of a regression model,
but am having trouble with the "newdata" option in the predict()
function. Suppose I have a model with two independent variables, like
this:
y=rbinom(100, 1, .3)
x1=rbinom(100, 1, .5)
x2=rnorm(100, 3, 2)
mod=glm(y ~ x1 + x2, family=binomial)
I can then get the predicted probabilities for the two values of x1,
holding x2 constant at 0, like this:
p2=predict(mod, type="response", newdata=as.data.frame(cbind(x1, x2=0)))
unique(p2)
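A sketch of the same prediction using a two-row newdata, one row per value of x1, so that unique() is not needed. It assumes the simulated data from the post. The call to set.seed() is an addition for reproducibility, and the names nd and p2 are only illustrative.

```r
# Simulated data as in the post; set.seed() is added only for reproducibility.
set.seed(1)
y  <- rbinom(100, 1, 0.3)
x1 <- rbinom(100, 1, 0.5)
x2 <- rnorm(100, 3, 2)
mod <- glm(y ~ x1 + x2, family = binomial)

# newdata needs one column per predictor named in the formula:
# here one row per value of x1, with x2 held at 0 in both rows.
nd <- data.frame(x1 = c(0, 1), x2 = 0)
p2 <- predict(mod, newdata = nd, type = "response")
print(p2)  # two probabilities, named "1" and "2"
```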
However, I am running regressions as part of a function I wrote, which
feeds in the independent variables to the regression in matrix form,
like this:
dat=cbind(x1, x2)
mod2=glm(y ~ dat, family=binomial)
The results are the same as in mod. Yet I cannot figure out how to
input information into the "newdata" option of predict() in order to
generate the same predicted probabilities as above. The same code as
above does not work:
p2a=predict(mod2, type="response", newdata=as.data.frame(cbind(x1, x2=0)))
unique(p2a)
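As far as I can tell, no error is raised here because model.frame() looks for dat first in newdata and, when it is not there, in the environment where the formula was created. predict() therefore quietly reuses the original dat matrix and returns the in-sample fitted values. A sketch that checks this, together with the claim that mod and mod2 agree, under the same simulated-data assumptions (set.seed() added for reproducibility):

```r
set.seed(1)
y  <- rbinom(100, 1, 0.3)
x1 <- rbinom(100, 1, 0.5)
x2 <- rnorm(100, 3, 2)
mod  <- glm(y ~ x1 + x2, family = binomial)
dat  <- cbind(x1, x2)
mod2 <- glm(y ~ dat, family = binomial)

# Same estimates; only the coefficient names differ (x1 vs datx1, etc.).
all.equal(unname(coef(mod)), unname(coef(mod2)))  # TRUE

# newdata has no column called "dat", so the global 'dat' matrix is used
# and the result is just the fitted values for the original 100 rows.
p2a <- predict(mod2, type = "response",
               newdata = as.data.frame(cbind(x1, x2 = 0)))
all.equal(unname(p2a), unname(fitted(mod2)))  # TRUE
```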
Nor does creating a data frame that has the names "datx1" and "datx2",
which is how the variables appear if you run summary() on mod2.
Looking at the model frame of mod2 (mod2$model) shows that the fitted
model contains only two variables: the dependent variable y and one
independent variable called "dat". It is as if my two variables x1 and
x2 have become two levels in a factor variable called "dat".
names(mod2$model)
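A short sketch for inspecting what mod2 actually stores, again assuming the simulated data from the post (set.seed() added). It suggests that dat is kept as a single two-column numeric matrix rather than a factor, and that "datx1" and "datx2" are coefficient (model-matrix column) names built by pasting the variable name onto each matrix column name, not variables that newdata can supply.

```r
set.seed(1)
y  <- rbinom(100, 1, 0.3)
x1 <- rbinom(100, 1, 0.5)
x2 <- rnorm(100, 3, 2)
dat  <- cbind(x1, x2)
mod2 <- glm(y ~ dat, family = binomial)

names(mod2$model)                 # "y"   "dat"
is.matrix(mod2$model$dat)         # TRUE: one matrix column, not a factor
colnames(mod2$model$dat)          # "x1"  "x2"
names(coef(mod2))                 # "(Intercept)" "datx1" "datx2"
attr(terms(mod2), "dataClasses")  # records dat as "nmatrix.2"
```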
My question is this: if I have a fitted model like mod2, how do I use
the "newdata" option in the predict function so that I can get the
predicted values I am after? I.e., how do I recreate a data frame with
one variable called "dat" that contains two levels which represent my
(modified) variables x1 and x2?
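A minimal sketch of one possible approach, assuming the simulated data from the post (set.seed() added; nd, p_mat and p_df are illustrative names). Because dat is a single two-column matrix rather than a factor, newdata can carry one column named dat that is itself a matrix, with its columns in the same order as in the original cbind(x1, x2). Treat this as a suggestion to verify, not a definitive answer.

```r
set.seed(1)
y  <- rbinom(100, 1, 0.3)
x1 <- rbinom(100, 1, 0.5)
x2 <- rnorm(100, 3, 2)
mod  <- glm(y ~ x1 + x2, family = binomial)
dat  <- cbind(x1, x2)
mod2 <- glm(y ~ dat, family = binomial)

# A data frame whose predictor column "dat" is a 2 x 2 matrix:
# row 1 is x1 = 0, row 2 is x1 = 1, and x2 is held at 0 in both.
# The helper column "id" just gives nd two rows; predict() ignores it.
nd <- data.frame(id = 1:2)
nd$dat <- cbind(x1 = c(0, 1), x2 = 0)
p_mat <- predict(mod2, newdata = nd, type = "response")

# The same probabilities from the model fitted with separate variables.
p_df <- predict(mod, newdata = data.frame(x1 = c(0, 1), x2 = 0),
                type = "response")
all.equal(p_mat, p_df)  # TRUE
```

Writing data.frame(dat = I(m)) for a matrix m should also keep it as one matrix column. Without I(), data.frame() splits a matrix into separate columns (here dat.x1 and dat.x2), which is why renaming columns alone does not give predict() a variable called dat.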
Thanks in advance!
Andrew Miles
Department of Sociology
Duke University