[R] Predicting and Plotting "hypothetical" values of factors
Paul Johnson
pauljohn at ku.edu
Thu Nov 17 17:47:25 CET 2005
Last Friday, I noticed that it is difficult to work with regression
models in which there are factors. It is easier to do the old fashioned
thing of coding up "dummy" variables with 0-1 values. The predict
function's newdata argument is not suited to insertion of hypothetical
values for the factor, whereas it has no trouble with numeric variables.
For example, if one uses a factor as a predictor in a logistic
regression, then it is tough to plot the S-shaped curve that describes
the probabilities.
Here is some code with comments describing the difficulties that we have
found. Are there simpler, less frustrating approaches?
-----------------------------
# Paul Johnson <pauljohn at ku.edu> 2005-11-17
# factorFutzing-1.R
myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1))
y <- c(0,1,0,1,0,0,1,0)
mymod1 <-glm (y~myfac, family=binomial)
p1 <- predict(mymod1, type="response")
plot(myfac,p1)
# Wait, that's not the picture I wanted. I want to see the
# proverbial S-shaped curve!
#
# The contrast variable that R creates is coded 0 and 1.
# I'd like to have a sequence from 0 to 1 (seq(0,1,length.out=10)) and
# get predicted values for each. That would give the S-shaped curve.
# But the use of predict does not work because the factor has 2
# values and the predict function won't take my new data:
predict(mymod1, newdata=data.frame(myfac=seq(0,1,length.out=8)))
# Error: variable 'myfac' was fitted with class "factor" but class
"numeric" was supplied
# In addition: Warning message:
# variable 'myfac' is not a factor in: model.frame.default(Terms,
newdata, na.action = na.action, xlev = object$xlevels)
# Isn't there an easier way than this?
c1 <- coef(mymod1)
myseq <- seq(0,1, length.out=10)
newdf <- data.frame(1, myseq)
linearPredictor <- as.matrix(newdf) %*% c1
p2 <- 1/ (1 + exp(-linearPredictor))
# But there is a big disaster if you just try the obvious thing
# of plotting with
# lines(myseq,p2)
# The line does not show up where you hope in the plot.
# The plot "appears to" have the x-axis on the scale
# 1.1 to 4.5, So in order to see the s-shaped curve, it appears we have
# to re-scale. However, this is a big illusion. To see what I mean,
# do
points(2,.4)
# you expected to see the point appear at (2,.4), but in the plot
# it appears at (4.5,.4). Huh?
# The actual values being plotted are the integer-valued levels that
# R uses for factors, the numbers you get from as.numeric(myfac).
# So it is only necessary to re-scale the sequence by adding one.
myseq2 <- 1 + myseq
lines( myseq2, p2, type="l" )
#Its not all that S-shaped, but you have to take what you can get! :)
-----------------------------
--
Paul E. Johnson email: pauljohn at ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700
More information about the R-help
mailing list