[R] Predicting and Plotting "hypothetical" values of factors

Gabor Grothendieck ggrothendieck at gmail.com
Thu Nov 17 19:35:24 CET 2005


On 17 Nov 2005 18:00:45 +0100, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> Paul Johnson <pauljohn at ku.edu> writes:
>
> > Last Friday, I noticed that it is difficult to work with regression
> > models in which there are factors.  It is easier to do the old fashioned
> > thing of coding up "dummy" variables with 0-1 values.  The predict
> > function's newdata argument is not suited to insertion of hypothetical
> > values for the factor, whereas it has no trouble with numeric variables.
> >   For example, if one uses a factor as a predictor in a logistic
> > regression, then it is tough to plot the S-shaped curve that describes
> > the probabilities.
> >
> > Here is some code with comments describing the difficulties that we have
> > found.  Are there simpler, less frustrating approaches?
>
> I think the point is that if you think a factor can take intermediate
> values in between the groups (as in sex==1.245), then it is really a
> numeric variable and should be treated as such in the model, whether
> or not it has only two distinct values in your data set.
>
> It is not obvious to me what the "S shaped curve" means in a model
> which really only specifies two probabilities.
>
>        -p
>
>
> > -----------------------------
> > # Paul Johnson  <pauljohn at ku.edu> 2005-11-17
> > # factorFutzing-1.R
> >
> >
> > myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1))
> >
> > y <- c(0,1,0,1,0,0,1,0)
> >
> > mymod1 <-glm (y~myfac, family=binomial)
> >
> > p1 <- predict(mymod1, type="response")
> >
> > plot(myfac,p1)
> >
> > # Wait, that's not the picture I wanted. I want to see the
> > # proverbial S-shaped curve!
> > #
> >
> > # The contrast variable that R creates is coded 0 and 1.
> > # I'd like to have a sequence from 0 to 1 (seq(0,1,length.out=10)) and
> > # get predicted values for each.  That would give the S-shaped curve.
> >
> >
> > # But the use of predict does not work because the factor has 2
> > # values and the predict function won't take my new data:
> >
> > predict(mymod1, newdata=data.frame(myfac=seq(0,1,length.out=8)))
> >
> > # Error: variable 'myfac' was fitted with class "factor" but class
> > "numeric" was supplied
> > # In addition: Warning message:
> > # variable 'myfac' is not a factor in: model.frame.default(Terms,
> > newdata, na.action = na.action, xlev = object$xlevels)
> >
> > # Isn't there an easier way than this?
> >
> > c1 <- coef(mymod1)
> >
> > myseq <- seq(0,1, length.out=10)
> >
> > newdf <- data.frame(1, myseq)
> >
> > linearPredictor <- as.matrix(newdf) %*% c1
> >
> > p2 <- 1/ (1 + exp(-linearPredictor))
> > # But there is a big disaster if you just try the obvious thing
> > # of plotting with
> > # lines(myseq,p2)
> > # The line does not show up where you hope in the plot.
> > # The plot "appears to" have the x-axis on the scale
> > # 1.1 to 4.5, So in order to see the s-shaped curve, it appears we have
> > # to re-scale. However, this is a big illusion.  To see what I mean,
> > # do
> >
> > points(2,.4)
> >
> > # you expected to see the point appear at (2,.4), but in the plot
> > # it appears at (4.5,.4).  Huh?
> >
> > # The actual values being plotted are the integer-valued levels that
> > # R uses for factors, the numbers you get from as.numeric(myfac).
> > # So it is only necessary to re-scale the sequence by adding one.
> >
> > myseq2 <- 1 +  myseq
> >
> > lines( myseq2, p2, type="l" )
> >
> > #Its not all that S-shaped, but you have to take what you can get! :)


Perhaps still too complicated but if interp is
an interpolation function and xx are the two levels and yy
are the corresponding linear predictors (not the responses)
then plot xx interpolated vs the inverse link of yy
interpolated:

# interpolation function
interp <- function(x, length = 100) seq(x[1], x[2], length = length)

# levels and predictions (on linear predictor scale) from levels
xx <- c(1.1, 4.5)
yy <- predict(mymod1, newdata = list(myfac = factor(levs)))

plot(interp(xx), family(mymod1)$linkinv(interp(yy)))




More information about the R-help mailing list