[R] User error in calling predict/model.frame

Sat Jan 29 02:59:26 CET 2011

Ista & r-help list,

I guess I left out the most important part of any question, my reason
for doing this.  I am interested in the predicted y value at the mean,
1 SD above of the mean, and 1 SD below the mean for each predictor.
Since I conducted my analysis with my IVs in the scale of Z (i.e. my
formula for lm.obj was out ~ scale(xxA)*scale(xxB)), I expected to be
able to define my predictors in newdata in terms of Z.  It seems like
predict should be the right function to achieve these aims.

I made several (major) errors in my initial example, though the nature
of the error that R produces is the same, this may have added
considerably to the confusion.  I apologize. A more concise example of
code that (I think) should work, but doesn't is:
   set.seed(10)
   dat <- data.frame(xxA = rnorm(20,10), xxB = rnorm(20,20))
   dat$out <- with(dat,xxA+xxB+xxA*xxB+rnorm(20,20))
   lm.res.scale <- lm(out ~ scale(xxA)*scale(xxB),data=dat)
   my.data <- lm.res.scale$model #load the data from the lm object
   newdata <- expand.grid(X1=c(-1,0,1),X2=c(-1,0,1))
   names(newdata) <- c("scale(xxA)","scale(xxB)")
   newdata$Y <- predict(lm.res.scale,newdata)

Using your solution:
   names(newdata) <- names(dat)[1:2]

As you say, your solution is doing something other than:

   coef(lm.res.scale)[1]+coef(lm.res.scale)[2]*newdata[,1]+coef(lm.res.scale)[3]*newdata[,2]+coef(lm.res.scale)[4]*newdata[,1]*newdata[,2]

I really want a solution that, in one step will provide values like
the above code. I think that should be exactly what predict() should
do (now that I fixed newdata so that it doesn't have the sd() scaling
factors).  I think my example code should be equivalent to:
   dat <- data.frame(xxA = rnorm(20,10), xxB = rnorm(20,20))
   dat$out <- with(dat,xxA+xxB+xxA*xxB+rnorm(20,20))
   #rescaling outside of lm
   X1 <- with(dat,as.vector(scale(xxA)))
   X2 <- with(dat,as.vector(scale(xxB)))
   y  <- with(dat,out)
   lm.res.correct <- lm(y~X1*X2)
   my.data <- lm.res.correct$model #load the data from the lm object
   newdata <- expand.grid(X1=c(-1,0,1),X2=c(-1,0,1))
   #No need to rename newdata as it matches my lm object already
   newdata$Y <- predict(lm.res.correct,newdata)

Notably, adjusting my formula to include as.vector() does not solve
the problem with my attempt to use predict() directly with newdata.

Thanks again,

Russell Pierce