[R] User error in calling predict/model.frame

Joshua Wiley jwiley.psych at gmail.com
Sat Jan 29 22:23:38 CET 2011


On Sat, Jan 29, 2011 at 1:05 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Jan 29, 2011, at 3:49 PM, Joshua Wiley wrote:
>
>> On Sat, Jan 29, 2011 at 12:31 PM, David Winsemius
>> <dwinsemius at comcast.net> wrote:
>>>
>>> Huh?. With the same model and data, they should be the same:
>>
>> I must be missing something, because that is what I would have
>> expected too, but it is not what I get (at least when I run the code
>> as shown below).
>>
>>>> lm.mod2 <- lm(out ~ scale(xxA)*scale(xxB), data=dat)
>>>> newdata2 <- expand.grid(xxA=c(-1,0,1),xxB=c(-1,0,1))
>>>> newdata2$preds <- predict(lm.mod, newdata2)
>>
>>                                                  ^^^^^ is that lm.mod2?
>
> UnnHHH . lm.mod2 was the model from Zahn's posting.
>
> lm.mod <- lm(out ~ I(scale(xxA))*I(scale(xxB)), data=dat)
>
> So I guess you need the "I" to get the correct results with the scaled
> numbers, but this "works" with the unscaled numbers, although the numbers
> (meanA, meanB) are somewhat different, which deserves investigation. The
> pair (meanA, meanB) gives the same result but the others don't seem to get
> to the same numbers. Which is correct?
>
>>   meanA<-mean(dat$xxA); meanB<-mean(dat$xxB); sdA<-sd(dat$xxA);
>> sdB<-sd(dat$xxB)
>> newdata2 <-
>> expand.grid(xxA=c(meanA-sdA,meanA,meanA+sdA),xxB=c(meanB-sdB,meanB,meanB+sdB))
>> newdata2$preds <- predict(lm.mod2, newdata2)
>> newdata2
>       xxA      xxB    preds
> 1  9.304820 18.75866 222.1326
> 2  9.905043 18.75866 234.0992
> 3 10.505266 18.75866 246.0659
> 4  9.304820 19.73195 232.7828
> 5  9.905043 19.73195 244.8696
> 6 10.505266 19.73195 256.9565
> 7  9.304820 20.70524 243.4330
> 8  9.905043 20.70524 255.6400
> 9 10.505266 20.70524 267.8471
>

I may be completely barking up the wrong tree, but could it have
something to do with the interaction term being created from scaled
vs. unscaled terms?

> xxAB <- dat$xxA * dat$xxB
> meanAB <- meanA * meanB
> sdAB <- sqrt(sdA^2 + sdB^2 + meanA^2 * sdB^2 + meanB^2 * sdA^2)
> (xxAB - meanAB)/sdAB
 [1]  0.17672311 -0.93203381 -1.39257245 -1.31620813  0.08964000  0.71663046
 [7] -1.21984785 -0.39528381 -1.40027554  0.06304763  0.57713584  1.30481682
[13]  0.76070783  1.73067640  0.49481401 -0.22607119 -0.40356164 -0.70588375
[19]  1.34382743  0.65508728
> (sxxAB <- with(dat, scale(xxA) * scale(xxB))[1:20,])
 [1]  0.01838999  0.25899229 -0.15337178  1.07536361 -0.26520706  0.25119036
 [7] -0.11295999  0.05205466 -1.49676116 -0.14367368 -1.86921225  0.80860167
[13] -0.44816896  1.44122535 -0.73361472 -0.14930187 -1.46192081  0.19784309
[19]  0.62089372  0.0821291515

>
>>>> newdata2
>>>
>>> xxA xxB    preds
>>> 1  -1  -1 218.6366
>>> 2   0  -1 232.4330
>>> 3   1  -1 246.2295
>>> 4  -1   0 230.9129
>>> 5   0   0 244.8696
>>> 6   1   0 258.8263
>>> 7  -1   1 243.1892
>>> 8   0   1 257.3062
>>> 9   1   1 271.4232
>>>
>>>> Predict(rms.res,xxA=c(-1,0,1),xxB=c(-1,0,1), conf.int=FALSE)
>>>
>>> xxA xxB     yhat
>>> 1  -1  -1 218.6366
>>> 2   0  -1 232.4330
>>> 3   1  -1 246.2295
>>> 4  -1   0 230.9129
>>> 5   0   0 244.8696
>>> 6   1   0 258.8263
>>> 7  -1   1 243.1892
>>> 8   0   1 257.3062
>>> 9   1   1 271.4232
>>
>> This is copied directly from my console in a clean session.  I get
>> different values for the predicted values from predict(glm or lm) and
>> "yhat" from Predict(Glm).
>>
>>
>>> sessionInfo()
>>
>> R version 2.12.1 (2010-12-16)
>> Platform: i486-pc-linux-gnu (32-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> #################################
>>> require(rms)
>>
>> Loading required package: rms
>> Loading required package: Hmisc
>> Loading required package: survival
>> Loading required package: splines
>>
>> Attaching package: 'Hmisc'
>>
>> The following object(s) are masked from 'package:survival':
>>
>>  untangle.specials
>>
>> The following object(s) are masked from 'package:base':
>>
>>  format.pval, round.POSIXt, trunc.POSIXt, units
>>
>>
>> Attaching package: 'rms'
>>
>> The following object(s) are masked from 'package:survival':
>>
>>  Surv
>>
>>> set.seed(10)
>>> dat <- data.frame(xxA = rnorm(20, 10), xxB = rnorm(20, 20))
>>> dat$out <- with(dat, xxA+xxB+xxA*xxB+rnorm(20,20))
>>>
>>> rms.res <- Glm(out ~ scale(xxA) * scale(xxB), data=dat)
>>> glm.mod <- glm(out ~ scale(xxA) * scale(xxB), data=dat)
>>> lm.mod2 <- lm(out ~ scale(xxA)*scale(xxB), data=dat)
>>>
>>> newdata2 <- expand.grid(xxA = -1:1, xxB = -1:1)
>>> newdata2$preds1 <- predict(glm.mod, newdata2)
>>> newdata2$preds2 <- predict(lm.mod2, newdata2)
>>> newdata2
>>
>> xxA xxB   preds1   preds2
>> 1  -1  -1 143.3433 143.3433
>> 2   0  -1 131.6110 131.6110
>> 3   1  -1 119.8787 119.8787
>> 4  -1   0 137.1149 137.1149
>> 5   0   0 126.9708 126.9708
>> 6   1   0 116.8267 116.8267
>> 7  -1   1 130.8865 130.8865
>> 8   0   1 122.3306 122.3306
>> 9   1   1 113.7747 113.7747
>>>
>>> Predict(rms.res, xxA= -1:1, xxB= -1:1, conf.int=FALSE)
>>
>> xxA xxB     yhat
>> 1  -1  -1 212.2324
>> 2   0  -1 229.6472
>> 3   1  -1 247.0620
>> 4  -1   0 221.7795
>> 5   0   0 240.6413
>> 6   1   0 259.5030
>> 7  -1   1 231.3266
>> 8   0   1 251.6353
>> 9   1   1 271.9441
>>
>> Response variable (y): X * Beta
>>>
>>> ##################################
>
> David Winsemius, MD
> West Hartford, CT
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



More information about the R-help mailing list