[R-sig-ME] need help with predicted values from lmer with newdata

Paul Johnson pauljohn32 at gmail.com
Fri Jun 22 16:44:55 CEST 2012


Confirmed. A follow-up question below about how to edit predict.merMod.

On Fri, Jun 22, 2012 at 4:19 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
> Hi Paul,
>
> I think this may be a bug in predict.merMod.  My fix is not terribly
> elegant, but I suggest changing line 102 from:
>
> re_new <- unlist(re_List[m])
>
> to
>
> re_new <- as.vector(do.call("rbind", re_List[m]))

Changing this resolves the anomaly I experienced. If you mail me your
shirt size and mailing address, I will follow through on the shirt.
Official logo and all!

Now a more practical question. This is off topic for mixed models, but on
topic for Emacs, R, and how to revise lme4 objects to test variations
like the one you propose.

I'm using Emacs with ESS.  I learned on this email list that C-c C-d
can open a function for revision, and then load it back in.  This is the
Emacs-ESS alternative to R's fix() function.

I have more and more trouble revising objects because more and more code
is written with non-exported functions and methods.

In order to follow your advice and revise predict.merMod, the only
thing I could think of was the following.

> mypredict <- lme4:::predict.merMod

Then I C-c C-d mypredict, re-load that, and then change my usage
from predict(m4, ...) to mypredict(m4, ...).

This seemed like a childish way to get at it; I wondered how they
do it in the big leagues.

?
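
The only other possibility I could think of, sketched here but not tested
with this lme4 version, is to push the edited copy back into the lme4
namespace:

## continuing from the mypredict copy above, after editing it:
## give it the lme4 namespace as its environment so it can still see
## lme4's non-exported helpers
environment(mypredict) <- asNamespace("lme4")
## swap it in for the namespace copy; depending on how the S3 method was
## registered, the generic may or may not pick this up, but calling
## mypredict() directly works either way
assignInNamespace("predict.merMod", mypredict, ns = "lme4")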

PJ


>
> I believe the issue with the current code is that the random effects are
> unlisted, which essentially 'stacks' the vector, so you have the random
> effects ordered by effect rather than by level (i.e., all intercepts,
> then all slopes).  Later this is post-multiplied by a matrix that is
> sorted by level; the dimensions are correct, so the matrices multiply
> fine, but the results are not what I believe was intended.
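>
> A toy example (made-up numbers, not lme4's internal representation)
> of the two orderings:
>
> re <- data.frame("(Intercept)" = c(10, 20), x1 = c(1, 2),
>                  row.names = c("grp1", "grp2"), check.names = FALSE)
> unlist(re)                   ## by effect: 10 20  1  2  (intercepts, then slopes)
> as.vector(t(as.matrix(re)))  ## by level:  10  1 20  2  (grp1's pair, then grp2's)
>
> If the matrix this vector multiplies is ordered by level, the first
> ordering pairs the wrong numbers with the wrong columns.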
>
> I cannot see a way to add a patch on R-forge or submit a bug report, so
> I cc'ed Doug on this.
>
> Cheers,
>
> Josh
>
> On Thu, Jun 21, 2012 at 8:28 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>> Greetings.
>>
>> I'm getting some crazy results out of predict on an lmer model.  I've
>> gone through
>> this lots of times, and feel certain I'm missing some obvious mistake.
>>  If you browse
>> through and get to the bottom, there's some output that shows my predict output
>> is just wrong, but I can't see why.
>>
>> I will send the first person with the right answer a new KU t-shirt in
>> the size of
>> your choice.  That's better than "thanks in advance". Isn't it?
>>
>> ## Paul Johnson <pauljohn at ku.edu>
>> ## 2012-06-21
>>
>> ## I want to compare the "separate OLS regressions on clusters"
>> ## results with the mixed model estimates. I've run into
>> ## trouble because the output of predict with a newdata object
>> ## seems to be completely wrong.  I demonstrate "manual" calculations
>> ## to try to convince you I understand what's going on (but the
>> ## peculiarity of the predict output may make things appear otherwise).
>>
>> ## I pasted in some output at the bottom in case you just want to
>> ## see where I think there is a problem.
>>
>> ## My system:
>> ## > sessionInfo()
>> ## R version 2.15.0 (2012-03-30)
>> ## Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> ## locale:
>> ##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> ##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> ##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> ##  [7] LC_PAPER=C                 LC_NAME=C
>> ##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> ## attached base packages:
>> ## [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> ## other attached packages:
>> ## [1] lme4_0.999902344-0 Matrix_1.0-6       lattice_0.20-6     MASS_7.3-18
>> ## loaded via a namespace (and not attached):
>> ## [1] compiler_2.15.0 grid_2.15.0     minqa_1.2.1     nlme_3.1-104
>> ## [5] splines_2.15.0  tools_2.15.0
>>
>>
>> ## Topic: create multi-level data and use regression and lmer to
>> ## estimate it.  This creates 3 different dependent variables,
>> ## y1: ordinary "homoskedastic" regression (no grouping effect)
>> ## y2: clustered "random intercepts"
>> ## y3: clustered "random intercepts" and "random slope" for 1 predictor (x1)
>>
>> library(MASS)
>> set.seed(1234)
>>
>> ## Step 1. Create a Data Frame.
>>
>> ## M respondents, N observations for each one.
>> ## In a repeated measures context, this is called "longitudinal data".
>> ## In a cross-sectional approach, this is M groups (classrooms) with
>> ## several people in each.
>>
>> M <- 10
>> N <- 30
>>
>> ## get M unique gray colors, will use later for plotting
>> grays <- gray.colors(M)
>>
>> ## Standard deviation of error term at individual level
>> STDE <- 48
>>
>> ## STDEM: standard deviation of clustered intercepts.
>> ## In a longitudinal "repeated measures" exercise, this is an
>> ## individual-level effect. In a cross section, this is a random
>> ## classroom effect.
>> STDEM <- 30
>>
>> ## STDEx1: standard deviation of slopes across cluster units
>> STDEx1 <- 5
>>
>> ## The true effects of b0, b1, b2, b3 in
>> ## y = b0 + b1*x1 + b2*x2 + b3*x3
>> bslopes <- c(0.2, 15.4, -0.2, 3)
>>
>> ## Now generate the data frame with x1, x2 and x3.
>> ## Let's suppose the predictors are Multivariate Normal, with
>> ## means (100,200,150), standard deviations (10, 20, 30), and
>> ## intercorrelations of 0.4.  These can, of course, be adjusted
>> ## for interesting variations.
>>
>> ## Mind: an "indicator" that says to which cluster an observation belongs.
>> Mind <- 1:M %x% rep(1,N)
>>
>> means <- c(100, 200, 150)
>> sds <- c(10, 20, 30)
>> rho <- 0.4
>> corr.mat <- matrix(c(1, rho, rho, rho, 1, rho, rho, rho, 1), nrow = 3)
>> sigma <- diag(sds) %*% corr.mat %*% diag(sds)
>> x.mat <- mvrnorm(n = N * M, mu = means, Sigma = sigma)
>> dimnames(x.mat)[[2]] <- c("x1", "x2", "x3")
>> ## Add an intercept on the front of x.mat
>> x.mat <- cbind("(Intercept)"= 1, x.mat)
>>
>>
>> ## Create a dependent variable that has no clustering effects.  This
>> ## is an "ordinary" regression with the slopes designated above plus
>> ## random noise.
>> y1 <- x.mat %*% bslopes + rnorm(M*N, m=0, s= STDE)
>> dat <- data.frame(id=1:(N*M), Mind, y1, as.data.frame(x.mat))
>> rm(Mind, y1, x.mat) ## cleanup workspace
>>
>> ## Layer on additive group level error in y2
>> ## Add a group-level intercept.
>> ## A vector of M disturbances is created, and each is
>> ## added to all N cases within its cluster.
>> dat$y2 <- dat$y1 +  rnorm(M, 0, STDEM) %x% rep(1, N)
>>
>> ## In y3, add in a random slope effect. We
>> ## are changing the slope for x1 from b1 to N(b1, STDEx1^2).
>> ## Same as newb1 = b1 + N(0, STDEx1), applied to x1
>> dat$y3 <- dat$y2 +   (rnorm(M, 0, STDEx1) %x% rep(1, N)) * dat$x1
>>
>> ## In the lme4 package, there is an "easy" tool to run M separate lm regressions
>> library(lme4)
>>
>> ## M = 10 separate regressions, with 3 predictors
>> m3list <- lmList(y3 ~ x1 + x2 + x3 | Mind, data=dat, pool = FALSE)
>>
>>
>> ## Now I see a pattern. I'll use the unique range of x1 for each cluster;
>> ## I think that makes the lines look more "in" the data.
>> plot(y3 ~ x1, data=dat, col=grays[Mind], main = "lm on clusters")
>> for( i in seq_along(m3list)){
>>    m3mf <- model.frame(m3list[[i]]) #data set for group i
>>    x1range <- range(m3mf$x1) ## use group-specific ranges this time
>>    pgroup <- predict( m3list[[i]], newdata = data.frame(x1=x1range,
>> x2=mean(m3mf$x2), x3=mean(m3mf$x3)))
>>    lines(x1range, pgroup, col=grays[i])
>> }
>>
>> ## Hmm. That data looks like it might be handy for comparison later on.
>> ## Build a data frame of it. I've never thought of doing this work
>> ## cluster-by-cluster before.
>>
>> m3newdat <- lapply(m3list, function(x) {
>>    m3mf <- model.frame(x)
>>    ndf = data.frame(x1=range(m3mf$x1), x2=mean(m3mf$x2), x3=mean(m3mf$x3))
>>    ndf$m3pred <- predict(x, newdata = ndf)
>>    ndf} )
>> ## Smash the list of data frames together
>> m3newdat <- do.call("rbind", m3newdat)
>> ## Interesting!
>> m3newdat[1:20, ]
>> ## Better add a variable Mind. This looks stupid, but works.
>> m3newdat$Mind <- as.integer( do.call("rbind",
>> strsplit(row.names(m3newdat), split="\\.", perl=T))[ ,1] )
>>
>>
>> ## Draw new graphs on a new device, so we can compare
>> dev.new()
>>
>> ## The "right" model allows the random effects to be correlated.
>> mm4 <- lmer( y3 ~ x1 + x2 + x3 + (x1 | Mind), data=dat)
>> summary(mm4)
>>
>>
>>
>> mm4ranef <- ranef(mm4, postVar = TRUE) ## a ranef.mer object, a list
>> ## that includes one data frame
>>
>>
>> ### What do the predicted values mean from lmer?
>> ### here they are. But what do they include?
>> head(mm4pred <- predict(mm4))
>> tail(mm4pred <- predict(mm4))
>>
>>
>> ## I can't tell what those numbers are. Can't be sure from ?predict.merMod
>> ## So I'm going to go through the step-by-step
>> ## process of calculating predicted values. These should follow a
>> ## formula like this
>> ## for model 4. j is the "cluster" index.
>> ##
>> ##  j'th intercept                j'th slope
>> ##  (b0 + raneff(intercept, j)) + (b1 + raneff(x1, j))*x1 + b2*x2 + b3*x3
>>
>> mm4b <- coef(summary(mm4))[ ,1]
>> mm4vnames <- names(mm4b)
>> mm4mm <- model.matrix(mm4) ## predictors
>>
>> ## Doing this bit by bit. First
>> ## b0 + raneff(intercept, j)
>> mm4inteffect <- mm4b["(Intercept)"] +  mm4ranef[[1]][dat$Mind, 1]
>> mm4x1effect <- mm4mm[ , c("x1")] * (mm4b["x1"] + mm4ranef[[1]][dat$Mind, 2])
>> mm4pred2 <- mm4inteffect + mm4x1effect +  mm4mm[ ,c("x2","x3") ] %*%
>> mm4b[c("x2","x3")]
>> head(mm4pred2)
>> tail(mm4pred2)
>>
>> ## Aha! Those exactly match predict for mm4. So I understand what "predict" does
>>
>>
>>
>>
>> ## Now, I want to run predict for some particular values of x1, x2, x3.
>>
>> ## Let's try running m3newdat through the predict function for mm4.
>> m3newdat$mm4.3 <- predict(mm4, newdata = m3newdat)
>> m3newdat
>>
>>
>> ## Disaster. Predicted values completely out of whack. Lots of
>> ## lines fall off the bottom of the graph.
>>
>> plot(y3 ~ x1, data=dat, col=grays[Mind], main="lmer mixed model predictions")
>> by(m3newdat, m3newdat$Mind, function(x) { lines(x$x1, x$mm4.3,
>> col=grays[x$Mind])})
>>
>>
>> ## They don't match the manual calculations I perform here
>> mm4b <- coef(summary(mm4))[ ,1]
>> mm4vnames <- names(mm4b)
>> mm4mm <- as.matrix(m3newdat)
>>
>> ##  b0 + raneff(intercept, j)
>> mm4inteffect <- mm4b["(Intercept)"] + rep(mm4ranef[[1]][, 1], each=2)
>> mm4x1effect <- mm4mm[ , c("x1")] * rep(mm4b["x1"] + mm4ranef[[1]][, 2], each=2)
>> mm4manualpred2 <- mm4inteffect + mm4x1effect +
>>   mm4mm[ ,c("x2","x3") ] %*% mm4b[c("x2","x3")]
>>
>> mm4mm <- cbind(as.data.frame(mm4mm), "mm4manualpred" = mm4manualpred2)
>> mm4mm
>>
>> ## Here's what I see when I do that. m3pred: the lmList predictions.
>> ## mm4manualpred: my "manual" calculation of predicted values for lmer mm4.
>> ## mm4.3: output from predict for the same "newdata" used with mm4manualpred.
>>
>> ## > mm4mm
>> ##             x1       x2       x3   m3pred Mind      mm4.3 mm4manualpred
>> ## 1.1   88.51074 196.0615 141.3859 1221.477    1  -434.7602      1239.152
>> ## 1.2  110.88258 196.0615 141.3859 1483.957    1  -630.0463      1466.961
>> ## 2.1   80.26167 194.1252 133.7265 1405.672    2  -614.3256      1388.751
>> ## 2.2  117.08533 194.1252 133.7265 1845.499    2 -1051.4105      1866.505
>> ## 3.1   80.84091 202.4517 154.3976 1302.205    3  3527.8154      1284.706
>> ## 3.2  120.33775 202.4517 154.3976 1692.819    3  5037.6392      1717.210
>> ## 4.1   68.96320 204.5145 151.6662 1321.927    4  5097.5261      1379.251
>> ## 4.2  120.17029 204.5145 151.6662 2137.589    4  8559.2362      2108.628
>> ## 5.1   84.85202 200.7799 153.1595 1821.135    5  4736.2240      1813.415
>> ## 5.2  122.58730 200.7799 153.1595 2418.381    5  6641.2026      2426.855
>> ## 6.1   84.60214 202.3324 153.2583 2015.183    6  1521.6831      2001.700
>> ## 6.2  128.13418 202.3324 153.2583 2778.684    6  2086.4726      2799.053
>> ## 7.1   74.73548 198.5761 155.8984 1780.196    7  1497.6761      1802.381
>> ## 7.2  118.97636 198.5761 155.8984 2622.083    7  2127.8282      2603.626
>> ## 8.1   83.40882 200.4991 152.0515 2342.043    8  1954.4592      2323.477
>> ## 8.2  118.73997 200.4991 152.0515 3088.686    8  2601.6007      3105.182
>> ## 9.1   82.48090 205.4736 152.2774 1979.852    9  2253.5650      1950.388
>> ## 9.2  117.48279 205.4736 152.2774 2554.581    9  3027.9842      2587.319
>> ## 10.1  84.77764 205.9689 152.3325 2256.434   10  2236.3346      2268.620
>> ## 10.2 119.66417 205.9689 152.3325 3022.053   10  2980.1202      3012.406
>>
>>
>> --
>> Paul E. Johnson
>> Professor, Political Science    Assoc. Director
>> 1541 Lilac Lane, Room 504     Center for Research Methods
>> University of Kansas               University of Kansas
>> http://pj.freefaculty.org            http://quant.ku.edu
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu


