[R-sig-ME] need help with predicted values from lmer with newdata
Joshua Wiley
jwiley.psych at gmail.com
Fri Jun 22 17:40:01 CEST 2012
Hi Paul,
I am not sure how to use C-c C-d with namespaces and unexported
objects. I might suggest though that for the assignment, you do
something like:
assignInNamespace("predict.merMod", mypredict, envir =
as.environment("package:lme4"))
It is not an issue in this case, but it can be if mypredict calls
functions that are available in lme4's namespace but not to functions in
the global environment. I am not a big leaguer though, so I cannot give
you really definitive advice.
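The full cycle might look like the sketch below (assuming lme4 is
installed; the ns = "lme4" form is an alternative to the envir form
above):

```r
## Sketch: patch an unexported S3 method inside a package namespace.
## Assumes lme4 is installed and predict.merMod is still unexported.
library(lme4)

## 1. Pull the unexported method out of the namespace with ":::".
mypredict <- lme4:::predict.merMod

## 2. Edit the copy (in ESS: C-c C-d on mypredict), then ...

## 3. Push the edited copy back, so an ordinary predict(<merMod>, ...)
##    call dispatches to it; callers need not change.
assignInNamespace("predict.merMod", mypredict, ns = "lme4")
```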
Regarding the shirt, give some free help to a local student (or
charity), and I will consider us even :)
Cheers,
Josh
On Fri, Jun 22, 2012 at 7:44 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> Confirmed. Follow-up below about how you edit predict.merMod
>
> On Fri, Jun 22, 2012 at 4:19 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>> Hi Paul,
>>
>> I think this may be a bug in predict.merMod. It is not terribly
>> elegant, but I suggest changing line 102 from:
>>
>> re_new <- unlist(re_List[m])
>>
>> to
>>
>> re_new <- as.vector(do.call("rbind", re_List[m]))
>
> Changing this resolves the anomaly I experience. If you mail me your
> shirt size & mailing address, I will follow through on the shirt.
> Official logo and all!
>
> Now a more practical question. This is OT for mixed models, but on
> topic for Emacs and R, and how to revise lme4 objects to test variations
> like the one you propose.
>
> I'm using Emacs with ESS. I learned in this email list that C-c C-d
> can open a function and revise it, and then load it. This is the
> Emacs-ESS alternative to R's fix() function.
>
> I have more and more trouble revising objects because more and more
> code is written with non-exported functions.
>
> In order to follow your advice and revise predict.merMod, the only
> thing I could think of was the following.
>
>> mypredict <- lme4:::predict.merMod
>
> Then I C-c C-d mypredict, re-load that, and change my usage
> from predict(m4, ...) to mypredict(m4, ...).
>
> This seemed like a childish way to get at it, so I wondered how they
> do it in the big leagues.
>
> ?
>
> PJ
>
>
>>
>> I believe the issue with the current code is that the random effects
>> are unlisted, which essentially 'stacks' the vector one effect at a
>> time rather than one level at a time (i.e., all intercepts, then all
>> slopes). Later this vector is post-multiplied by a matrix that is
>> sorted by level; the dimensions are correct, so the matrices multiply
>> fine, but the results are not what I believe was intended.
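>> A toy illustration of the ordering problem (a stand-in for the
>> per-group random-effects table, not lme4's actual internals):

```r
## Rows are levels of the grouping factor, columns are effects.
re <- data.frame("(Intercept)" = c(1, 2), x1 = c(10, 20),
                 check.names = FALSE)

## unlist() walks column by column: all intercepts, then all slopes.
as.numeric(unlist(re))       # 1  2 10 20   (effect-major)

## The matrix it multiplies is ordered level by level, so the vector
## must interleave (intercept, slope) within each level instead:
as.vector(t(as.matrix(re)))  # 1 10  2 20   (level-major)
```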
>>
>> I cannot see a way to add a patch on R-forge or to submit a bug
>> report, so I cc'ed Doug on this.
>>
>> Cheers,
>>
>> Josh
>>
>> On Thu, Jun 21, 2012 at 8:28 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>>> Greetings.
>>>
>>> I'm getting some crazy results out of predict on an lmer model. I've
>>> gone through
>>> this lots of times, and feel certain I'm missing some obvious mistake.
>>> If you browse
>>> through and get to the bottom, there's some output that shows my predict output
>>> is just wrong, but I can't see why.
>>>
>>> I will send the first person with the right answer a new KU t-shirt in
>>> the size of
>>> your choice. That's better than "thanks in advance". Isn't it?
>>>
>>> ## Paul Johnson <pauljohn at ku.edu>
>>> ## 2012-06-21
>>>
>>> ## I want to compare the "separate ols regressions on clusters"
>>> ## results with the mixed model estimates. I've run into
>>> ## trouble because the output of predict with a newdata object
>>> ## seems to be completely wrong. I demonstrate "manual" calculations
>>> ## to try to convince you I understand what's going on (but
>>> ## peculiarity of predict output may make things appear otherwise).
>>>
>>> ## I pasted in some output at the bottom in case you just want to
>>> ## see where I think there is a problem.
>>>
>>> ## My system:
>>> ## > sessionInfo()
>>> ## R version 2.15.0 (2012-03-30)
>>> ## Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> ## locale:
>>> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> ## [7] LC_PAPER=C LC_NAME=C
>>> ## [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> ## attached base packages:
>>> ## [1] stats graphics grDevices utils datasets methods base
>>>
>>> ## other attached packages:
>>> ## [1] lme4_0.999902344-0 Matrix_1.0-6 lattice_0.20-6 MASS_7.3-18
>>> ## loaded via a namespace (and not attached):
>>> ## [1] compiler_2.15.0 grid_2.15.0 minqa_1.2.1 nlme_3.1-104
>>> ## [5] splines_2.15.0 tools_2.15.0
>>>
>>>
>>> ## Topic: create multi-level data and use regression and lmer to
>>> ## estimate it. This creates 3 different dependent variables,
>>> ## y1: ordinary "homoskedastic" regression (no grouping effect)
>>> ## y2: clustered "random intercepts"
>>> ## y3: clustered "random intercepts" and "random slope" for 1 predictor (x1)
>>>
>>> library(MASS)
>>> set.seed(1234)
>>>
>>> ## Step 1. Create a Data Frame.
>>>
>>> ## M respondents, N observations for each one.
>>> ## In a repeated measures context, this is called "longitudinal data".
>>> ## In a cross-sectional approach, this is M groups (classrooms) with
>>> ## several people in each.
>>>
>>> M <- 10
>>> N <- 30
>>>
>>> ## get M unique gray colors, will use later for plotting
>>> grays <- gray.colors(M)
>>>
>>> ## Standard deviation of error term at individual level
>>> STDE <- 48
>>>
>>> ## STDEM: standard deviation of clustered intercepts.
>>> ## In a longitudinal "repeated measures" exercise, this is an
>>> ## individual-level effect. In a cross section, this is a random
>>> ## classroom effect.
>>> STDEM <- 30
>>>
>>> ## STDEx1: standard deviation of slopes across cluster units
>>> STDEx1 <- 5
>>>
>>> ## The true effects of b0, b1, b2, b3 in
>>> ## y = b0 + b1*x1 + b2*x2 + b3*x3
>>> bslopes <- c(0.2, 15.4, -0.2, 3)
>>>
>>> ## Now generate the data frame with x1, x2 and x3.
>>> ## Let's suppose the predictors are Multivariate Normal, with
>>> ## means (100,200,150), standard deviations (10, 20, 30), and
>>> ## intercorrelations of 0.4. These can, of course, be adjusted
>>> ## for interesting variations.
>>>
>>> ## Mind, an "indicator" that says to which cluster an observation belongs.
>>> Mind <- 1:M %x% rep(1,N)
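>>> The %x% (Kronecker product) trick repeats each element of the
>>> left-hand vector N times; a minimal check:

```r
## c(a, b) %x% rep(1, N) repeats each element N times, the same as
## rep(c(a, b), each = N). Here 2 clusters, 3 observations each:
x <- c(5, 7) %x% rep(1, 3)
as.numeric(x)                                  # 5 5 5 7 7 7
all(as.numeric(x) == rep(c(5, 7), each = 3))   # TRUE
```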
>>>
>>> means <- c(100, 200, 150)
>>> sds <- c(10, 20, 30)
>>> rho <- 0.4
>>> corr.mat <- matrix(c(1, rho, rho, rho, 1, rho, rho, rho, 1), nrow = 3)
>>> sigma <- diag(sds) %*% corr.mat %*% diag(sds)
>>> x.mat <- mvrnorm(n = N * M, mu = means, Sigma = sigma)
>>> dimnames(x.mat)[[2]] <- c("x1", "x2", "x3")
>>> ## Add an intercept on the front of x.mat
>>> x.mat <- cbind("(Intercept)"= 1, x.mat)
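>>> The diag(sds) %*% corr.mat %*% diag(sds) step converts the
>>> correlation matrix to a covariance matrix; a quick check of one entry:

```r
## Sigma = D R D, with D = diag(sds) and R the correlation matrix,
## so Sigma[i, j] = sds[i] * sds[j] * rho for i != j.
sds <- c(10, 20, 30)
rho <- 0.4
R <- matrix(rho, 3, 3)
diag(R) <- 1
Sigma <- diag(sds) %*% R %*% diag(sds)
Sigma[1, 2]   # 10 * 20 * 0.4 = 80
Sigma[2, 2]   # 20^2          = 400
```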
>>>
>>>
>>> ## Create a dependent variable that has no clustering effects. This
>>> ## is an "ordinary" regression with the slopes and random noise
>>> ## designated above.
>>> y1 <- x.mat %*% bslopes + rnorm(M*N, m=0, s= STDE)
>>> dat <- data.frame(id=1:(N*M), Mind, y1, as.data.frame(x.mat))
>>> rm(Mind, y1, x.mat) ## cleanup workspace
>>>
>>> ## Layer on additive group level error in y2
>>> ## Add a group-level intercept.
>>> ## A vector of M disturbances is created, and each is
>>> ## added to all N cases within its cluster.
>>> dat$y2 <- dat$y1 + rnorm(M, 0, STDEM) %x% rep(1, N)
>>>
>>> ## In y3, add in a random slope effect. We
>>> ## are changing the slope for x1 from b1 to N(b1, STDEx1^2).
>>> ## Same as adding N(0, STDEx1) * x1, i.e. newb1 = b1 + N(0, STDEx1).
>>> dat$y3 <- dat$y2 + (rnorm(M, 0, STDEx1) %x% rep(1, N)) * dat$x1
>>>
>>> ## In the lme4 package, there is an "easy" tool to run M separate lm regressions
>>> library(lme4)
>>>
>>> ## M separate regressions, with 3 predictors
>>> m3list <- lmList(y3 ~ x1 + x2 + x3 | Mind, data=dat, pool = FALSE)
>>>
>>>
>>> ## Now I see a pattern. I'll use the cluster-specific range of x1 for
>>> ## each cluster; I think that makes the lines look more "in" the data.
>>> plot(y3 ~ x1, data=dat, col=grays[Mind], main = "lm on clusters")
>>> for (i in seq_along(m3list)) {
>>>     m3mf <- model.frame(m3list[[i]])  ## data set for group i
>>>     x1range <- range(m3mf$x1)         ## use group-specific ranges this time
>>>     pgroup <- predict(m3list[[i]],
>>>                       newdata = data.frame(x1 = x1range,
>>>                                            x2 = mean(m3mf$x2),
>>>                                            x3 = mean(m3mf$x3)))
>>>     lines(x1range, pgroup, col = grays[i])
>>> }
>>>
>>> ## Hmm. That data looks like it might be handy for comparison later on.
>>> ## Build a data frame of it. I've never thought of doing this work
>>> ## cluster-by-cluster before.
>>>
>>> m3newdat <- lapply(m3list, function(x) {
>>>     m3mf <- model.frame(x)
>>>     ndf <- data.frame(x1 = range(m3mf$x1), x2 = mean(m3mf$x2),
>>>                       x3 = mean(m3mf$x3))
>>>     ndf$m3pred <- predict(x, newdata = ndf)
>>>     ndf
>>> })
>>> ## Smash the list of data frames together
>>> m3newdat <- do.call("rbind", m3newdat)
>>> ## Interesting!
>>> m3newdat[1:20, ]
>>> ## Better add a variable Mind. This looks stupid, but works.
>>> m3newdat$Mind <- as.integer(do.call("rbind",
>>>     strsplit(row.names(m3newdat), split = "\\.", perl = TRUE))[, 1])
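>>> A simpler equivalent, since each cluster contributes exactly two rows
>>> to m3newdat, in list order (a hypothetical alternative, not what the
>>> script above uses):

```r
## Rebuild the cluster index without parsing row names: each of the
## M list elements contributed exactly 2 rows, in order.
M <- 10
Mind2 <- rep(seq_len(M), each = 2)
head(Mind2)   # 1 1 2 2 3 3
```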
>>>
>>>
>>> ## Draw new graphs on a new device, so we can compare
>>> dev.new()
>>>
>>> ## The "right" model allows the random effects to be correlated.
>>> mm4 <- lmer( y3 ~ x1 + x2 + x3 + (x1 | Mind), data=dat)
>>> summary(mm4)
>>>
>>>
>>>
>>> mm4ranef <- ranef(mm4, postVar = TRUE) ## a ranef.mer object, a
>>> ## list that includes one data frame
>>>
>>>
>>> ### What do the predicted values mean from lmer?
>>> ### here they are. But what do they include?
>>> head(mm4pred <- predict(mm4))
>>> tail(mm4pred <- predict(mm4))
>>>
>>>
>>> ## I can't tell what those numbers are. Can't be sure from ?predict.merMod.
>>> ## So I'm going to go through the step-by-step
>>> ## process of calculating predicted values. These should follow a
>>> ## formula like this for model 4, where j is the "cluster" index:
>>> ##
>>> ## j'th intercept j'th slope
>>> ## (b0 + raneff(intercept, j)) + (b1 + raneff(x1, j))*x1 + b2*x2 + b3*x3
>>>
>>> mm4b <- coef(summary(mm4))[ ,1]
>>> mm4vnames <- names(mm4b)
>>> mm4mm <- model.matrix(mm4) ## predictors
>>>
>>> ## Doing this bit by bit. First
>>> ## b0 + raneff(intercept, j)
>>> mm4inteffect <- mm4b["(Intercept)"] + mm4ranef[[1]][dat$Mind, 1]
>>> mm4x1effect <- mm4mm[ , c("x1")] * (mm4b["x1"] + mm4ranef[[1]][dat$Mind, 2])
>>> mm4pred2 <- mm4inteffect + mm4x1effect + mm4mm[ ,c("x2","x3") ] %*%
>>> mm4b[c("x2","x3")]
>>> head(mm4pred2)
>>> tail(mm4pred2)
>>>
>>> ## Aha! Those exactly match predict for mm4. So I understand what "predict" does
>>>
>>>
>>>
>>>
>>> ## Now, I want to run predict for some particular values of x1, x2, x3.
>>>
>>> ## Let's try running m3newdat through the predict function for mm4.
>>> m3newdat$mm4.3 <- predict(mm4, newdata = m3newdat)
>>> m3newdat
>>>
>>>
>>> ## Disaster. Predicted values completely out of whack. Lots of
>>> ## lines fall off the bottom of the graph.
>>>
>>> plot(y3 ~ x1, data=dat, col=grays[Mind], main="lmer mixed model predictions")
>>> by(m3newdat, m3newdat$Mind, function(x) { lines(x$x1, x$mm4.3,
>>> col=grays[x$Mind])})
>>>
>>>
>>> ## They don't match the manual calculations I perform here
>>> mm4b <- coef(summary(mm4))[ ,1]
>>> mm4vnames <- names(mm4b)
>>> mm4mm <- as.matrix(m3newdat)
>>>
>>> ## b0 + raneff(intercept, j)
>>> mm4inteffect <- mm4b["(Intercept)"] + rep(mm4ranef[[1]][, 1], each=2)
>>> mm4x1effect <- mm4mm[ , c("x1")] * rep(mm4b["x1"] + mm4ranef[[1]][, 2], each=2)
>>> mm4manualpred2 <- mm4inteffect + mm4x1effect +
>>> mm4mm[ , c("x2","x3")] %*% mm4b[c("x2","x3")]
>>>
>>> mm4mm <- cbind(as.data.frame(mm4mm), "mm4manualpred" = mm4manualpred2)
>>> mm4mm
>>>
>>> ## Here's what I see when I do that. m3pred: the lmList predictions.
>>> ## mm4manualpred: my "manual" calculations of predicted values for lmer mm4.
>>> ## mm4.3: output from predict for the same newdata used with mm4manualpred.
>>>
>>> ## > mm4mm
>>> ## x1 x2 x3 m3pred Mind mm4.3 mm4manualpred
>>> ## 1.1 88.51074 196.0615 141.3859 1221.477 1 -434.7602 1239.152
>>> ## 1.2 110.88258 196.0615 141.3859 1483.957 1 -630.0463 1466.961
>>> ## 2.1 80.26167 194.1252 133.7265 1405.672 2 -614.3256 1388.751
>>> ## 2.2 117.08533 194.1252 133.7265 1845.499 2 -1051.4105 1866.505
>>> ## 3.1 80.84091 202.4517 154.3976 1302.205 3 3527.8154 1284.706
>>> ## 3.2 120.33775 202.4517 154.3976 1692.819 3 5037.6392 1717.210
>>> ## 4.1 68.96320 204.5145 151.6662 1321.927 4 5097.5261 1379.251
>>> ## 4.2 120.17029 204.5145 151.6662 2137.589 4 8559.2362 2108.628
>>> ## 5.1 84.85202 200.7799 153.1595 1821.135 5 4736.2240 1813.415
>>> ## 5.2 122.58730 200.7799 153.1595 2418.381 5 6641.2026 2426.855
>>> ## 6.1 84.60214 202.3324 153.2583 2015.183 6 1521.6831 2001.700
>>> ## 6.2 128.13418 202.3324 153.2583 2778.684 6 2086.4726 2799.053
>>> ## 7.1 74.73548 198.5761 155.8984 1780.196 7 1497.6761 1802.381
>>> ## 7.2 118.97636 198.5761 155.8984 2622.083 7 2127.8282 2603.626
>>> ## 8.1 83.40882 200.4991 152.0515 2342.043 8 1954.4592 2323.477
>>> ## 8.2 118.73997 200.4991 152.0515 3088.686 8 2601.6007 3105.182
>>> ## 9.1 82.48090 205.4736 152.2774 1979.852 9 2253.5650 1950.388
>>> ## 9.2 117.48279 205.4736 152.2774 2554.581 9 3027.9842 2587.319
>>> ## 10.1 84.77764 205.9689 152.3325 2256.434 10 2236.3346 2268.620
>>> ## 10.2 119.66417 205.9689 152.3325 3022.053 10 2980.1202 3012.406
>>>
>>>
>>> --
>>> Paul E. Johnson
>>> Professor, Political Science Assoc. Director
>>> 1541 Lilac Lane, Room 504 Center for Research Methods
>>> University of Kansas University of Kansas
>>> http://pj.freefaculty.org http://quant.ku.edu
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>
>
>