[R-sig-ME] need help with predicted values from lmer with newdata
Joshua Wiley
jwiley.psych at gmail.com
Fri Jun 22 17:40:01 CEST 2012
Hi Paul,
I am not sure how to use C-c C-d with namespaces and unexported
objects. I might suggest though that for the assignment, you do
something like:
assignInNamespace("predict.merMod", mypredict, envir =
as.environment("package:lme4"))
It is not an issue in this case, but it can be if mypredict calls
functions that are available in lme4's namespace but not to functions in
the global environment. I am not a big leaguer though, so I cannot give
you really definitive advice.
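The full cycle might look like the sketch below (assuming lme4 is
installed; the ns = "lme4" form is an alternative to the envir form
above):

```r
## Sketch: patch an unexported S3 method inside a package namespace.
## Assumes lme4 is installed and predict.merMod is still unexported.
library(lme4)

## 1. Pull the unexported method out of the namespace with ":::".
mypredict <- lme4:::predict.merMod

## 2. Edit the copy (in ESS: C-c C-d on mypredict), then ...

## 3. Push the edited copy back, so an ordinary predict(<merMod>, ...)
##    call dispatches to it; callers need not change.
assignInNamespace("predict.merMod", mypredict, ns = "lme4")
```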
Regarding the shirt, give some free help to a local student (or
charity), and I will consider us even :)
Cheers,
Josh
On Fri, Jun 22, 2012 at 7:44 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> Confirmed. Follow-up below about how you edit predict.merMod
>
> On Fri, Jun 22, 2012 at 4:19 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>> Hi Paul,
>>
>> I think this may be a bug in predict.merMod. It is not terribly
>> elegant, but I suggest changing line 102 from:
>>
>> re_new <- unlist(re_List[m])
>>
>> to
>>
>> re_new <- as.vector(do.call("rbind", re_List[m]))
>
> Changing this resolves the anomaly I experience. If you mail me your
> shirt size & mailing address, I will follow through on the shirt.
> Official logo and all!
>
> Now a more practical question. This is OT for mixed models, but on
> topic for Emacs and R, and how to revise lme4 objects to test variations
> like the one you propose.
>
> I'm using Emacs with ESS. I learned in this email list that C-c C-d
> can open a function and revise it, and then load it. This is the
> Emacs-ESS alternative to R's fix() function.
>
> I have more and more trouble revising objects because more and more
> code is written with non-exported functions.
>
> In order to follow your advice and revise predict.merMod, the only
> thing I could think of was the following.
>
>> mypredict <- lme4:::predict.merMod
>
> Then I C-c C-d mypredict, re-load that, and change my usage
> from predict(m4, ...) to mypredict(m4, ...).
>
> This seemed like a childish way to get at it, so I wondered how they
> do it in the big leagues.
>
> ?
>
> PJ
>
>
>>
>> I believe the issue with the current code is that the random effects
>> are unlisted, which essentially 'stacks' the vector one effect at a
>> time rather than one level at a time (i.e., all intercepts, then all
>> slopes). Later this vector is post-multiplied by a matrix that is
>> sorted by level; the dimensions are correct, so the matrices multiply
>> fine, but the results are not what I believe was intended.
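>> A toy illustration of the ordering problem (a stand-in for the
>> per-group random-effects table, not lme4's actual internals):

```r
## Rows are levels of the grouping factor, columns are effects.
re <- data.frame("(Intercept)" = c(1, 2), x1 = c(10, 20),
                 check.names = FALSE)

## unlist() walks column by column: all intercepts, then all slopes.
as.numeric(unlist(re))       # 1  2 10 20   (effect-major)

## The matrix it multiplies is ordered level by level, so the vector
## must interleave (intercept, slope) within each level instead:
as.vector(t(as.matrix(re)))  # 1 10  2 20   (level-major)
```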
>>
>> I cannot see a way to add a patch on R-forge or to submit a bug
>> report, so I cc'ed Doug on this.
>>
>> Cheers,
>>
>> Josh
>>
>> On Thu, Jun 21, 2012 at 8:28 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>>> Greetings.
>>>
>>> I'm getting some crazy results out of predict on an lmer model. I've
>>> gone through
>>> this lots of times, and feel certain I'm missing some obvious mistake.
>>> If you browse
>>> through and get to the bottom, there's some output that shows my predict output
>>> is just wrong, but I can't see why.
>>>
>>> I will send the first person with the right answer a new KU t-shirt in
>>> the size of
>>> your choice. That's better than "thanks in advance". Isn't it?
>>>
>>> ## Paul Johnson <pauljohn at ku.edu>
>>> ## 2012-06-21
>>>
>>> ## I want to compare the "separate ols regressions on clusters"
>>> ## results with the mixed model estimates. I've run into
>>> ## trouble because the output of predict with a newdata object
>>> ## seems to be completely wrong. I demonstrate "manual" calculations
>>> ## to try to convince you I understand what's going on (but
>>> ## peculiarity of predict output may make things appear otherwise).
>>>
>>> ## I pasted in some output at the bottom in case you just want to
>>> ## see where I think there is a problem.
>>>
>>> ## My system:
>>> ## > sessionInfo()
>>> ## R version 2.15.0 (2012-03-30)
>>> ## Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> ## locale:
>>> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> ## [7] LC_PAPER=C LC_NAME=C
>>> ## [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> ## attached base packages:
>>> ## [1] stats graphics grDevices utils datasets methods base
>>>
>>> ## other attached packages:
>>> ## [1] lme4_0.999902344-0 Matrix_1.0-6 lattice_0.20-6 MASS_7.3-18
>>> ## loaded via a namespace (and not attached):
>>> ## [1] compiler_2.15.0 grid_2.15.0 minqa_1.2.1 nlme_3.1-104
>>> ## [5] splines_2.15.0 tools_2.15.0
>>>
>>>
>>> ## Topic: create multi-level data and use regression and lmer to
>>> ## estimate it. This creates 3 different dependent variables,
>>> ## y1: ordinary "homoskedastic" regression (no grouping effect)
>>> ## y2: clustered "random intercepts"
>>> ## y3: clustered "random intercepts" and "random slope" for 1 predictor (x1)
>>>
>>> library(MASS)
>>> set.seed(1234)
>>>
>>> ## Step 1. Create a Data Frame.
>>>
>>> ## M respondents, N observations for each one.
>>> ## In a repeated measures context, this is called "longitudinal data".
>>> ## In a cross-sectional approach, this is M groups (classrooms) with
>>> ## several people in each.
>>>
>>> M <- 10
>>> N <- 30
>>>
>>> ## get M unique gray colors, will use later for plotting
>>> grays <- gray.colors(M)
>>>
>>> ## Standard deviation of error term at individual level
>>> STDE <- 48
>>>
>>> ## STDEM: standard deviation of clustered intercepts.
>>> ## In a longitudinal "repeated measures" exercise, this is an
>>> ## individual-level effect. In a cross section, this is a random
>>> ## classroom effect.
>>> STDEM <- 30
>>>
>>> ## STDEx1: standard deviation of slopes across cluster units
>>> STDEx1 <- 5
>>>
>>> ## The true effects of b0, b1, b2, b3 in
>>> ## y = b0 + b1*x1 + b2*x2 + b3*x3
>>> bslopes <- c(0.2, 15.4, -0.2, 3)
>>>
>>> ## Now generate the data frame with x1, x2 and x3.
>>> ## Let's suppose the predictors are Multivariate Normal, with
>>> ## means (100,200,150), standard deviations (10, 20, 30), and
>>> ## intercorrelations of 0.4. These can, of course, be adjusted
>>> ## for interesting variations.
>>>
>>> ## Mind, an "indicator" that says to which cluster an observation belongs.
>>> Mind <- 1:M %x% rep(1,N)
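>>> The %x% (Kronecker product) trick repeats each element of the
>>> left-hand vector N times; a minimal check:

```r
## c(a, b) %x% rep(1, N) repeats each element N times, the same as
## rep(c(a, b), each = N). Here 2 clusters, 3 observations each:
x <- c(5, 7) %x% rep(1, 3)
as.numeric(x)                                  # 5 5 5 7 7 7
all(as.numeric(x) == rep(c(5, 7), each = 3))   # TRUE
```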
>>>
>>> means <- c(100, 200, 150)
>>> sds <- c(10, 20, 30)
>>> rho <- 0.4
>>> corr.mat <- matrix(c(1, rho, rho, rho, 1, rho, rho, rho, 1), nrow = 3)
>>> sigma <- diag(sds) %*% corr.mat %*% diag(sds)
>>> x.mat <- mvrnorm(n = N * M, mu = means, Sigma = sigma)
>>> dimnames(x.mat)[[2]] <- c("x1", "x2", "x3")
>>> ## Add an intercept on the front of x.mat
>>> x.mat <- cbind("(Intercept)"= 1, x.mat)
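>>> The diag(sds) %*% corr.mat %*% diag(sds) step converts the
>>> correlation matrix to a covariance matrix; a quick check of one entry:

```r
## Sigma = D R D, with D = diag(sds) and R the correlation matrix,
## so Sigma[i, j] = sds[i] * sds[j] * rho for i != j.
sds <- c(10, 20, 30)
rho <- 0.4
R <- matrix(rho, 3, 3)
diag(R) <- 1
Sigma <- diag(sds) %*% R %*% diag(sds)
Sigma[1, 2]   # 10 * 20 * 0.4 = 80
Sigma[2, 2]   # 20^2          = 400
```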
>>>
>>>
>>> ## Create a dependent variable that has no clustering effects. This
>>> ## is an "ordinary" regression with the slopes and random noise
>>> ## designated above.
>>> y1 <- x.mat %*% bslopes + rnorm(M*N, m=0, s= STDE)
>>> dat <- data.frame(id=1:(N*M), Mind, y1, as.data.frame(x.mat))
>>> rm(Mind, y1, x.mat) ## cleanup workspace
>>>
>>> ## Layer on additive group level error in y2
>>> ## Add a group-level intercept.
>>> ## A vector of M disturbances is created, and each is
>>> ## added to all N cases within its cluster.
>>> dat$y2 <- dat$y1 + rnorm(M, 0, STDEM) %x% rep(1, N)
>>>
>>> ## In y3, add in a random slope effect. We
>>> ## are changing the slope for x1 from b1 to N(b1, STDEx1^2).
>>> ## Same as adding N(0, STDEx1) * x1, i.e. newb1 = b1 + N(0, STDEx1).
>>> dat$y3 <- dat$y2 + (rnorm(M, 0, STDEx1) %x% rep(1, N)) * dat$x1
>>>
>>> ## In the lme4 package, there is an "easy" tool to run M separate lm regressions
>>> library(lme4)
>>>
>>> ## M separate regressions, with 3 predictors
>>> m3list <- lmList(y3 ~ x1 + x2 + x3 | Mind, data=dat, pool = FALSE)
>>>
>>>
>>> ## Now I see a pattern. I'll use the cluster-specific range of x1 for
>>> ## each cluster; I think that makes the lines look more "in" the data.
>>> plot(y3 ~ x1, data=dat, col=grays[Mind], main = "lm on clusters")
>>> for (i in seq_along(m3list)) {
>>>     m3mf <- model.frame(m3list[[i]])  ## data set for group i
>>>     x1range <- range(m3mf$x1)         ## use group-specific ranges this time
>>>     pgroup <- predict(m3list[[i]],
>>>                       newdata = data.frame(x1 = x1range,
>>>                                            x2 = mean(m3mf$x2),
>>>                                            x3 = mean(m3mf$x3)))
>>>     lines(x1range, pgroup, col = grays[i])
>>> }
>>>
>>> ## Hmm. That data looks like it might be handy for comparison later on.
>>> ## Build a data frame of it. I've never thought of doing this work
>>> ## cluster-by-cluster before.
>>>
>>> m3newdat <- lapply(m3list, function(x) {
>>>     m3mf <- model.frame(x)
>>>     ndf <- data.frame(x1 = range(m3mf$x1), x2 = mean(m3mf$x2),
>>>                       x3 = mean(m3mf$x3))
>>>     ndf$m3pred <- predict(x, newdata = ndf)
>>>     ndf
>>> })
>>> ## Smash the list of data frames together
>>> m3newdat <- do.call("rbind", m3newdat)
>>> ## Interesting!
>>> m3newdat[1:20, ]
>>> ## Better add a variable Mind. This looks stupid, but works.
>>> m3newdat$Mind <- as.integer(do.call("rbind",
>>>     strsplit(row.names(m3newdat), split = "\\.", perl = TRUE))[, 1])
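>>> A simpler equivalent, since each cluster contributes exactly two rows
>>> to m3newdat, in list order (a hypothetical alternative, not what the
>>> script above uses):

```r
## Rebuild the cluster index without parsing row names: each of the
## M list elements contributed exactly 2 rows, in order.
M <- 10
Mind2 <- rep(seq_len(M), each = 2)
head(Mind2)   # 1 1 2 2 3 3
```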
>>>
>>>
>>> ## Draw new graphs on a new device, so we can compare
>>> dev.new()
>>>
>>> ## The "right" model allows the random effects to be correlated.
>>> mm4 <- lmer( y3 ~ x1 + x2 + x3 + (x1 | Mind), data=dat)
>>> summary(mm4)
>>>
>>>
>>>
>>> mm4ranef <- ranef(mm4, postVar = TRUE) ## a ranef.mer object, a
>>> ## list that includes one data frame
>>>
>>>
>>> ### What do the predicted values mean from lmer?
>>> ### here they are. But what do they include?
>>> head(mm4pred <- predict(mm4))
>>> tail(mm4pred <- predict(mm4))
>>>
>>>
>>> ## I can't tell what those numbers are. Can't be sure from ?predict.merMod.
>>> ## So I'm going to go through the step-by-step
>>> ## process of calculating predicted values. These should follow a
>>> ## formula like this for model 4, where j is the "cluster" index:
>>> ##
>>> ## j'th intercept j'th slope
>>> ## (b0 + raneff(intercept, j)) + (b1 + raneff(x1, j))*x1 + b2*x2 + b3*x3
>>>
>>> mm4b <- coef(summary(mm4))[ ,1]
>>> mm4vnames <- names(mm4b)
>>> mm4mm <- model.matrix(mm4) ## predictors
>>>
>>> ## Doing this bit by bit. First
>>> ## b0 + raneff(intercept, j)
>>> mm4inteffect <- mm4b["(Intercept)"] + mm4ranef[[1]][dat$Mind, 1]
>>> mm4x1effect <- mm4mm[ , c("x1")] * (mm4b["x1"] + mm4ranef[[1]][dat$Mind, 2])
>>> mm4pred2 <- mm4inteffect + mm4x1effect + mm4mm[ ,c("x2","x3") ] %*%
>>> mm4b[c("x2","x3")]
>>> head(mm4pred2)
>>> tail(mm4pred2)
>>>
>>> ## Aha! Those exactly match predict for mm4. So I understand what "predict" does
>>>
>>>
>>>
>>>
>>> ## Now, I want to run predict for some particular values of x1, x2, x3.
>>>
>>> ## Let's try running m3newdat through the predict function for mm4.
>>> m3newdat$mm4.3 <- predict(mm4, newdata = m3newdat)
>>> m3newdat
>>>
>>>
>>> ## Disaster. Predicted values completely out of whack. Lots of
>>> ## lines fall off the bottom of the graph.
>>>
>>> plot(y3 ~ x1, data=dat, col=grays[Mind], main="lmer mixed model predictions")
>>> by(m3newdat, m3newdat$Mind, function(x) { lines(x$x1, x$mm4.3,
>>> col=grays[x$Mind])})
>>>
>>>
>>> ## They don't match the manual calculations I perform here
>>> mm4b <- coef(summary(mm4))[ ,1]
>>> mm4vnames <- names(mm4b)
>>> mm4mm <- as.matrix(m3newdat)
>>>
>>> ## b0 + raneff(intercept, j)
>>> mm4inteffect <- mm4b["(Intercept)"] + rep(mm4ranef[[1]][, 1], each=2)
>>> mm4x1effect <- mm4mm[ , c("x1")] * rep(mm4b["x1"] + mm4ranef[[1]][, 2], each=2)
>>> mm4manualpred2 <- mm4inteffect + mm4x1effect +
>>> mm4mm[ , c("x2","x3")] %*% mm4b[c("x2","x3")]
>>>
>>> mm4mm <- cbind(as.data.frame(mm4mm), "mm4manualpred" = mm4manualpred2)
>>> mm4mm
>>>
>>> ## Here's what I see when I do that. m3pred: the lmList predictions.
>>> ## mm4manualpred: my "manual" calculations of predicted values for lmer mm4.
>>> ## mm4.3: output from predict for the same newdata used with mm4manualpred.
>>>
>>> ## > mm4mm
>>> ## x1 x2 x3 m3pred Mind mm4.3 mm4manualpred
>>> ## 1.1 88.51074 196.0615 141.3859 1221.477 1 -434.7602 1239.152
>>> ## 1.2 110.88258 196.0615 141.3859 1483.957 1 -630.0463 1466.961
>>> ## 2.1 80.26167 194.1252 133.7265 1405.672 2 -614.3256 1388.751
>>> ## 2.2 117.08533 194.1252 133.7265 1845.499 2 -1051.4105 1866.505
>>> ## 3.1 80.84091 202.4517 154.3976 1302.205 3 3527.8154 1284.706
>>> ## 3.2 120.33775 202.4517 154.3976 1692.819 3 5037.6392 1717.210
>>> ## 4.1 68.96320 204.5145 151.6662 1321.927 4 5097.5261 1379.251
>>> ## 4.2 120.17029 204.5145 151.6662 2137.589 4 8559.2362 2108.628
>>> ## 5.1 84.85202 200.7799 153.1595 1821.135 5 4736.2240 1813.415
>>> ## 5.2 122.58730 200.7799 153.1595 2418.381 5 6641.2026 2426.855
>>> ## 6.1 84.60214 202.3324 153.2583 2015.183 6 1521.6831 2001.700
>>> ## 6.2 128.13418 202.3324 153.2583 2778.684 6 2086.4726 2799.053
>>> ## 7.1 74.73548 198.5761 155.8984 1780.196 7 1497.6761 1802.381
>>> ## 7.2 118.97636 198.5761 155.8984 2622.083 7 2127.8282 2603.626
>>> ## 8.1 83.40882 200.4991 152.0515 2342.043 8 1954.4592 2323.477
>>> ## 8.2 118.73997 200.4991 152.0515 3088.686 8 2601.6007 3105.182
>>> ## 9.1 82.48090 205.4736 152.2774 1979.852 9 2253.5650 1950.388
>>> ## 9.2 117.48279 205.4736 152.2774 2554.581 9 3027.9842 2587.319
>>> ## 10.1 84.77764 205.9689 152.3325 2256.434 10 2236.3346 2268.620
>>> ## 10.2 119.66417 205.9689 152.3325 3022.053 10 2980.1202 3012.406
>>>
>>>
>>> --
>>> Paul E. Johnson
>>> Professor, Political Science Assoc. Director
>>> 1541 Lilac Lane, Room 504 Center for Research Methods
>>> University of Kansas University of Kansas
>>> http://pj.freefaculty.org http://quant.ku.edu
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>
>
>