[R-sig-ME] alternative prediction methods for MCMCglmm
Paul Debes
paul.debes at utu.fi
Sun Jan 31 10:15:50 CET 2016
Hi Jonathan,
This problem reminds me of a problem I run into when I predict random
effect trends with asreml-R to check if I actually model the right things:
I also run into memory limitations. What I do in these cases is predicting
piecewise for the random levels, with each prediction small enough to be
handled by the memory and combining the results afterwards. Is that an
option for you?
Another possibility may be using the "lsmeans" package. It is also able to
predict effect combinations for MCMCglmm models. I think you may have to
provide additional information to the function to work compared to most
other package models, but this should be described in the manual.
Hope this helps,
Paul
On Sun, 31 Jan 2016 07:04:24 +0200, Jonathan Judge <bachlaw01 at outlook.com>
wrote:
> I have a data set with 190k observations, over 100 predictors (including
> interactions), and 4 groups, each ranging from 100 to 1000 members.
>
> Lme4 models it in about 3 minutes, but lme4 is of course fast at most
> such things because it is able to work from maximum likelihood.
>
> What I'd prefer to use is MCMC. Stan is overwhelmed by such a model,
> but MCMCglmm, to its credit, can model its standard 13000 iterations in
> about 2 hours. (I suspect MCMCglmm's block sampling is well suited to
> working with large groups). The sampling is relatively stable after the
> 3000 burnin iterations, so I have set thin=1 for the last 10000
> iterations to reach effective sample sizes in a reasonable amount of
> time. This gives me 10000 saved iterations over the 190k observations.
>
> The problem is that while the model runs, the data is saved, and the
> parameter values from the model summary look good, the amount of saved
> data overwhelms the built-in predict function when I try to use
> posterior = 'all'. Whether I use a machine with 8 or 64 GB of RAM, on
> Windows 10 or Linux, a prediction from this model uses up virtually all
> of the available memory and then pronounces that it cannot fit a vector
> of ___ size (it varies). Sometimes the model refuses to do predictions
> at all; sometimes it will do predictions but chokes on doing intervals
> along with predictions, and sometimes it will do predictions with no
> marginalizations but object when I attempt to marginalize only certain
> groups. There is no problem if I marginalize off a particular point
> (mean, median, etc.) but for me that defeats much of the point of using
> (and waiting for) MCMC.
>
> In reviewing the traceback on the failed predictions, it seems like the
> process is choking on the calls the prediction function makes to coda
> and its calculation of the HDR, as called through the apply function. I
> recall Jerrod saying that he considered the predict function to be
> somewhat crude, and wonder therefore if anyone can suggest either (1)
> revisions to the predict function that might cut down on memory usage,
> or (2) alternative / manual ways to marginalize over the predictors in
> the MCMCglmm fit that would not leak / absorb so much memory. I doubt
> that I am the first person to face this challenge.
>
> A dummy set of data is provided below that will substantially replicate
> the problem. I am using the version of lme4 and MCMCglmm 2.22.1, R
> 3.2.3.
>
> Thanks very much.
>
> Jonathan
>
> ***********************************************************
>
> ### Prepare Environment ###
>
> rm(list=ls(all=TRUE))
>
> set.seed(1234)
>
> ### Create vectors and df ###
>
> y <- scale(c(rep(0,125000), rep(.4,5000), rep(.65, 45000),
> rep(1.1,10000), rep(1.6, 5000)), center=T, scale=T)
> b <- as.factor(c(rep(1,100000), rep(2,90000)))
> c <- as.factor(c(rep_len(1:30, 190000)))
> d <- as.factor(c(rep(1,60000), rep(2,50000), rep(3, 40000),
> rep(4,40000)))
> e <- scale(rnorm(190000,0,1), center=T, scale=T)
> f <- scale(rnorm(190000,0,1), center=T, scale=T)
> g <- scale(log(rpois(190000,100)), center=T, scale=T)
> h <- as.factor(c(rep(1,25000), rep(2,22000), rep(3,26000),
> rep(4,30000), rep(5,17000), rep(6,15000), rep(7,30000), rep(8,25000)))
> j <- as.factor(rpois(190000,20000))
> k <- as.factor(rpois(190000,7500))
> m <- as.factor(rpois(190000,150))
>
> df <- data.frame(y,b,c,d,e,f,g,h,j,k,m)
>
> ### lme4 fit ###
>
> require(lme4)
> fit.lme4 <- lmer(y ~ b*c +
> d*e +
> f +
> g +
> e +
> h +
> (1|j) +
> (1|k) +
> (1|m),
> data=df)
>
> ### MCMCglmm fit ###
> require(MCMCglmm)
> fit.MCMCglmm <- MCMCglmm(y ~ b*c +
> d*e +
> f +
> g +
> e +
> h,
> random =
> ~j +
> k +
> m,
> data=df,
> pr=T,
> thin=1)
>
> ******************************************************
>
> In terms of error codes, they are not always the same, but here is the
> traceback from an attempt to predict the response with a confidence
> interval:
>
>
>> df$preds <- predict(fit.MCMCglmm, type='response',
>> interval='confidence', posterior='all')
> There were 14 warnings (use warnings() to see them)
> Error in t(apply(object$Sol, 1, function(x) { :
> error in evaluating the argument 'x' in selecting a method for
> function 't': Error: cannot allocate vector of size 14.2 Gb
>> traceback()
> 3: t(apply(object$Sol, 1, function(x) {
> (W %*% x)@x
> }))
> 2: predict.MCMCglmm(fit.MCMCglmm, type = "response", interval =
> "confidence",
> posterior = "all")
> 1: predict(fit.MCMCglmm, type = "response", interval = "confidence",
> posterior = "all")
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Paul V. Debes
DFG Research Fellow
University of Turku
Department of Biology
Finland
Email: paul.debes at utu.fi
More information about the R-sig-mixed-models
mailing list