[R-sig-ME] Dealing with large datasets in lme/lmer

Wed Sep 5 21:11:28 CEST 2007

On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:
> Dear all,

> I'm running mixed-effects analysis with large datasets in a loop like
> this:

> for (i in 1:60) {
> for (j in 1:60) {
> for (k in 1:60) {
>     [...update y here in Model here...]
>     fit.lme <- lme(y ~ FA*FB*FC+weight, random = pdBlocked(list
> (pdCompSymm(~FB-1), pdCompSymm(~FC-1), pdIdent(~1))), weights=varIdent
> (form=~1|FA), Model);
>     Stat[i, j, k,] <- anova(fit.lme)$F[-1];
> }
> }
> }

Did you create the array Stat outside the loop?  If not you will be
doing a lot of copying of elements of that array.

> This takes a little over 100 hours to finish on a Mac G5 with duo
> 2.7GHz processors and 4GB memory.

> In the mixed-effects model

> y = X*beta + Z*b + e

> the fixed-effects nxp matrix X and random-effects matrix nxq Z are
> always the same for all the iterations in my case, and the only thing
> that differs is y (and the estimates of beta, b and e also differ of
> course). In my case n = 504 (large), p and q are moderate.  I just
> read Dr. Douglas Bates's presentation during uerR! 2007 (very
> informative by the way):

Thank you.

> http://user2007.org/program/presentations/bates.pdf

> It seems many components in the extended system matrix (equation (2)
> on page 22) for the Cholesky decomposition remain the same during the
> iterations. So there are a lot of repetitive computations on those
> same matrix operations in the above loop. How can I achieve a better
> efficiency? Someone suggested to me running lme/lmer with a two-
> dimensional response Y instead of one-dimensional y. My questions are:

> (1) So far I have only seen people running lme/lmer with y in a
> format of one-dimensional array from a table. If I combine all those
> y's (indices i, j, k) into an two-dimensional array Y, is there a way
> I can run lme/lmer on Y instead of y? In other words, does lme/lmer
> take a two-dimensional array Y?

Not at present.

> If so, do I have to save the huge
> array in a table in text file and then read in R before I run lme/
> lmer?

No.  There are many ways of getting data into R other than creating a
text file and reading it.  See the manual "R Data Import/Export" and
also Martin Maechler's presentation at useR!2007.
http://user2007.org/program/presentations/maechler.pdf

> Also if that is the case, how can I label those many columns
> somehow associated with Y?

> (2) A more serious concern is about memory. With the current looping
> approach it uses about 1GB. If I could possibly go with the matrix
> method described in (1), I'm worried that it might not be practically
> feasible with the current computers. Any thoughts?

Well first you are discussing the computational methods used in lmer
but you want to fit a model with different residual variances for
different groups.  At present you can't do that in lmer.

If you look at the lmer function in the development version of the
lme4 package (currently at
https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
at http://r-forge.r-project.org/projects/lme4 for some value of
"soon") you will see that it follows the equations in my useR
presentation fairly closely.  The Xy array is n by (p + 1) with X in
the first p columns and y in the p + 1st column.  The object of class
"lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy), and
XytXy (crossprod(Xy)). After fitting the model to the first simulated
response, producing the object 'fm',  the only operations needed to
update the model are

 fm at y <- newy
 Xy <- cbind(fm at X, fm at y)
 fm at ZtXy <- fm at Zt %*% Xy
 fm at XytXy <- crossprod(Xy)
 lme4:::mer_finalize(fm, verbose)

where 'verbose' is a logical scalar indicating if you want verbose
output during the optimization phase.  Once you get things working on
a small example you would probably want to turn that off.

Please note that this code applies to the development version of the
lme4 package.