[R-sig-ME] Dealing with large datasets in lme/lmer

Douglas Bates bates at stat.wisc.edu
Mon Sep 10 22:26:56 CEST 2007


The best way to obtain the package sources is with a subversion
client.  On Linux it is called svn.  The call to check out a copy of
the package sources is

svn co https://svn.r-project.org/R-packages/branches/gappy-lmer ./lme4

You need the whole directory to build the package.  Under Linux or Mac
OS X I use

R CMD INSTALL ./lme4

For Windows I use Uwe's win-builder at win-builder.R-project.org

On 9/10/07, Gang Chen <gangchen at mail.nih.gov> wrote:
> Dr. Bates,
>
> Thank you very much for the quick response and suggestions.
>
> I really want to try out the lmer development package, but could not
> figure out how to download it. I tried to run
>
> wget https://svn.r-project.org/R-packages/branches/gappy-lmer/
> (or wget --no-check-certificate -r -nv -m -np -nH
> https://svn.r-project.org/R-packages/branches/gappy-lmer/ )
>
> but it didn't work. So how can I obtain the package? Do I only need
> the files under
>
> https://svn.r-project.org/R-packages/branches/gappy-lmer/R/
>
> or do I need to compile/build the package with the source code somehow?
>
> Thanks,
> Gang
>
> =========
> Gang Chen, Ph. D.
> National Institutes of Health, HHS
>
>
> On Sep 5, 2007, at 3:11 PM, Douglas Bates wrote:
>
> > On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:
> >> Dear all,
> >
> >> I'm running mixed-effects analysis with large datasets in a loop like
> >> this:
> >
> >> for (i in 1:60) {
> >>   for (j in 1:60) {
> >>     for (k in 1:60) {
> >>       [...update y in Model here...]
> >>       fit.lme <- lme(y ~ FA*FB*FC + weight,
> >>                      random = pdBlocked(list(pdCompSymm(~FB-1),
> >>                                              pdCompSymm(~FC-1),
> >>                                              pdIdent(~1))),
> >>                      weights = varIdent(form = ~1|FA), data = Model)
> >>       Stat[i, j, k, ] <- anova(fit.lme)$F[-1]
> >>     }
> >>   }
> >> }
> >
> > Did you create the array Stat outside the loop?  If not you will be
> > doing a lot of copying of elements of that array.
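[Editor's note: preallocating the result array once, outside the loop, avoids that copying. A minimal sketch, where the number of F statistics kept per fit (nF = 4 here) is purely illustrative:]

```r
## Allocate the full 60 x 60 x 60 x nF result array once, before the loop.
## Assigning into a preallocated array fills a slice in place; growing the
## object inside the loop would copy it on every iteration.
nF <- 4                                    # number of F statistics kept (illustrative)
Stat <- array(NA_real_, dim = c(60, 60, 60, nF))

## Inside the loop, one slice is filled per fit, e.g.:
Stat[1, 1, 1, ] <- c(1.2, 3.4, 0.5, 2.2)   # stand-in for anova(fit.lme)$F[-1]
```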
> >
> >> This takes a little over 100 hours to finish on a Mac G5 with dual
> >> 2.7GHz processors and 4GB memory.
> >
> >> In the mixed-effects model
> >
> >> y = X*beta + Z*b + e
> >
> >> the n-by-p fixed-effects matrix X and the n-by-q random-effects
> >> matrix Z are always the same across all the iterations in my case;
> >> the only thing that differs is y (and, of course, the estimates of
> >> beta, b and e also differ).  Here n = 504 (large), and p and q are
> >> moderate.  I just
> >> read Dr. Douglas Bates's presentation at useR! 2007 (very
> >> informative by the way):
> >
> > Thank you.
> >
> >> http://user2007.org/program/presentations/bates.pdf
> >
> >> It seems many components of the extended system matrix (equation (2)
> >> on page 22) for the Cholesky decomposition remain the same across
> >> iterations, so the loop above repeats a lot of the same matrix
> >> computations.  How can I achieve better efficiency?  Someone
> >> suggested running lme/lmer with a two-dimensional response Y instead
> >> of a one-dimensional y.  My questions are:
> >
> >> (1) So far I have only seen people running lme/lmer with y as a
> >> one-dimensional array from a table.  If I combine all those y's
> >> (indices i, j, k) into a two-dimensional array Y, is there a way I
> >> can run lme/lmer on Y instead of y?  In other words, does lme/lmer
> >> accept a two-dimensional array Y?
> >
> > Not at present.
> >
> >> If so, do I have to save the huge array as a table in a text file
> >> and then read it into R before I run lme/lmer?
> >
> > No.  There are many ways of getting data into R other than creating a
> > text file and reading it.  See the manual "R Data Import/Export" and
> > also Martin Maechler's presentation at useR!2007.
> > http://user2007.org/program/presentations/maechler.pdf
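[Editor's note: one such alternative, sketched here with an invented toy matrix, is R's binary serialization, which is faster than a text table and preserves dimensions, column names and types exactly:]

```r
## Keep a response matrix in R's binary format rather than a text table;
## readRDS() restores it exactly, with no parsing or type-conversion step.
Y <- matrix(seq_len(20) / 10, nrow = 5,
            dimnames = list(NULL, paste0("y", 1:4)))  # toy stand-in for the real responses
f <- file.path(tempdir(), "Y.rds")
saveRDS(Y, f)        # write a binary copy
Y2 <- readRDS(f)     # read it back unchanged
```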
> >
> >> Also, if that is the case, how can I label the many columns
> >> associated with Y?
> >
> >> (2) A more serious concern is about memory. With the current looping
> >> approach it uses about 1GB. If I could possibly go with the matrix
> >> method described in (1), I'm worried that it might not be practically
> >> feasible with the current computers. Any thoughts?
> >
> > Well, first, you are discussing the computational methods used in
> > lmer, but you want to fit a model with different residual variances
> > for different groups.  At present you can't do that in lmer.
> >
> > If you look at the lmer function in the development version of the
> > lme4 package (currently at
> > https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
> > at http://r-forge.r-project.org/projects/lme4 for some value of
> > "soon") you will see that it follows the equations in my useR
> > presentation fairly closely.  The Xy array is n by (p + 1) with X in
> > the first p columns and y in the p + 1st column.  The object of class
> > "lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy), and
> > XytXy (crossprod(Xy)). After fitting the model to the first simulated
> > response, producing the object 'fm',  the only operations needed to
> > update the model are
> >
> >  fm@y <- newy
> >  Xy <- cbind(fm@X, fm@y)
> >  fm@ZtXy <- fm@Zt %*% Xy
> >  fm@XytXy <- crossprod(Xy)
> >  lme4:::mer_finalize(fm, verbose)
> >
> > where 'verbose' is a logical scalar indicating whether you want
> > verbose output during the optimization phase.  Once you get things
> > working on a small example you would probably want to turn that off.
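[Editor's note: a toy illustration of the quantities being refreshed in that update step. X, Zt and y are invented here; the point is that X and Zt stay fixed, so when the response changes only the products involving y need recomputing:]

```r
## Toy versions of the arrays in the update step above.  X (n x p) and
## Zt (q x n) stay fixed across simulated responses; when y changes, only
## Xy, ZtXy and XytXy must be recomputed before re-optimizing.
set.seed(1)
n <- 6; p <- 2; q <- 3
X  <- matrix(rnorm(n * p), n, p)
Zt <- matrix(rbinom(q * n, 1, 0.5), q, n)  # stand-in for the sparse Z'
y  <- rnorm(n)

Xy    <- cbind(X, y)       # n x (p + 1): X in the first p columns, y last
ZtXy  <- Zt %*% Xy         # q x (p + 1)
XytXy <- crossprod(Xy)     # (p + 1) x (p + 1), i.e. t(Xy) %*% Xy
```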
> >
> > Please note that this code applies to the development version of the
> > lme4 package.
>




More information about the R-sig-mixed-models mailing list