[R-sig-ME] Dealing with large datasets in lme/lmer

Mon Sep 10 21:59:38 CEST 2007

Dr. Bates,

Thank you very much for the quick response and suggestions.

I really want to try out the lmer development package, but could not  
figure out how to download it. I tried to run

wget https://svn.r-project.org/R-packages/branches/gappy-lmer/
(Or wget --no-check-certificate -r -nv -m -np -nH https://svn.r- 
project.org/R-packages/branches/gappy-lmer/ )

but it didn't work. So how can I obtain the package? Do I only need  
the files under

https://svn.r-project.org/R-packages/branches/gappy-lmer/R/

or do I need to compile/build the package with the source code somehow?

Thanks,
Gang

=========
Gang Chen, Ph. D.
National Institutes of Health, HHS

On Sep 5, 2007, at 3:11 PM, Douglas Bates wrote:

> On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:
>> Dear all,
>
>> I'm running mixed-effects analysis with large datasets in a loop like
>> this:
>
>> for (i in 1:60) {
>> for (j in 1:60) {
>> for (k in 1:60) {
>>     [...update y here in Model here...]
>>     fit.lme <- lme(y ~ FA*FB*FC+weight, random = pdBlocked(list
>> (pdCompSymm(~FB-1), pdCompSymm(~FC-1), pdIdent(~1))),  
>> weights=varIdent
>> (form=~1|FA), Model);
>>     Stat[i, j, k,] <- anova(fit.lme)$F[-1];
>> }
>> }
>> }
>
> Did you create the array Stat outside the loop?  If not you will be
> doing a lot of copying of elements of that array.
>
>> This takes a little over 100 hours to finish on a Mac G5 with duo
>> 2.7GHz processors and 4GB memory.
>
>> In the mixed-effects model
>
>> y = X*beta + Z*b + e
>
>> the fixed-effects nxp matrix X and random-effects matrix nxq Z are
>> always the same for all the iterations in my case, and the only thing
>> that differs is y (and the estimates of beta, b and e also differ of
>> course). In my case n = 504 (large), p and q are moderate.  I just
>> read Dr. Douglas Bates's presentation during uerR! 2007 (very
>> informative by the way):
>
> Thank you.
>
>> http://user2007.org/program/presentations/bates.pdf
>
>> It seems many components in the extended system matrix (equation (2)
>> on page 22) for the Cholesky decomposition remain the same during the
>> iterations. So there are a lot of repetitive computations on those
>> same matrix operations in the above loop. How can I achieve a better
>> efficiency? Someone suggested to me running lme/lmer with a two-
>> dimensional response Y instead of one-dimensional y. My questions  
>> are:
>
>> (1) So far I have only seen people running lme/lmer with y in a
>> format of one-dimensional array from a table. If I combine all those
>> y's (indices i, j, k) into an two-dimensional array Y, is there a way
>> I can run lme/lmer on Y instead of y? In other words, does lme/lmer
>> take a two-dimensional array Y?
>
> Not at present.
>
>> If so, do I have to save the huge
>> array in a table in text file and then read in R before I run lme/
>> lmer?
>
> No.  There are many ways of getting data into R other than creating a
> text file and reading it.  See the manual "R Data Import/Export" and
> also Martin Maechler's presentation at useR!2007.
> http://user2007.org/program/presentations/maechler.pdf
>
>> Also if that is the case, how can I label those many columns
>> somehow associated with Y?
>
>> (2) A more serious concern is about memory. With the current looping
>> approach it uses about 1GB. If I could possibly go with the matrix
>> method described in (1), I'm worried that it might not be practically
>> feasible with the current computers. Any thoughts?
>
> Well first you are discussing the computational methods used in lmer
> but you want to fit a model with different residual variances for
> different groups.  At present you can't do that in lmer.
>
> If you look at the lmer function in the development version of the
> lme4 package (currently at
> https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
> at http://r-forge.r-project.org/projects/lme4 for some value of
> "soon") you will see that it follows the equations in my useR
> presentation fairly closely.  The Xy array is n by (p + 1) with X in
> the first p columns and y in the p + 1st column.  The object of class
> "lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy), and
> XytXy (crossprod(Xy)). After fitting the model to the first simulated
> response, producing the object 'fm',  the only operations needed to
> update the model are
>
>  fm at y <- newy
>  Xy <- cbind(fm at X, fm at y)
>  fm at ZtXy <- fm at Zt %*% Xy
>  fm at XytXy <- crossprod(Xy)
>  lme4:::mer_finalize(fm, verbose)
>
> where 'verbose' is a logical scalar indicating if you want verbose
> output during the optimization phase.  Once you get things working on
> a small example you would probably want to turn that off.
>
> Please note that this code applies to the development version of the
> lme4 package.