[R-sig-ME] Dealing with large datasets in lme/lmer
Gang Chen
gangchen at mail.nih.gov
Mon Sep 10 21:59:38 CEST 2007
Dr. Bates,
Thank you very much for the quick response and suggestions.
I really want to try out the lmer development package, but could not
figure out how to download it. I tried to run
wget https://svn.r-project.org/R-packages/branches/gappy-lmer/
(Or wget --no-check-certificate -r -nv -m -np -nH https://svn.r-
project.org/R-packages/branches/gappy-lmer/ )
but it didn't work. So how can I obtain the package? Do I only need
the files under
https://svn.r-project.org/R-packages/branches/gappy-lmer/R/
or do I need to compile/build the package with the source code somehow?
Thanks,
Gang
=========
Gang Chen, Ph. D.
National Institutes of Health, HHS
On Sep 5, 2007, at 3:11 PM, Douglas Bates wrote:
> On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:
>> Dear all,
>
>> I'm running mixed-effects analysis with large datasets in a loop like
>> this:
>
>> for (i in 1:60) {
>> for (j in 1:60) {
>> for (k in 1:60) {
>> [...update y here in Model here...]
>> fit.lme <- lme(y ~ FA*FB*FC+weight, random = pdBlocked(list
>> (pdCompSymm(~FB-1), pdCompSymm(~FC-1), pdIdent(~1))),
>> weights=varIdent
>> (form=~1|FA), Model);
>> Stat[i, j, k,] <- anova(fit.lme)$F[-1];
>> }
>> }
>> }
>
> Did you create the array Stat outside the loop? If not you will be
> doing a lot of copying of elements of that array.
>
>> This takes a little over 100 hours to finish on a Mac G5 with duo
>> 2.7GHz processors and 4GB memory.
>
>> In the mixed-effects model
>
>> y = X*beta + Z*b + e
>
>> the fixed-effects nxp matrix X and random-effects matrix nxq Z are
>> always the same for all the iterations in my case, and the only thing
>> that differs is y (and the estimates of beta, b and e also differ of
>> course). In my case n = 504 (large), p and q are moderate. I just
>> read Dr. Douglas Bates's presentation during uerR! 2007 (very
>> informative by the way):
>
> Thank you.
>
>> http://user2007.org/program/presentations/bates.pdf
>
>> It seems many components in the extended system matrix (equation (2)
>> on page 22) for the Cholesky decomposition remain the same during the
>> iterations. So there are a lot of repetitive computations on those
>> same matrix operations in the above loop. How can I achieve a better
>> efficiency? Someone suggested to me running lme/lmer with a two-
>> dimensional response Y instead of one-dimensional y. My questions
>> are:
>
>> (1) So far I have only seen people running lme/lmer with y in a
>> format of one-dimensional array from a table. If I combine all those
>> y's (indices i, j, k) into an two-dimensional array Y, is there a way
>> I can run lme/lmer on Y instead of y? In other words, does lme/lmer
>> take a two-dimensional array Y?
>
> Not at present.
>
>> If so, do I have to save the huge
>> array in a table in text file and then read in R before I run lme/
>> lmer?
>
> No. There are many ways of getting data into R other than creating a
> text file and reading it. See the manual "R Data Import/Export" and
> also Martin Maechler's presentation at useR!2007.
> http://user2007.org/program/presentations/maechler.pdf
>
>> Also if that is the case, how can I label those many columns
>> somehow associated with Y?
>
>> (2) A more serious concern is about memory. With the current looping
>> approach it uses about 1GB. If I could possibly go with the matrix
>> method described in (1), I'm worried that it might not be practically
>> feasible with the current computers. Any thoughts?
>
> Well first you are discussing the computational methods used in lmer
> but you want to fit a model with different residual variances for
> different groups. At present you can't do that in lmer.
>
> If you look at the lmer function in the development version of the
> lme4 package (currently at
> https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
> at http://r-forge.r-project.org/projects/lme4 for some value of
> "soon") you will see that it follows the equations in my useR
> presentation fairly closely. The Xy array is n by (p + 1) with X in
> the first p columns and y in the p + 1st column. The object of class
> "lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy), and
> XytXy (crossprod(Xy)). After fitting the model to the first simulated
> response, producing the object 'fm', the only operations needed to
> update the model are
>
> fm at y <- newy
> Xy <- cbind(fm at X, fm at y)
> fm at ZtXy <- fm at Zt %*% Xy
> fm at XytXy <- crossprod(Xy)
> lme4:::mer_finalize(fm, verbose)
>
> where 'verbose' is a logical scalar indicating if you want verbose
> output during the optimization phase. Once you get things working on
> a small example you would probably want to turn that off.
>
> Please note that this code applies to the development version of the
> lme4 package.
More information about the R-sig-mixed-models
mailing list