[R-sig-ME] Dealing with large datasets in lme/lmer

Douglas Bates bates at stat.wisc.edu
Tue Sep 11 18:08:24 CEST 2007


On 9/11/07, Gang Chen <gangchen at mail.nih.gov> wrote:
> Again thanks a lot for the help, Dr. Bates!

> Now with the development version of lmer I got an error for the
> following model which works fine with the old lmer:

>  >fit.lmer <- lmer(y ~ FA*FB*FC+weight+(1|Subj), Model);
> Error in validObject(.Object) : invalid class "lmer" object: dims
> slot not named or incorrect length

> What went wrong?

I don't know.  I'd need the output from

sessionInfo()

to be able to even start to guess.

> Regarding compiling, it works fine on a Mac OS X 10.4 machine with dual
> 2.7GHz processors, but failed on a Mac OS X 10.4.10 machine with a 2 GHz
> Intel Core Duo processor with the following error (does it have
> something to do with the Intel processor?):

You need the most recent version of the Matrix package installed
before you can compile the development version of lme4.
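The compiler log that follows is consistent with this: the Matrix headers found at build time define the macro N_AS_CHM_DN with two arguments, while lmer.c calls it with three. A minimal sketch for checking what is installed before rebuilding is below. The thread does not state which Matrix version the development lme4 needs, so the check only reports the installed version; the update commands are left as comments because they need network access.

```r
# Sketch: report the installed Matrix version before building lme4 from source.
# The minimum Matrix version required is not given in the thread, so compare
# the printed version against the latest release yourself.
matrix_installed <- requireNamespace("Matrix", quietly = TRUE)
if (matrix_installed) {
  message("Matrix ", format(packageVersion("Matrix")), " is installed")
} else {
  message("Matrix is not installed")
}
# To update Matrix (requires network access), one could run:
#   install.packages("Matrix")
# and then rebuild the package from a shell with:
#   R CMD INSTALL ./lme4
```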

>  >  R CMD INSTALL ./lme4
> * Installing to library '/Library/Frameworks/R.framework/Resources/library'
> * Installing *source* package 'lme4' ...
> ** libs
> ** arch - i386
> gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386  -msse3 -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include" -D__NO_MATH_INLINES  -fPIC  -g -O2 -std=gnu99 -march=nocona -c init.c -o init.o
> gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386  -msse3 -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include" -D__NO_MATH_INLINES  -fPIC  -g -O2 -std=gnu99 -march=nocona -c lmer.c -o lmer.o
> lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c: In function 'Mer_eta':
> lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
> lmer.c:356: error: (Each undeclared identifier is reported only once
> lmer.c:356: error: for each function it appears in.)
> lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c: In function 'nglmer_condMode':
> lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this function)
> lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c: In function 'lmer_MCMC_betab':
> lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this function)
> make: *** [lmer.o] Error 1
> chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4/libs/i386/*: No such file or directory
> ** arch - ppc
> gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/ppc  -I/usr/local/include -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include"   -fPIC  -g -O2 -c init.c -o init.o
> gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/ppc  -I/usr/local/include -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include"   -fPIC  -g -O2 -c lmer.c -o lmer.o
> lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c: In function 'Mer_eta':
> lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
> lmer.c:356: error: (Each undeclared identifier is reported only once
> lmer.c:356: error: for each function it appears in.)
> lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c: In function 'nglmer_condMode':
> lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this function)
> lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
> lmer.c: In function 'lmer_MCMC_betab':
> lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this function)
> make: *** [lmer.o] Error 1
> chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4/libs/ppc/*: No such file or directory
> ERROR: compilation failed for package 'lme4'
> ** Removing '/Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4'
> ** Restoring previous '/Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4'
>
>
> Thanks,
> Gang
>
> =========
> Gang Chen, Ph. D.
> National Institutes of Health, DHHS
>
> On Sep 10, 2007, at 4:26 PM, Douglas Bates wrote:
>
> > The best way to obtain the package sources is with a Subversion
> > client.  On Linux it is called svn.  The call to check out a copy of
> > the package sources is
> >
> > svn co https://svn.r-project.org/R-packages/branches/gappy-lmer ./lme4
> >
> > You need the whole directory to build the package.  Under Linux or Mac
> > OS X I use
> >
> > R CMD INSTALL ./lme4
> >
> > For Windows I use Uwe's win-builder at win-builder.R-project.org
> >
> > On 9/10/07, Gang Chen <gangchen at mail.nih.gov> wrote:
> >> Dr. Bates,
> >>
> >> Thank you very much for the quick response and suggestions.
> >>
> >> I really want to try out the lmer development package, but could not
> >> figure out how to download it. I tried to run
> >>
> >> wget https://svn.r-project.org/R-packages/branches/gappy-lmer/
> >> (or wget --no-check-certificate -r -nv -m -np -nH https://svn.r-project.org/R-packages/branches/gappy-lmer/ )
> >>
> >> but it didn't work. So how can I obtain the package? Do I only need
> >> the files under
> >>
> >> https://svn.r-project.org/R-packages/branches/gappy-lmer/R/
> >>
> >> or do I need to compile/build the package with the source code
> >> somehow?
> >>
> >> Thanks,
> >> Gang
> >>
> >> =========
> >> Gang Chen, Ph. D.
> >> National Institutes of Health, HHS
> >>
> >>
> >> On Sep 5, 2007, at 3:11 PM, Douglas Bates wrote:
> >>
> >>> On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:
> >>>> Dear all,
> >>>
> >>>> I'm running mixed-effects analyses with large datasets in a loop
> >>>> like this:
> >>>
> >>>> for (i in 1:60) {
> >>>> for (j in 1:60) {
> >>>> for (k in 1:60) {
> >>>>     # [...update y in Model here...]
> >>>>     fit.lme <- lme(y ~ FA*FB*FC+weight,
> >>>>                    random = pdBlocked(list(pdCompSymm(~FB-1),
> >>>>                                            pdCompSymm(~FC-1),
> >>>>                                            pdIdent(~1))),
> >>>>                    weights = varIdent(form = ~1|FA), Model);
> >>>>     Stat[i, j, k,] <- anova(fit.lme)$F[-1];
> >>>> }
> >>>> }
> >>>> }
> >>>
> >>> Did you create the array Stat outside the loop?  If not you will be
> >>> doing a lot of copying of elements of that array.
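A minimal sketch of the allocate-once pattern suggested above: Stat is created with its final dimensions before the loops, and each iteration writes into its own slot. To keep the sketch fast, the 60 x 60 x 60 grid from the question is shrunk to 6 x 6 x 6, and a dummy vector stands in for anova(fit.lme)$F[-1]. The value n_terms = 8 is an assumption: it counts the terms of y ~ FA*FB*FC + weight (FA, FB, FC, weight and four interactions) once the intercept row is dropped.

```r
# Sketch: allocate Stat once, before the loops, and fill it in place.
# Grid shrunk from 60 x 60 x 60 to 6 x 6 x 6 so the example runs instantly.
n_i <- 6; n_j <- 6; n_k <- 6
n_terms <- 8  # assumption: F statistics per fit after dropping the intercept

Stat <- array(NA_real_, dim = c(n_i, n_j, n_k, n_terms))

for (i in seq_len(n_i)) {
  for (j in seq_len(n_j)) {
    for (k in seq_len(n_k)) {
      # Placeholder for anova(fit.lme)$F[-1]; a dummy vector of the right
      # length stands in for the model fit.
      f_stats <- rep(i + j + k, n_terms)
      Stat[i, j, k, ] <- f_stats
    }
  }
}
```

Because the array already has its final size, each assignment only overwrites existing cells instead of forcing R to build a larger array.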
> >>>
> >>>> This takes a little over 100 hours to finish on a Mac G5 with dual
> >>>> 2.7GHz processors and 4GB of memory.
> >>>
> >>>> In the mixed-effects model
> >>>
> >>>> y = X*beta + Z*b + e
> >>>
> >>>> the fixed-effects n x p matrix X and the n x q random-effects matrix Z
> >>>> are the same for all the iterations in my case; the only thing that
> >>>> differs is y (and, of course, the estimates of beta, b and e). In my
> >>>> case n = 504 (large), and p and q are moderate. I just read Dr.
> >>>> Douglas Bates's presentation at useR! 2007 (very informative, by the
> >>>> way):
> >>>
> >>> Thank you.
> >>>
> >>>> http://user2007.org/program/presentations/bates.pdf
> >>>
> >>>> It seems many components in the extended system matrix (equation (2)
> >>>> on page 22) for the Cholesky decomposition remain the same during the
> >>>> iterations, so the loop above repeats many identical matrix
> >>>> operations. How can I make it more efficient? Someone suggested
> >>>> running lme/lmer with a two-dimensional response Y instead of a
> >>>> one-dimensional y. My questions are:
> >>>
> >>>> (1) So far I have only seen people run lme/lmer with y as a
> >>>> one-dimensional array taken from a table. If I combine all those
> >>>> y's (indices i, j, k) into a two-dimensional array Y, is there a way
> >>>> to run lme/lmer on Y instead of y? In other words, does lme/lmer
> >>>> accept a two-dimensional response Y?
> >>>
> >>> Not at present.
> >>>
> >>>> If so, do I have to save the huge array as a table in a text file
> >>>> and then read it into R before I run lme/lmer?
> >>>
> >>> No.  There are many ways of getting data into R other than creating a
> >>> text file and reading it.  See the manual "R Data Import/Export" and
> >>> also Martin Maechler's presentation at useR! 2007.
> >>> http://user2007.org/program/presentations/maechler.pdf
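As one illustration of avoiding a text file (our own choice; the reply above names only the manual and the presentation), R's binary .rds format stores a numeric matrix together with its dimensions and column labels. The sketch uses saveRDS()/readRDS() as available in current R; the 504 x 10 size and the column labels are hypothetical stand-ins for the real data.

```r
# Sketch: keep a numeric response matrix in R's binary .rds format instead of
# writing it out as text. Size and column labels are hypothetical.
Y <- matrix(rnorm(504 * 10), nrow = 504,
            dimnames = list(NULL, paste0("y", 1:10)))
path <- tempfile(fileext = ".rds")
saveRDS(Y, path)      # binary; keeps dim and dimnames
Y2 <- readRDS(path)
print(identical(Y, Y2))
unlink(path)          # remove the temporary file
```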
> >>>
> >>>> Also, if that is the case, how can I label the many columns
> >>>> of Y?
> >>>
> >>>> (2) A more serious concern is memory. With the current looping
> >>>> approach it uses about 1GB. If I could go with the matrix method
> >>>> described in (1), I'm worried that it might not be practically
> >>>> feasible on current computers. Any thoughts?
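To put concern (2) in numbers: with n = 504 observations and one response column per (i, j, k) cell of the 60 x 60 x 60 loop, the combined matrix Y alone would hold 504 x 216000 doubles. The arithmetic below counts only Y stored as 8-byte doubles; any working storage that lme/lmer would need on top of that is not included.

```r
# Back-of-the-envelope size of the combined response matrix Y:
# n rows, one column per (i, j, k) cell, 8 bytes per double.
n <- 504
n_cols <- 60^3
bytes <- n * n_cols * 8
gib <- bytes / 2^30
cat(sprintf("Y would be %d x %d, about %.2f GiB\n", n, n_cols, gib))
```

This prints that Y would be 504 x 216000, about 0.81 GiB (870,912,000 bytes).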
> >>>
> >>> Well first you are discussing the computational methods used in lmer
> >>> but you want to fit a model with different residual variances for
> >>> different groups.  At present you can't do that in lmer.
> >>>
> >>> If you look at the lmer function in the development version of the
> >>> lme4 package (currently at
> >>> https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
> >>> at http://r-forge.r-project.org/projects/lme4 for some value of
> >>> "soon") you will see that it follows the equations in my useR
> >>> presentation fairly closely.  The Xy array is n by (p + 1) with X in
> >>> the first p columns and y in the (p + 1)st column.  The object of
> >>> class "lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy),
> >>> and XytXy (crossprod(Xy)).  After fitting the model to the first
> >>> simulated response, producing the object 'fm', the only operations
> >>> needed to update the model are
> >>>
> >>>  fm@y <- newy
> >>>  Xy <- cbind(fm@X, fm@y)
> >>>  fm@ZtXy <- fm@Zt %*% Xy
> >>>  fm@XytXy <- crossprod(Xy)
> >>>  lme4:::mer_finalize(fm, verbose)
> >>>
> >>> where 'verbose' is a logical scalar indicating whether you want
> >>> verbose output during the optimization phase.  Once you get things
> >>> working on a small example you would probably want to turn that off.
> >>>
> >>> Please note that this code applies to the development version of the
> >>> lme4 package.
> >>
>
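The update in Douglas Bates's Sep 5 reply recomputes Zt %*% Xy and crossprod(Xy) in full for every new y. Since X and Z never change in this problem, a possible refinement (not proposed in the thread) is to compute the y-free blocks ZtX and XtX once and refresh only the last column and row for each response. The base-R sketch below checks that idea against a full recomputation. It uses dense, randomly generated X and Zt of hypothetical sizes (only n = 504 comes from the thread), whereas lme4 works with a sparse Zt from the Matrix package. It does not call lme4 or mer_finalize, so whether the development lmer accepts slots filled this way is untested.

```r
# Sketch (base R, dense matrices): reuse the y-free blocks of Zt %*% Xy and
# crossprod(Xy), refreshing only the parts that depend on y.
set.seed(1)
n <- 504; p <- 4; q <- 10                      # p and q are hypothetical
X  <- cbind(1, matrix(rnorm(n * (p - 1)), n))  # n x p fixed-effects matrix
Zt <- t(matrix(rbinom(n * q, 1, 0.3), n))      # q x n stand-in for Z-transpose

# Fixed pieces, computed once outside the loop over responses
ZtX <- Zt %*% X
XtX <- crossprod(X)

# Hypothetical helper: rebuild both products for a new response y
update_products <- function(y) {
  Xty <- crossprod(X, y)                       # p x 1
  list(
    ZtXy  = cbind(ZtX, Zt %*% y),              # q x (p + 1)
    XytXy = rbind(cbind(XtX, Xty),
                  cbind(t(Xty), sum(y^2)))     # (p + 1) x (p + 1)
  )
}

y_new <- rnorm(n)
fast <- update_products(y_new)

# Reference: full recomputation, as in the quoted update steps
Xy <- cbind(X, y_new)
slow <- list(ZtXy = Zt %*% Xy, XytXy = crossprod(Xy))

# Both comparisons ignore dimnames, which differ between the two routes
print(all.equal(fast$ZtXy, slow$ZtXy, check.attributes = FALSE))
print(all.equal(fast$XytXy, slow$XytXy, check.attributes = FALSE))
```

The saving per response is the cost of forming Zt %*% X and crossprod(X); whether that matters next to the optimization in mer_finalize would have to be measured.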



