[R-sig-ME] Dealing with large datasets in lme/lmer
Gang Chen
gangchen at mail.nih.gov
Tue Sep 11 17:30:43 CEST 2007
Again thanks a lot for the help, Dr. Bates!
Now with the development version of lmer I got an error for the
following model which works fine with the old lmer:
>fit.lmer <- lmer(y ~ FA*FB*FC+weight+(1|Subj), Model);
Error in validObject(.Object) : invalid class "lmer" object: dims
slot not named or incorrect length
What went wrong?
Regarding compiling, it works fine on a Mac running OS X 10.4 with duo
2.7GHz processors, but failed on a Mac running OS X 10.4.10 with a 2 GHz
Intel Core Duo processor, with the following error (does it have
something to do with the Intel processor?):
> R CMD INSTALL ./lme4
* Installing to library '/Library/Frameworks/R.framework/Resources/
library'
* Installing *source* package 'lme4' ...
** libs
** arch - i386
gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp-
precomp -I/Library/Frameworks/R.framework/Resources/include -I/
Library/Frameworks/R.framework/Resources/include/i386 -msse3 -I"/
Library/Frameworks/R.framework/Resources/library/Matrix/include" -
D__NO_MATH_INLINES -fPIC -g -O2 -std=gnu99 -march=nocona -c init.c -
o init.o
gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp-
precomp -I/Library/Frameworks/R.framework/Resources/include -I/
Library/Frameworks/R.framework/Resources/include/i386 -msse3 -I"/
Library/Frameworks/R.framework/Resources/library/Matrix/include" -
D__NO_MATH_INLINES -fPIC -g -O2 -std=gnu99 -march=nocona -c lmer.c -
o lmer.o
lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c: In function 'Mer_eta':
lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:356: error: (Each undeclared identifier is reported only once
lmer.c:356: error: for each function it appears in.)
lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c: In function 'nglmer_condMode':
lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this
function)
lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c: In function 'lmer_MCMC_betab':
lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this
function)
make: *** [lmer.o] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/
lme4/libs/i386/*: No such file or directory
** arch - ppc
gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -
std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/
include -I/Library/Frameworks/R.framework/Resources/include/ppc -I/
usr/local/include -I"/Library/Frameworks/R.framework/Resources/
library/Matrix/include" -fPIC -g -O2 -c init.c -o init.o
gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -
std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/
include -I/Library/Frameworks/R.framework/Resources/include/ppc -I/
usr/local/include -I"/Library/Frameworks/R.framework/Resources/
library/Matrix/include" -fPIC -g -O2 -c lmer.c -o lmer.o
lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c: In function 'Mer_eta':
lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:356: error: (Each undeclared identifier is reported only once
lmer.c:356: error: for each function it appears in.)
lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c: In function 'nglmer_condMode':
lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this
function)
lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but
takes just 2
lmer.c: In function 'lmer_MCMC_betab':
lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this
function)
make: *** [lmer.o] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/
lme4/libs/ppc/*: No such file or directory
ERROR: compilation failed for package 'lme4'
** Removing '/Library/Frameworks/R.framework/Versions/2.5/Resources/
library/lme4'
** Restoring previous '/Library/Frameworks/R.framework/Versions/2.5/
Resources/library/lme4'
Thanks,
Gang
=========
Gang Chen, Ph. D.
National Institutes of Health, DHHS
On Sep 10, 2007, at 4:26 PM, Douglas Bates wrote:
> The best way to obtain the package sources is with a subversion
> client. On Linux it is called svn. The call to check out a copy of
> the package sources is
>
> svn co https://svn.r-project.org/R-packages/branches/gappy-lmer ./lme4
>
> You need the whole directory to build the package. Under Linux or Mac
> OS X I use
>
> R CMD INSTALL ./lme4
>
> For Windows I use Uwe's win-builder at win-builder.R-project.org
>
> On 9/10/07, Gang Chen <gangchen at mail.nih.gov> wrote:
>> Dr. Bates,
>>
>> Thank you very much for the quick response and suggestions.
>>
>> I really want to try out the lmer development package, but could not
>> figure out how to download it. I tried to run
>>
>> wget https://svn.r-project.org/R-packages/branches/gappy-lmer/
>> (Or wget --no-check-certificate -r -nv -m -np -nH https://svn.r-
>> project.org/R-packages/branches/gappy-lmer/ )
>>
>> but it didn't work. So how can I obtain the package? Do I only need
>> the files under
>>
>> https://svn.r-project.org/R-packages/branches/gappy-lmer/R/
>>
>> or do I need to compile/build the package with the source code
>> somehow?
>>
>> Thanks,
>> Gang
>>
>> =========
>> Gang Chen, Ph. D.
>> National Institutes of Health, HHS
>>
>>
>> On Sep 5, 2007, at 3:11 PM, Douglas Bates wrote:
>>
>>> On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:
>>>> Dear all,
>>>
>>>> I'm running mixed-effects analysis with large datasets in a loop
>>>> like
>>>> this:
>>>
>>>> for (i in 1:60) {
>>>> for (j in 1:60) {
>>>> for (k in 1:60) {
>>>> [...update y in Model here...]
>>>> fit.lme <- lme(y ~ FA*FB*FC+weight, random = pdBlocked(list
>>>> (pdCompSymm(~FB-1), pdCompSymm(~FC-1), pdIdent(~1))),
>>>> weights=varIdent
>>>> (form=~1|FA), Model);
>>>> Stat[i, j, k,] <- anova(fit.lme)$F[-1];
>>>> }
>>>> }
>>>> }
>>>
>>> Did you create the array Stat outside the loop? If not you will be
>>> doing a lot of copying of elements of that array.
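
A minimal base R sketch of the preallocation point. The grid sizes, n_stat, and stand_in_stats() are hypothetical placeholders (the last one stands in for anova(fit.lme)$F[-1]); only the allocate-once pattern matters here.

```r
# Hypothetical sizes: a 3 x 3 x 3 grid instead of 60 x 60 x 60,
# with n_stat statistics stored per fit.
ni <- 3; nj <- 3; nk <- 3; n_stat <- 8

# Hypothetical stand-in for anova(fit.lme)$F[-1]; returns n_stat numbers.
stand_in_stats <- function(i, j, k) rep(i + j + k, n_stat)

# Allocate the full result array once, before entering the loop.
Stat <- array(NA_real_, dim = c(ni, nj, nk, n_stat))

for (i in seq_len(ni)) {
  for (j in seq_len(nj)) {
    for (k in seq_len(nk)) {
      # Fill one slice in place; the array never has to grow.
      Stat[i, j, k, ] <- stand_in_stats(i, j, k)
    }
  }
}
```

If Stat were instead created or extended inside the loop, R would have to copy it repeatedly as it grows.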
>>>
>>>> This takes a little over 100 hours to finish on a Mac G5 with duo
>>>> 2.7GHz processors and 4GB memory.
>>>
>>>> In the mixed-effects model
>>>
>>>> y = X*beta + Z*b + e
>>>
>>>> the n x p fixed-effects matrix X and the n x q random-effects matrix Z are
>>>> always the same for all the iterations in my case, and the only
>>>> thing
>>>> that differs is y (and the estimates of beta, b and e also
>>>> differ of
>>>> course). In my case n = 504 (large), p and q are moderate. I just
>>>> read Dr. Douglas Bates's presentation at useR! 2007 (very
>>>> informative by the way):
>>>
>>> Thank you.
>>>
>>>> http://user2007.org/program/presentations/bates.pdf
>>>
>>>> It seems many components in the extended system matrix (equation
>>>> (2)
>>>> on page 22) for the Cholesky decomposition remain the same
>>>> during the
>>>> iterations. So there are a lot of repetitive computations on those
>>>> same matrix operations in the above loop. How can I achieve a
>>>> better
>>>> efficiency? Someone suggested to me running lme/lmer with a two-
>>>> dimensional response Y instead of one-dimensional y. My questions
>>>> are:
>>>
>>>> (1) So far I have only seen people running lme/lmer with y in a
>>>> format of one-dimensional array from a table. If I combine all
>>>> those
>>>> y's (indices i, j, k) into a two-dimensional array Y, is there
>>>> a way
>>>> I can run lme/lmer on Y instead of y? In other words, does lme/lmer
>>>> take a two-dimensional array Y?
>>>
>>> Not at present.
>>>
>>>> If so, do I have to save the huge
>>>> array in a table in text file and then read in R before I run lme/
>>>> lmer?
>>>
>>> No. There are many ways of getting data into R other than
>>> creating a
>>> text file and reading it. See the manual "R Data Import/Export" and
>>> also Martin Maechler's presentation at useR!2007.
>>> http://user2007.org/program/presentations/maechler.pdf
>>>
>>>> Also if that is the case, how can I label those many columns
>>>> somehow associated with Y?
>>>
>>>> (2) A more serious concern is about memory. With the current
>>>> looping
>>>> approach it uses about 1GB. If I could possibly go with the matrix
>>>> method described in (1), I'm worried that it might not be
>>>> practically
>>>> feasible with the current computers. Any thoughts?
>>>
>>> Well first you are discussing the computational methods used in lmer
>>> but you want to fit a model with different residual variances for
>>> different groups. At present you can't do that in lmer.
>>>
>>> If you look at the lmer function in the development version of the
>>> lme4 package (currently at
>>> https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
>>> at http://r-forge.r-project.org/projects/lme4 for some value of
>>> "soon") you will see that it follows the equations in my useR
>>> presentation fairly closely. The Xy array is n by (p + 1) with X in
>>> the first p columns and y in the (p + 1)st column. The object of
>>> class
>>> "lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy), and
>>> XytXy (crossprod(Xy)). After fitting the model to the first
>>> simulated
>>> response, producing the object 'fm', the only operations needed to
>>> update the model are
>>>
>>> fm@y <- newy
>>> Xy <- cbind(fm@X, fm@y)
>>> fm@ZtXy <- fm@Zt %*% Xy
>>> fm@XytXy <- crossprod(Xy)
>>> lme4:::mer_finalize(fm, verbose)
>>>
>>> where 'verbose' is a logical scalar indicating if you want verbose
>>> output during the optimization phase. Once you get things
>>> working on
>>> a small example you would probably want to turn that off.
>>>
>>> Please note that this code applies to the development version of the
>>> lme4 package.
>>
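
The update steps quoted above recompute Zt %*% Xy and crossprod(Xy) in full for each new response. When X and Z never change, only the pieces involving y differ between iterations. The following base R sketch illustrates that under stated assumptions: the dimensions, the indicator matrix Z, and update_products() are hypothetical, and nothing here touches lme4 or the slots of an "lmer" object.

```r
set.seed(1)
n <- 12; p <- 3; q <- 4                        # hypothetical small sizes

# Fixed-effects matrix X (n x p) and a simple grouping indicator Z (n x q).
X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))
subj_id <- rep(seq_len(q), each = n / q)
Z <- outer(subj_id, seq_len(q), "==") + 0      # 0/1 indicator columns
Zt <- t(Z)

# Pieces that do not depend on y: compute them once.
ZtX <- Zt %*% X
XtX <- crossprod(X)

# Hypothetical helper: build ZtXy and XytXy from the cached pieces,
# computing only the terms that involve the new response y.
update_products <- function(y) {
  Zty <- Zt %*% y
  Xty <- crossprod(X, y)
  list(
    ZtXy  = cbind(ZtX, Zty),
    XytXy = rbind(cbind(XtX, Xty),
                  cbind(t(Xty), crossprod(y)))
  )
}

y <- rnorm(n)
upd <- update_products(y)

# Reference: the full recomputation, as in the quoted update steps.
Xy <- cbind(X, y)
full_ZtXy <- Zt %*% Xy
full_XytXy <- crossprod(Xy)
```

Comparing upd$ZtXy with full_ZtXy, and upd$XytXy with full_XytXy, using all.equal() is a quick way to check that the cached version matches the full recomputation before using it in a long loop.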