[R-sig-ME] Dealing with large datasets in lme/lmer

Tue Sep 11 17:30:43 CEST 2007

Again, thanks a lot for the help, Dr. Bates!

Now, with the development version of lmer, I get an error for the
following model, which works fine with the old lmer:

 >fit.lmer <- lmer(y ~ FA*FB*FC+weight+(1|Subj), Model);
Error in validObject(.Object) : invalid class "lmer" object: dims slot not named or incorrect length

What went wrong?

Regarding compiling: it works fine on a Mac OS X 10.4 machine with dual
2.7 GHz processors, but fails on a Mac OS X 10.4.10 machine with a 2 GHz
Intel Core Duo processor with the following error (does it have
something to do with the Intel processor?):

 >  R CMD INSTALL ./lme4
* Installing to library '/Library/Frameworks/R.framework/Resources/library'
* Installing *source* package 'lme4' ...
** libs
** arch - i386
gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386  -msse3 -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include" -D__NO_MATH_INLINES  -fPIC  -g -O2 -std=gnu99 -march=nocona -c init.c -o init.o
gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386  -msse3 -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include" -D__NO_MATH_INLINES  -fPIC  -g -O2 -std=gnu99 -march=nocona -c lmer.c -o lmer.o
lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c: In function 'Mer_eta':
lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:356: error: (Each undeclared identifier is reported only once
lmer.c:356: error: for each function it appears in.)
lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c: In function 'nglmer_condMode':
lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c: In function 'lmer_MCMC_betab':
lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this function)
make: *** [lmer.o] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4/libs/i386/*: No such file or directory
** arch - ppc
gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/ppc  -I/usr/local/include -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include"   -fPIC  -g -O2 -c init.c -o init.o
gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/ppc  -I/usr/local/include -I"/Library/Frameworks/R.framework/Resources/library/Matrix/include"   -fPIC  -g -O2 -c lmer.c -o lmer.o
lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c: In function 'Mer_eta':
lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:356: error: (Each undeclared identifier is reported only once
lmer.c:356: error: for each function it appears in.)
lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c: In function 'nglmer_condMode':
lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but takes just 2
lmer.c: In function 'lmer_MCMC_betab':
lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this function)
make: *** [lmer.o] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4/libs/ppc/*: No such file or directory
ERROR: compilation failed for package 'lme4'
** Removing '/Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4'
** Restoring previous '/Library/Frameworks/R.framework/Versions/2.5/Resources/library/lme4'

Thanks,
Gang

=========
Gang Chen, Ph. D.
National Institutes of Health, DHHS

On Sep 10, 2007, at 4:26 PM, Douglas Bates wrote:

> The best way to obtain the package sources is with a Subversion
> client. On Linux it is called svn. The call to check out a copy of
> the package sources is
>
> svn co https://svn.r-project.org/R-packages/branches/gappy-lmer ./lme4
>
> You need the whole directory to build the package.  Under Linux or Mac
> OS X I use
>
> R CMD INSTALL ./lme4
>
> For Windows I use Uwe's win-builder at win-builder.R-project.org
>
> On 9/10/07, Gang Chen <gangchen at mail.nih.gov> wrote:
>> Dr. Bates,
>>
>> Thank you very much for the quick response and suggestions.
>>
>> I really want to try out the lmer development package, but could not
>> figure out how to download it. I tried to run
>>
>> wget https://svn.r-project.org/R-packages/branches/gappy-lmer/
>> (or wget --no-check-certificate -r -nv -m -np -nH https://svn.r-project.org/R-packages/branches/gappy-lmer/ )
>>
>> but it didn't work. So how can I obtain the package? Do I only need
>> the files under
>>
>> https://svn.r-project.org/R-packages/branches/gappy-lmer/R/
>>
>> or do I need to compile/build the package from the source code
>> somehow?
>>
>> Thanks,
>> Gang
>>
>> =========
>> Gang Chen, Ph. D.
>> National Institutes of Health, HHS
>>
>>
>> On Sep 5, 2007, at 3:11 PM, Douglas Bates wrote:
>>
>>> On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:
>>>> Dear all,
>>>
>>>> I'm running mixed-effects analyses on large datasets in a loop
>>>> like this:
>>>
>>>> for (i in 1:60) {
>>>>   for (j in 1:60) {
>>>>     for (k in 1:60) {
>>>>       [...update y in Model here...]
>>>>       fit.lme <- lme(y ~ FA*FB*FC+weight,
>>>>                      random = pdBlocked(list(pdCompSymm(~FB-1),
>>>>                                              pdCompSymm(~FC-1),
>>>>                                              pdIdent(~1))),
>>>>                      weights = varIdent(form = ~1|FA),
>>>>                      data = Model)
>>>>       Stat[i, j, k, ] <- anova(fit.lme)$F[-1]
>>>>     }
>>>>   }
>>>> }
>>>
>>> Did you create the array Stat outside the loop?  If not you will be
>>> doing a lot of copying of elements of that array.
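Bates's point about creating Stat before the loop can be sketched in base R as follows. This is a minimal illustration, not the poster's code: the grid is shrunk to 4 x 4 x 4, and fake_F() is a hypothetical stand-in for anova(fit.lme)$F[-1], assumed to return one F statistic for each of the 8 terms in FA*FB*FC + weight.

```r
# Hypothetical, shrunken version of the 60 x 60 x 60 loop.
ni <- 4; nj <- 4; nk <- 4
n_terms <- 8  # FA*FB*FC expands to 7 terms, plus weight

# Stand-in for anova(fit.lme)$F[-1]; returns one value per term.
fake_F <- function(i, j, k) rep(i + j + k, n_terms)

# Allocate the full result array once, before any fitting.
Stat <- array(NA_real_, dim = c(ni, nj, nk, n_terms))

for (i in seq_len(ni)) {
  for (j in seq_len(nj)) {
    for (k in seq_len(nk)) {
      # Each pass fills one slice of the existing array; nothing is resized.
      Stat[i, j, k, ] <- fake_F(i, j, k)
    }
  }
}
```

The alternative, growing Stat with c() or abind-style binding inside the loop, forces R to allocate a larger object and copy the old contents on every iteration.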
>>>
>>>> This takes a little over 100 hours to finish on a Mac G5 with dual
>>>> 2.7 GHz processors and 4 GB of memory.
>>>
>>>> In the mixed-effects model
>>>
>>>> y = X*beta + Z*b + e
>>>
>>>> the n x p fixed-effects matrix X and the n x q random-effects
>>>> matrix Z are always the same across all iterations in my case; the
>>>> only thing that differs is y (and, of course, the estimates of beta,
>>>> b and e). In my case n = 504 (large), while p and q are moderate.
>>>> I just read Dr. Douglas Bates's presentation at useR! 2007 (very
>>>> informative, by the way):
>>>
>>> Thank you.
>>>
>>>> http://user2007.org/program/presentations/bates.pdf
>>>
>>>> It seems many components of the extended system matrix (equation (2)
>>>> on page 22) used in the Cholesky decomposition remain the same
>>>> during the iterations, so the loop above repeats many identical
>>>> matrix operations. How can I achieve better efficiency? Someone
>>>> suggested running lme/lmer with a two-dimensional response Y
>>>> instead of a one-dimensional y. My questions are:
>>>
>>>> (1) So far I have only seen people run lme/lmer with y as a
>>>> one-dimensional array taken from a table. If I combine all those
>>>> y's (indices i, j, k) into a two-dimensional array Y, is there a way
>>>> to run lme/lmer on Y instead of y? In other words, does lme/lmer
>>>> accept a two-dimensional array Y?
>>>
>>> Not at present.
>>>
>>>> If so, do I have to save the huge array as a table in a text file
>>>> and then read it into R before I run lme/lmer?
>>>
>>> No. There are many ways of getting data into R other than creating a
>>> text file and reading it. See the manual "R Data Import/Export" and
>>> also Martin Maechler's presentation at useR! 2007:
>>> http://user2007.org/program/presentations/maechler.pdf
>>>
>>>> Also, if that is the case, how can I label the many columns
>>>> associated with Y?
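One way to keep all responses in memory, with labels, and hand them to the model one at a time is an R array with named dimnames; no text file is involved. This is a sketch under assumptions: the sizes are tiny, the responses are random numbers, and the data frame Model with columns subj and y is a hypothetical placeholder for the poster's data.

```r
set.seed(42)
n <- 6  # observations per response (504 in the original problem)

# All responses in one array: the first dimension holds observations,
# the other three index the (i, j, k) loop positions, each labeled.
Y <- array(rnorm(n * 2 * 3 * 2), dim = c(n, 2, 3, 2),
           dimnames = list(obs = NULL,
                           i = paste0("i", 1:2),
                           j = paste0("j", 1:3),
                           k = paste0("k", 1:2)))

# Hypothetical data frame standing in for Model.
Model <- data.frame(subj = factor(rep(1:3, each = 2)), y = NA_real_)

# Select one response by its labels and place it in the model data.
Model$y <- Y[, "i2", "j1", "k2"]
```

Inside the loop, Y[, i, j, k] with integer indices selects the same slices, so the labels are there for bookkeeping and do not have to be used for indexing.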
>>>
>>>> (2) A more serious concern is memory. The current looping approach
>>>> uses about 1 GB. If I could go with the matrix method described in
>>>> (1), I'm worried that it might not be practically feasible on
>>>> current computers. Any thoughts?
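The memory question in (2) can be sized with simple arithmetic. Assuming every response is stored as an R double (8 bytes), with n = 504 observations per response and 60 x 60 x 60 responses taken from the loop bounds, holding all of Y at once would take roughly:

```r
n_obs  <- 504          # observations per response (from the message)
n_resp <- 60^3         # one response per (i, j, k) position
bytes  <- n_obs * n_resp * 8  # 8 bytes per double
gib    <- bytes / 2^30
gib    # about 0.81 GiB
```

This counts only the raw responses; any copies R makes while subsetting or fitting come on top of it.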
>>>
>>> Well, first, you are discussing the computational methods used in
>>> lmer, but you want to fit a model with different residual variances
>>> for different groups. At present you can't do that in lmer.
>>>
>>> If you look at the lmer function in the development version of the
>>> lme4 package (currently at
>>> https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
>>> at http://r-forge.r-project.org/projects/lme4 for some value of
>>> "soon") you will see that it follows the equations in my useR
>>> presentation fairly closely. The Xy array is n by (p + 1), with X in
>>> the first p columns and y in column p + 1. The object of class
>>> "lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy), and
>>> XytXy (crossprod(Xy)). After fitting the model to the first simulated
>>> response, producing the object 'fm', the only operations needed to
>>> update the model are
>>>
>>>  fm@y <- newy
>>>  Xy <- cbind(fm@X, fm@y)
>>>  fm@ZtXy <- fm@Zt %*% Xy
>>>  fm@XytXy <- crossprod(Xy)
>>>  lme4:::mer_finalize(fm, verbose)
>>>
>>> where 'verbose' is a logical scalar indicating whether you want
>>> verbose output during the optimization phase. Once you get things
>>> working on a small example you would probably want to turn that off.
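The update described above can be illustrated with plain base-R matrices instead of the slots of an lmer object. This is a sketch under stated assumptions: X, Z and the grouping are small hypothetical inputs, and update_blocks() is a made-up helper, not part of lme4. It shows that the blocks of Zt %*% Xy and crossprod(Xy) that involve only X and Z can be computed once, so a new response needs just Zt %*% y, crossprod(X, y) and sum(y^2). The variance-component optimization presumably still has to be rerun for each response, which is what the mer_finalize() step with its optimization phase handles.

```r
set.seed(1)
n <- 20; p <- 3; q <- 4

# Hypothetical fixed-effects matrix (intercept plus two covariates).
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))

# Hypothetical random-intercept indicator matrix Z (n x q) and its transpose.
g  <- rep(seq_len(q), length.out = n)
Z  <- outer(g, seq_len(q), "==") * 1
Zt <- t(Z)

# Blocks that depend only on X and Z: computed once, outside any loop.
ZtX <- Zt %*% X
XtX <- crossprod(X)

# For a new response, rebuild ZtXy and XytXy from the cached blocks.
update_blocks <- function(y) {
  Xty <- crossprod(X, y)  # p x 1
  list(ZtXy  = cbind(ZtX, Zt %*% y),
       XytXy = rbind(cbind(XtX, Xty),
                     cbind(t(Xty), sum(y^2))))
}

newy <- rnorm(n)
blk  <- update_blocks(newy)

# Compare with computing everything from scratch.
Xy <- cbind(X, newy)
all.equal(blk$ZtXy,  Zt %*% Xy,     check.attributes = FALSE)
all.equal(blk$XytXy, crossprod(Xy), check.attributes = FALSE)
```

For n = 504 the savings per response are modest, but across 216,000 responses the cached X-only blocks avoid recomputing the same products every time.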
>>>
>>> Please note that this code applies to the development version of the
>>> lme4 package.
>>