[R] Using lmer with huge amount of data

Tue Jul 24 21:42:51 CEST 2007

Thank you very much for the response, Prof. Ripley.

> I think I am missing something here: how do you make this 'huge'  
> and 'gigantic'?  You have not told us how many subjects you have,  
> but in imaging experiments it is usually no more than 50 and often  
> less.

Usually we have 10-30 subjects.

> For each subject you have 3 x 30,000 responses plus an age.  That  
> is under 1Mb of data per subject, so the problem looks modest  
> unless you have many hundreds of subjects.
>
> Nothing says you need to read the data in one go, but it will be  
> helpful to have all the data available to R at once (although this  
> could be alleviated by using a DBMS interface).

In the hypothetical situation I mentioned in my previous mail
(suppose we have 12 subjects), all the input data would be stored in
3 x 12 = 36 files, each of which contains 30,000 numbers, plus one more
file for age. Sure, I can read in those 37 files at once, and I'm not
concerned about the data size at this point, but my question is: do I
have to reshuffle those 36 response files and create 30,000 separate
arrays (one for each voxel) in R so that I can run lmer voxel-wise?
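
One possible arrangement, sketched here under assumptions, avoids creating 30,000 separate arrays: keep all responses in a single three-dimensional array indexed by voxel, subject and level, and pull out one voxel's 36 values when needed. The data below are simulated stand-ins for the files; the names resp, design and voxel_data are hypothetical, and the voxel count is reduced so the sketch runs quickly.

```r
# Illustrative sizes: the thread mentions 12 subjects, 3 levels and
# ~30,000 voxels; n_vox is reduced here for a quick run.
n_vox  <- 1000
n_subj <- 12
n_lev  <- 3

set.seed(1)
# Stand-in for reading the 36 response files: resp[v, s, a] is the response
# at voxel v for subject s at level a of factor A. With real files, each
# slice resp[, s, a] would be filled from one file (e.g. via scan()).
resp <- array(rnorm(n_vox * n_subj * n_lev), dim = c(n_vox, n_subj, n_lev))
age  <- round(runif(n_subj, 20, 60))   # simulated ages, one per subject

# The design columns are the same at every voxel, so build them once.
# expand.grid varies subject fastest, matching as.vector(resp[v, , ]).
design <- expand.grid(subject = factor(seq_len(n_subj)),
                      FactorA = factor(seq_len(n_lev)))
design$Age <- age[as.integer(design$subject)]

# Data frame for a single voxel: only the Response column changes.
voxel_data <- function(v) {
  cbind(design, Response = as.vector(resp[v, , ]))
}

d <- voxel_data(1)
str(d)
```

With this layout the per-voxel data frame has one row per subject-by-level combination (36 rows for 12 subjects), which is the long format that lmer expects.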

> I think the problem is rather going to be running 30,000 lmer fits,  
> which in my experience often take seconds each.  Each fit will only  
> need a modest amount of data (3 responses and one age per subject).

Right. What is the most efficient strategy to run such an analysis  
voxel-wise? Write a function, and then use apply()? Or simply do it  
in a loop?
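
As a rough illustration of the "write a function, then apply it" option, the sketch below fits the model from the original post at each voxel and collects the fixed-effect estimates. It is not a tuned solution: the data are simulated, the names fit_voxel and est are hypothetical, and only 20 voxels are used. It assumes the lme4 package is installed and skips the fits otherwise.

```r
# Small simulated stand-in for the imaging data (sizes are illustrative).
set.seed(1)
n_vox <- 20; n_subj <- 12; n_lev <- 3
resp <- array(rnorm(n_vox * n_subj * n_lev), dim = c(n_vox, n_subj, n_lev))

# Design shared by all voxels; subject varies fastest, as in resp[v, , ].
design <- expand.grid(subject = factor(seq_len(n_subj)),
                      FactorA = factor(seq_len(n_lev)))
design$Age <- round(runif(n_subj, 20, 60))[as.integer(design$subject)]

# Fit the mixed model at one voxel and return the fixed-effect estimates:
# intercept, two FactorA contrasts and the Age slope (4 numbers).
fit_voxel <- function(v) {
  d <- cbind(design, Response = as.vector(resp[v, , ]))
  fit <- lme4::lmer(Response ~ FactorA + Age + (1 | subject), data = d)
  lme4::fixef(fit)
}

if (requireNamespace("lme4", quietly = TRUE)) {
  # vapply checks that every fit returns 4 numbers; est is 4 x n_vox.
  # Pure-noise data may produce "singular fit" messages, which are harmless here.
  est <- vapply(seq_len(n_vox), fit_voxel, numeric(4))
  print(dim(est))
}
```

Since the run time is likely to be dominated by the lmer fits themselves, the choice between an explicit for loop and an apply-style call probably matters little for speed; vapply mainly buys a check on the shape of each result.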

Thanks,
Gang

>
> On Tue, 24 Jul 2007, Gang Chen wrote:
>
>> Based on the examples I've seen in using statistical analysis
>> packages such as lmer, it seems that people usually tabulate all the
>> input data into one file with the first line indicating the variable
>> names (or labels), and then read the file inside R. However, in my
>> case I can't do that because of the huge amount of imaging data.
>>
>> Suppose I have a one-way within-subject ANCOVA with one covariate,
>> and I would like to use lmer in R package lme4 to analyze the data.
>> In the terminology of linear mixed models, I have a fixed factor A
>> with 3 levels, a random factor B (subject), and a covariate (age)
>> with a model like this
>>
>> MyResult <- lmer(Response ~ FactorA + Age + (1 | subject), MyData, ...)
>>
>> My input data are like this: For each subject I have a file (a huge
>> matrix) storing the response values of the subject at many locations
>> (~30,000 voxels) corresponding to factor A at the 1st level, another
>> file for factor A at the 2nd level, and a 3rd file for factor A at
>> the 3rd level. Then I have another file storing the age of those
>> subjects. The analysis with the linear mixed model above would be
>> done at each voxel separately.
>>
>> It seems impractical to create one gigantic file or matrix to feed
>> into the above command line because of the big number of voxels. I'm
>> not sure how to proceed in this case. Any suggestions would be highly
>> appreciated.
>>
>> Also if I'm concerned about any potential violation of sphericity
>> among the 3 levels of factor A, how can I test sphericity violation
>> in lmer? And if violation exists, how can I make corrections in
>> contrast testing?
>>
>> Thank you very much,
>> Gang
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595