[R] Using lmer with huge amount of data

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jul 24 23:04:27 CEST 2007


On Tue, 24 Jul 2007, Gang Chen wrote:

> Thank you very much for the response, Prof. Ripley.
>
>
>> I think I am missing something here: how do you make this 'huge' and 
>> 'gigantic'?  You have not told us how many subjects you have, but in 
>> imaging experiments it is usually no more than 50 and often less.
>
> Usually we have 10-30 subjects.
>
>
>> For each subject you have 3 x 30,000 responses plus an age.  That is under 
>> 1MB of data per subject, so the problem looks modest unless you have many 
>> hundreds of subjects.
>> 
>> Nothing says you need to read the data in one go, but it will be helpful to 
>> have all the data available to R at once (although this could be alleviated 
>> by using a DBMS interface).
>
>
> In the hypothetical situation I mentioned in my previous mail (suppose we 
> have 12 subjects), all the input data would be stored in 3 X 12 files each of 
> which contains 30,000 numbers,  plus one more file for age. Sure I can read 
> in those 37 files at once, and I'm not concerned about the data size at this 
> point, but my question is: do I have to reshuffle those 36 files and create 
> 30,000 separate arrays (one for each voxel) in R so that I could run lmer 
> voxel-wise?

No. You can index data structures in R.
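
A minimal sketch of what indexing could look like here, assuming the
responses have been read into a single subject x level x voxel array. The
names resp, age, design and voxel_data are hypothetical, and the data are
simulated stand-ins for the real files (each file's 30,000 numbers would
fill one resp[s, a, ] slice). For 12 subjects the full array is
12 x 3 x 30,000 doubles, about 8.6 MB.

```r
# Simulated stand-in for the imaging data; names and sizes are hypothetical.
set.seed(1)
n_subj <- 12   # subjects
n_lev  <- 3    # levels of factor A
n_vox  <- 100  # 30,000 in the real problem

# One array holds all responses: subject x level of factor A x voxel.
resp <- array(rnorm(n_subj * n_lev * n_vox), dim = c(n_subj, n_lev, n_vox))
age  <- round(runif(n_subj, 20, 60))

# The design columns are identical for every voxel, so build them once.
# Row order matches as.vector() on a subject x level slice (column-major):
# all subjects at level 1, then all subjects at level 2, and so on.
design <- data.frame(
  subject = factor(rep(seq_len(n_subj), times = n_lev)),
  FactorA = factor(rep(seq_len(n_lev), each = n_subj)),
  Age     = rep(age, times = n_lev)
)

# Indexing the third dimension pulls out the data for a single voxel.
voxel_data <- function(v) {
  cbind(design, Response = as.vector(resp[, , v]))
}

d <- voxel_data(42)
str(d)
```

Each call to voxel_data() returns the small data frame (3 responses and one
age per subject) that a single fit needs, so no per-voxel files or arrays
have to be created in advance.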

>> I think the problem is rather going to be running 30,000 lmer fits, which 
>> in my experience often take seconds each.  Each fit will only need a modest 
>> amount of data (3 responses and one age per subject).
>
> Right. What is the most efficient strategy to run such an analysis 
> voxel-wise? Write a function, and then use apply()? Or simply do it in a 
> loop?

apply() is a loop internally, so it gains nothing in speed.  I would just 
use a for() loop here, probably splitting the voxels into groups and 
running each group as a separate job, simultaneously, on multi-CPU machines.
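
A rough sketch of such a loop, assuming the lme4 package is installed and
using simulated data in place of the real files. The names fit_voxels,
chunks and n_jobs are hypothetical, and the model formula is the one from
the original question. Here the groups are run one after another in a
single session; in practice each group would go to its own R process.

```r
library(lme4)  # provides lmer() and fixef()

# Simulated data with a per-subject random intercept; sizes are hypothetical.
set.seed(1)
n_subj <- 12
n_lev  <- 3
n_vox  <- 20   # 30,000 in the real problem
age    <- round(runif(n_subj, 20, 60))
subj_effect <- rnorm(n_subj)
# Recycling adds subj_effect[s] to every element resp[s, , ].
resp <- array(rnorm(n_subj * n_lev * n_vox),
              dim = c(n_subj, n_lev, n_vox)) + subj_effect

design <- data.frame(
  subject = factor(rep(seq_len(n_subj), times = n_lev)),
  FactorA = factor(rep(seq_len(n_lev), each = n_subj)),
  Age     = rep(age, times = n_lev)
)

# Plain for() loop over a group of voxels, keeping the fixed effects per fit.
fit_voxels <- function(voxels) {
  out <- vector("list", length(voxels))
  for (i in seq_along(voxels)) {
    d <- cbind(design, Response = as.vector(resp[, , voxels[i]]))
    fit <- lmer(Response ~ FactorA + Age + (1 | subject), data = d)
    out[[i]] <- fixef(fit)
  }
  do.call(rbind, out)  # one row per voxel, one column per fixed effect
}

# Split the voxel indices into contiguous groups, one group per job.
n_jobs <- 4
chunk_size <- ceiling(n_vox / n_jobs)
chunks <- split(seq_len(n_vox), ceiling(seq_len(n_vox) / chunk_size))

# Stand-in for separate jobs: run the groups sequentially and stack results.
results <- NULL
for (job in seq_along(chunks)) {
  results <- rbind(results, fit_voxels(chunks[[job]]))
}
dim(results)
```

Because the voxels are fitted independently, the groups need no
communication: each job can save its own block of results and the blocks
can be stacked afterwards. On current R versions, parallel::mclapply() over
chunks is one possible alternative to launching separate jobs by hand.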

>
> Thanks,
> Gang
>
>
>> 
>> On Tue, 24 Jul 2007, Gang Chen wrote:
>> 
>>> Based on the examples I've seen in using statistical analysis
>>> packages such as lmer, it seems that people usually tabulate all the
>>> input data into one file with the first line indicating the variable
>>> names (or labels), and then read the file inside R. However, in my
>>> case I can't do that because of the huge amount of imaging data.
>>> 
>>> Suppose I have a one-way within-subject ANCOVA with one covariate,
>>> and I would like to use lmer in R package lme4 to analyze the data.
>>> In the terminology of linear mixed models, I have a fixed factor A
>>> with 3 levels, a random factor B (subject), and a covariate (age)
>>> with a model like this
>>> 
>>> MyResult <- lmer(Response ~ FactorA + Age + (1 | subject), MyData, ...)
>>> 
>>> My input data are like this: For each subject I have a file (a huge
>>> matrix) storing the response values of the subject at many locations
>>> (~30,000 voxels) corresponding to factor A at the 1st level, another
>>> file for factor A at the 2nd level, and a 3rd file for factor A at
>>> the 3rd level. Then I have another file storing the age of those
>>> subjects. The analysis with the linear mixed model above would be
>>> done at each voxel separately.
>>> 
>>> It seems impractical to create one gigantic file or matrix to feed
>>> into the above command line because of the large number of voxels. I'm
>>> not sure how to proceed in this case. Any suggestions would be highly
>>> appreciated.
>>> 
>>> Also if I'm concerned about any potential violation of sphericity
>>> among the 3 levels of factor A, how can I test sphericity violation
>>> in lmer? And if violation exists, how can I make corrections in
>>> contrast testing?
>>> 
>>> Thank you very much,
>>> Gang
>> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


