[R] Using lmer with huge amount of data
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jul 24 23:04:27 CEST 2007
On Tue, 24 Jul 2007, Gang Chen wrote:
> Thank you very much for the response, Prof. Ripley.
>
>
>> I think I am missing something here: how do you make this 'huge' and
>> 'gigantic'? You have not told us how many subjects you have, but in
>> imaging experiments it is usually no more than 50 and often less.
>
> Usually we have 10-30 subjects.
>
>
>> For each subject you have 3 x 30,000 responses plus an age. That is under
>> 1Mb of data per subject, so the problem looks modest unless you have many
>> hundreds of subjects.
>>
>> Nothing says you need to read the data in one go, but it will be helpful to
>> have all the data available to R at once (although this could be alleviated
>> by using a DBMS interface).
>
>
> In the hypothetical situation I mentioned in my previous mail (suppose we
> have 12 subjects), all the input data would be stored in 3 X 12 files each of
> which contains 30,000 numbers, plus one more file for age. Sure I can read
> in those 37 files at once, and I'm not concerned about the data size at this
> point, but my question is: do I have to reshuffle those 36 files and create
> 30,000 separate arrays (one for each voxel) in R so that I could run lmer
> voxel-wise?
No. You can index data structures in R.
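
As a minimal sketch of what indexing could look like here, assume the 36
response files have already been read into one 3-d array with dimensions
voxel x level x subject. The names resp, age and voxel_data() are
hypothetical, and the values below are simulated rather than real imaging
data.

```r
# Hypothetical sizes and simulated values; real data would come from the files.
n_vox <- 100; n_lev <- 3; n_subj <- 12
set.seed(1)
resp <- array(rnorm(n_vox * n_lev * n_subj), dim = c(n_vox, n_lev, n_subj))
age  <- round(runif(n_subj, 20, 40))

# Build the small data frame for one voxel by indexing, without reshuffling files.
voxel_data <- function(v) {
  data.frame(
    Response = as.vector(resp[v, , ]),  # level varies fastest, then subject
    FactorA  = factor(rep(seq_len(n_lev), times = n_subj)),
    subject  = factor(rep(seq_len(n_subj), each = n_lev)),
    Age      = rep(age, each = n_lev)
  )
}

d <- voxel_data(5)  # 36 rows: 3 levels for each of 12 subjects
```

Each call returns only the 3 responses and one age per subject that a single
fit needs, so nothing has to be stored as 30,000 separate arrays.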
>> I think the problem is rather going to be running 30,000 lmer fits, which
>> in my experience often take seconds each. Each fit will only need a modest
>> amount of data (3 responses and one age per subject).
>
> Right. What is the most efficient strategy to run such an analysis
> voxel-wise? Write a function, and then use apply()? Or simply do it in a
> loop?
apply() is a loop internally. I would just use a for() loop here,
probably splitting the voxels into groups and running the groups as
separate jobs simultaneously on multi-CPU machines.
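
A rough sketch of such a loop, under the same assumption of a voxel x level
x subject array, might look like the following. The data are simulated, the
names (resp, chunks, job, est) and the choice of 4 jobs are illustrative
only, and lme4 must be installed. Each job would be started separately with
its own value of job.

```r
library(lme4)  # provides lmer() and fixef()

# Simulated stand-in for the imaging data (all names and sizes hypothetical).
set.seed(1)
n_vox <- 20; n_lev <- 3; n_subj <- 12
age <- round(runif(n_subj, 20, 40))
subj_eff <- rnorm(n_subj)                  # per-subject shift
resp <- array(rnorm(n_vox * n_lev * n_subj), dim = c(n_vox, n_lev, n_subj))
resp <- sweep(resp, 3, subj_eff, "+")      # add the shift along the subject dimension

# Design columns are the same for every voxel; only Response changes.
d <- data.frame(
  Response = NA_real_,
  FactorA  = factor(rep(seq_len(n_lev), times = n_subj)),
  subject  = factor(rep(seq_len(n_subj), each = n_lev)),
  Age      = rep(age, each = n_lev)
)

# Split the voxels into contiguous groups, one group per job.
n_jobs <- 4
chunks <- split(seq_len(n_vox), ceiling(seq_len(n_vox) * n_jobs / n_vox))
job    <- 1                                # each job would use its own index
voxels <- chunks[[job]]

# One row of fixed-effect estimates per voxel in this job's group:
# intercept, two FactorA contrasts, and Age.
est <- matrix(NA_real_, nrow = length(voxels), ncol = 4)
for (i in seq_along(voxels)) {
  d$Response <- as.vector(resp[voxels[i], , ])
  fit <- lmer(Response ~ FactorA + Age + (1 | subject), data = d)
  est[i, ] <- fixef(fit)
}
```

Preallocating est and refilling a single data frame keeps the per-voxel
overhead small, so the running time is dominated by the lmer fits.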
>
> Thanks,
> Gang
>
>
>>
>> On Tue, 24 Jul 2007, Gang Chen wrote:
>>
>>> Based on the examples I've seen in using statistical analysis
>>> packages such as lmer, it seems that people usually tabulate all the
>>> input data into one file with the first line indicating the variable
>>> names (or labels), and then read the file inside R. However, in my
>>> case I can't do that because of the huge amount of imaging data.
>>>
>>> Suppose I have a one-way within-subject ANCOVA with one covariate,
>>> and I would like to use lmer in R package lme4 to analyze the data.
>>> In the terminology of linear mixed models, I have a fixed factor A
>>> with 3 levels, a random factor B (subject), and a covariate (age)
>>> with a model like this
>>>
>>> MyResult <- lmer(Response ~ FactorA + Age + (1 | subject), MyData, ...)
>>>
>>> My input data are like this: For each subject I have a file (a huge
>>> matrix) storing the response values of the subject at many locations
>>> (~30,000 voxels) corresponding to factor A at the 1st level, another
>>> file for factor A at the 2nd level, and a 3rd file for factor A at
>>> the 3rd level. Then I have another file storing the age of those
>>> subjects. The analysis with the linear mixed model above would be
>>> done at each voxel separately.
>>>
>>> It seems impractical to create one gigantic file or matrix to feed
>>> into the above command line because of the large number of voxels. I'm
>>> not sure how to proceed in this case. Any suggestions would be highly
>>> appreciated.
>>>
>>> Also if I'm concerned about any potential violation of sphericity
>>> among the 3 levels of factor A, how can I test sphericity violation
>>> in lmer? And if violation exists, how can I make corrections in
>>> contrast testing?
>>>
>>> Thank you very much,
>>> Gang
>>
>> --
>> Brian D. Ripley, ripley at stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595