[R] Using lmer with huge amount of data
Gang Chen
gangchen at mail.nih.gov
Tue Jul 24 21:42:51 CEST 2007
Thank you very much for the response, Prof. Ripley.
> I think I am missing something here: how do you make this 'huge'
> and 'gigantic'? You have not told us how many subjects you have,
> but in imaging experiments it is usually no more than 50 and often
> less.
Usually we have 10-30 subjects.
> For each subject you have 3 x 30,000 responses plus an age. That
> is under 1Mb of data per subject, so the problem looks modest
> unless you have many hundreds of subjects.
>
> Nothing says you need to read the data in one go, but it will be
> helpful to have all the data available to R at once (although this
> could be alleviated by using a DBMS interface).
In the hypothetical situation I mentioned in my previous mail
(suppose we have 12 subjects), all the input data would be stored in
3 X 12 files each of which contains 30,000 numbers, plus one more
file for age. Sure, I can read in those 37 files at once, and I'm not
concerned about the data size at this point, but my question is: do I
have to reshuffle those 36 files and create 30,000 separate arrays
(one for each voxel) in R so that I could run lmer voxel-wise?
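
One possible arrangement, sketched here under assumptions rather than taken from the thread, is to keep all responses in a single 3-d array (voxel x subject x level of factor A) and build the small per-voxel data frame on demand, so that no 30,000 separate arrays are ever created. The sizes, the simulated values, and the names resp, design and voxel_frame are hypothetical; in practice each slice resp[, s, a] would presumably be filled by reading the file for subject s at level a.

```r
# Hypothetical sizes; the thread mentions ~30,000 voxels and 12 subjects.
n_vox  <- 1000
n_subj <- 12
n_lev  <- 3

set.seed(1)
# Simulated stand-in for the 3 x 12 input files (assumption: one numeric
# vector of length n_vox per subject and level of factor A).
resp <- array(rnorm(n_vox * n_subj * n_lev), dim = c(n_vox, n_subj, n_lev))
# Simulated stand-in for the age file (one value per subject).
age <- round(runif(n_subj, 20, 60))

# Design columns are identical for every voxel, so build them once.
# Row order matches as.vector() on a subject x level matrix:
# subject varies fastest, level of factor A slowest.
design <- data.frame(
  subject = factor(rep(seq_len(n_subj), times = n_lev)),
  FactorA = factor(rep(seq_len(n_lev), each = n_subj)),
  Age     = rep(age, times = n_lev)
)

# Long-format data frame (36 rows here) for a single voxel v.
voxel_frame <- function(v) {
  cbind(design, Response = as.vector(resp[v, , ]))
}

d <- voxel_frame(1)
str(d)
```

With this layout the per-voxel data frame has the columns used in the lmer formula quoted further down (Response, FactorA, Age, subject), and only the Response column changes from voxel to voxel.
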
> I think the problem is rather going to be running 30,000 lmer fits,
> which in my experience often take seconds each. Each fit will only
> need a modest amount of data (3 responses and one age per subject).
Right. What is the most efficient strategy to run such an analysis
voxel-wise? Write a function, and then use apply()? Or simply do it
in a loop?
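
For illustration only, both options can be sketched as below, assuming the lme4 package is installed and using a small simulated array in place of the real data (resp, design, fit_voxel and the sizes are hypothetical names and values). The two variants do the same work; since Prof. Ripley notes that the lmer fits themselves dominate the run time, the choice between them is probably more about readability than speed.

```r
set.seed(1)
n_vox <- 20; n_subj <- 12; n_lev <- 3   # small hypothetical sizes

# Simulated voxel x subject x level array, with a per-subject offset so
# that the random intercept has something to estimate.
subj_eff <- rnorm(n_subj)
resp <- array(rnorm(n_vox * n_subj * n_lev), dim = c(n_vox, n_subj, n_lev))
for (s in seq_len(n_subj)) resp[, s, ] <- resp[, s, ] + subj_eff[s]
age <- round(runif(n_subj, 20, 60))

# Design columns shared by all voxels (subject varies fastest).
design <- data.frame(
  subject = factor(rep(seq_len(n_subj), times = n_lev)),
  FactorA = factor(rep(seq_len(n_lev), each = n_subj)),
  Age     = rep(age, times = n_lev)
)

# Fit the model from the thread at voxel v and return the fixed effects:
# (Intercept), FactorA2, FactorA3, Age.
fit_voxel <- function(v) {
  d <- cbind(design, Response = as.vector(resp[v, , ]))
  m <- lme4::lmer(Response ~ FactorA + Age + (1 | subject), data = d)
  lme4::fixef(m)
}

# The guard only keeps the sketch from failing where lme4 is missing.
if (requireNamespace("lme4", quietly = TRUE)) {
  # Option 1: vapply over voxel indices; result is n_vox x 4.
  est_apply <- t(vapply(seq_len(n_vox), fit_voxel, numeric(4)))

  # Option 2: explicit loop filling a pre-allocated matrix.
  est_loop <- matrix(NA_real_, n_vox, 4,
                     dimnames = list(NULL, colnames(est_apply)))
  for (v in seq_len(n_vox)) est_loop[v, ] <- fit_voxel(v)
}
```

Wrapping the body of fit_voxel in tryCatch() and returning a vector of NA on error may be worth considering for real data, so that a failed fit at one voxel does not abort the whole run.
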
Thanks,
Gang
>
> On Tue, 24 Jul 2007, Gang Chen wrote:
>
>> Based on the examples I've seen in using statistical analysis
>> packages such as lmer, it seems that people usually tabulate all the
>> input data into one file with the first line indicating the variable
>> names (or labels), and then read the file inside R. However, in my
>> case I can't do that because of the huge amount of imaging data.
>>
>> Suppose I have a one-way within-subject ANCOVA with one covariate,
>> and I would like to use lmer in R package lme4 to analyze the data.
>> In the terminology of linear mixed models, I have a fixed factor A
>> with 3 levels, a random factor B (subject), and a covariate (age)
>> with a model like this
>>
>> MyResult <- lmer(Response ~ FactorA + Age + (1 | subject),
>> MyData, ...)
>>
>> My input data are like this: For each subject I have a file (a huge
>> matrix) storing the response values of the subject at many locations
>> (~30,000 voxels) corresponding to factor A at the 1st level, another
>> file for factor A at the 2nd level, and a 3rd file for factor A at
>> the 3rd level. Then I have another file storing the age of those
>> subjects. The analysis with the linear mixed model above would be
>> done at each voxel separately.
>>
>> It seems impractical to create one gigantic file or matrix to feed
>> into the above command line because of the big number of voxels. I'm
>> not sure how to proceed in this case. Any suggestions would be highly
>> appreciated.
>>
>> Also if I'm concerned about any potential violation of sphericity
>> among the 3 levels of factor A, how can I test sphericity violation
>> in lmer? And if violation exists, how can I make corrections in
>> contrast testing?
>>
>> Thank you very much,
>> Gang
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595