[R] Big data (over 2GB) and lmer

Douglas Bates bates at stat.wisc.edu
Sat Oct 23 16:46:58 CEST 2010


On Thu, Oct 21, 2010 at 2:00 PM, Ben Bolker <bbolker at gmail.com> wrote:
> Michal Figurski <figurski <at> mail.med.upenn.edu> writes:
>
>> I have a data set of roughly 10 million records, 7 columns. It is only
>> about 500MB as a CSV, so it fits in memory. It's painfully slow to
>> do anything with it, but it's possible. I also have another dataset of
>> covariates, about 4GB of data, that I would like to explore...
>>
>> I would like to merge the two datasets and use lmer to build a mixed
>> effects model. Is there a way, for example using 'bigmemory' or 'ff', or
>> any other trick, to enable lmer to work on this data set?
>
>   I don't think this will be easy.
>
>   Do you really need mixed effects for this task?  I.e., are
> your numbers per group small enough that you will benefit
> from the shrinkage etc. afforded by mixed models?  If you have
> (say) 10000 individuals per group and 1000 groups, then I would
> expect you'd get very accurate estimates of the group coefficients;
> you could then calculate variances etc. among these estimates.
>
>   You might get more informed answers on r-sig-mixed-models at r-project.org ...

lmer already stores the model matrices and factors related to the
random effects as sparse matrices.  Depending on the complexity of the
model (in particular, whether random effects are defined with respect
to more than one grouping factor and, if so, whether those factors are
nested), storing the Cholesky factor of the random-effects model
matrix will be the limiting constraint.  This object has many slots,
but only two of them are very large, in the sense that they are long
vectors.  At present, vectors accessed or created by R are limited to
fewer than 2^31 elements because they are indexed by signed 32-bit
integers.
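A rough sketch of the arithmetic behind the 2^31-element limit, written in Python purely for illustration (it is not part of lmer or the Matrix package). It assumes, hypothetically, that the factor's two long vectors hold one entry per nonzero: 8-byte doubles for the values and 4-byte integers for the row indices. The names R_MAX_LEN, fits_in_32bit_index and storage_bytes are made up for this example, and the nonzero counts in the usage lines are invented, not taken from any real model.

```python
# Largest length an R vector can have when indexed by a signed 32-bit
# integer; this equals .Machine$integer.max in R.
R_MAX_LEN = 2**31 - 1


def fits_in_32bit_index(n_elements: int) -> bool:
    """Return True if a vector of n_elements can be indexed by a signed 32-bit int."""
    return n_elements <= R_MAX_LEN


def storage_bytes(n_nonzeros: int) -> dict:
    """Approximate bytes for the two long vectors of a sparse factor.

    Assumption (hypothetical layout): one 8-byte double per value and
    one 4-byte integer per row index, one entry each per nonzero.
    """
    return {"values": 8 * n_nonzeros, "row_indices": 4 * n_nonzeros}


if __name__ == "__main__":
    # Hypothetical nonzero counts for a Cholesky factor, for illustration only.
    for nnz in (5 * 10**8, 3 * 10**9):
        print(nnz, fits_in_32bit_index(nnz), storage_bytes(nnz))
```

With these assumptions, a factor with half a billion nonzeros still fits, while one with three billion does not, regardless of how much RAM the machine has; the memory itself is not the binding constraint.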

So the short answer is, "it depends."  Simple models may be possible.
Complex models will need to wait for decisions about using wider
integer representations in indices.


