[R] Mixture of Normals with Large Data
Charles C. Berry
cberry at tajo.ucsd.edu
Sun Aug 5 02:01:14 CEST 2007
On Sat, 4 Aug 2007, Tim Victor wrote:
> I am trying to fit a mixture of 2 normals with > 110 million observations. I
> am running R 2.5.1 on a box with 1 GB of RAM running 32-bit Windows, and I
> continue to run out of memory. Does anyone have any suggestions?
If the first few million observations can be regarded as a simple random
sample (SRS) of the rest, then just use them. Or read in blocks of a
convenient size and sample some observations from each block. You can
repeat this process a few times to see whether the results are
sufficiently accurate.
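A rough sketch of the block-sampling idea (the block size and the 5%
sampling fraction are hypothetical choices; here a temporary file of
simulated mixture data stands in for the real 110-million-row file):

```r
set.seed(1)
## Simulated stand-in for the big file: 2e5 draws from a 2-component mixture,
## one value per line. In practice you would open the real data file.
tf <- tempfile()
writeLines(as.character(c(rnorm(1e5, 0, 1), rnorm(1e5, 5, 2))), tf)

con <- file(tf, open = "r")
samp <- numeric(0)
repeat {
  ## Read one block of (up to) 5e4 observations from the open connection.
  block <- scan(con, what = numeric(), n = 5e4, quiet = TRUE)
  if (length(block) == 0) break
  ## Keep a ~5% simple random sample of each block.
  keep <- sample.int(length(block), size = ceiling(0.05 * length(block)))
  samp <- c(samp, block[keep])
}
close(con)
length(samp)  # a manageable subsample to which the mixture can be fit
```

The subsample in `samp` can then be fit in memory by any mixture-fitting
routine.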
Otherwise, read in blocks of a convenient size (perhaps 1 million
observations at a time), quantize the data to a manageable number of
intervals - maybe a few thousand - and tabulate it. Add the counts over
all the blocks.
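The quantize-and-tabulate step might look like this, assuming fixed bin
edges chosen in advance to cover the data's range (the range (-10, 10),
the 2000 bins, and the simulated blocks are hypothetical stand-ins):

```r
set.seed(2)
## Fixed bin edges shared by every block; ~2000 intervals.
breaks <- seq(-10, 10, length.out = 2001)
counts <- integer(length(breaks) - 1)

for (b in 1:4) {  # stand-in for scanning 4 blocks from the real file
  block <- c(rnorm(5e4, 0, 1), rnorm(5e4, 5, 2))  # simulated block
  ## Quantize this block into the common bins and accumulate counts.
  bin <- cut(block, breaks = breaks)
  counts <- counts + tabulate(bin, nbins = length(breaks) - 1)
}
sum(counts)  # total observations tabulated (minus any outside the range)
```

Only the vector of a few thousand counts needs to be held between blocks,
so memory use no longer grows with the number of observations.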
Then use mle() to fit a multinomial likelihood whose probabilities are the
masses associated with each bin under a mixture of normals law.
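One way this might look with stats4::mle(): the binned counts are treated
as multinomial, with cell probabilities given by differences of the
mixture CDF at the bin edges. The starting values, the logit
parameterization of the mixing weight, and the log-sd parameterization
are illustrative choices, not the only way to set this up:

```r
library(stats4)
set.seed(3)

## Simulated binned data standing in for the accumulated counts.
breaks <- seq(-10, 15, length.out = 501)
x <- c(rnorm(2e5, 0, 1), rnorm(1e5, 5, 2))
counts <- tabulate(cut(x, breaks), nbins = length(breaks) - 1)

## Negative log-likelihood of the counts under a 2-component normal
## mixture: bin masses are differences of the mixture CDF at the edges.
nll <- function(p, mu1, mu2, lsd1, lsd2) {
  w <- plogis(p)                       # mixing weight kept in (0, 1)
  cdf <- w * pnorm(breaks, mu1, exp(lsd1)) +
         (1 - w) * pnorm(breaks, mu2, exp(lsd2))
  pr <- pmax(diff(cdf), 1e-300)        # guard against log(0)
  pr <- pr / sum(pr)                   # multinomial cell probabilities
  -sum(counts * log(pr))
}

fit <- mle(nll, start = list(p = 0, mu1 = -1, mu2 = 4, lsd1 = 0, lsd2 = 0))
coef(fit)
```

Because the likelihood is evaluated on a few hundred counts rather than
110 million observations, each iteration is cheap regardless of the
original sample size.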
> Thanks so much,
> R-help at stat.math.ethz.ch mailing list
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901