[R] Mixture of Normals with Large Data

Charles C. Berry cberry at tajo.ucsd.edu
Sun Aug 5 02:01:14 CEST 2007


On Sat, 4 Aug 2007, Tim Victor wrote:

> All:
>
> I am trying to fit a mixture of 2 normals with more than 110 million
> observations. I am running R 2.5.1 on a box with 1 GB of RAM running 32-bit
> Windows, and I keep running out of memory. Does anyone have any suggestions?


If the first few million observations can be regarded as a simple random
sample (SRS) of the rest, then just use them. Or read in blocks of a
convenient size and sample some observations from each block. You can
repeat this process a few times to see if the results are sufficiently
accurate.
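
A minimal sketch of the block-sampling idea, assuming the data sit in a text file with one value per line. The file here is a small simulated stand-in, and the names tmp, block_size and frac are made up for illustration; for the real data the block size would be more like 1e6.

```r
# Hypothetical stand-in for the real file: 10,000 simulated values,
# one per line (assumption: the real data are stored the same way).
set.seed(1)
tmp <- tempfile()
write(c(rnorm(6000, 0, 1), rnorm(4000, 4, 1.5)), tmp, ncolumns = 1)

block_size <- 1000   # illustrative; use something like 1e6 for real data
frac <- 0.1          # fraction of each block to keep (assumed value)

con <- file(tmp, open = "r")
samp <- numeric(0)
repeat {
  lines <- readLines(con, n = block_size)   # next block; empty at end of file
  if (length(lines) == 0) break
  block <- as.numeric(lines)
  # Keep a random subset of this block, then let the block be discarded
  keep <- sample(length(block), ceiling(frac * length(block)))
  samp <- c(samp, block[keep])
}
close(con)
unlink(tmp)

length(samp)   # size of the retained sample
```

Only one block plus the retained sample is in memory at any time. Rerunning with a different seed gives another sample to compare fits against.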

Otherwise, read in blocks of a convenient size (perhaps 1 million 
observations at a time), quantize the data to a manageable number of 
intervals - maybe a few thousand - and tabulate it. Add the counts over 
all the blocks.
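
The binning step might look something like the sketch below. It assumes one value per line in a text file (simulated here) and a fixed set of breakpoints chosen in advance; the range -10 to 15 and the names breaks and counts are assumptions for illustration. Open-ended end bins catch anything outside the assumed range.

```r
# Hypothetical stand-in for the real file, one value per line.
set.seed(1)
tmp <- tempfile()
write(c(rnorm(6000, 0, 1), rnorm(4000, 4, 1.5)), tmp, ncolumns = 1)

# Fixed bin edges shared by all blocks: 2000 interior bins over an
# assumed range, plus two open-ended bins for the tails.
breaks <- c(-Inf, seq(-10, 15, length.out = 2001), Inf)
counts <- numeric(length(breaks) - 1)   # doubles, so large totals are safe

con <- file(tmp, open = "r")
repeat {
  lines <- readLines(con, n = 1000)     # use ~1e6 per block for real data
  if (length(lines) == 0) break
  x <- as.numeric(lines)
  bin <- findInterval(x, breaks)        # bin index in 1..length(counts)
  counts <- counts + tabulate(bin, nbins = length(counts))
}
close(con)
unlink(tmp)

sum(counts)   # total number of observations tabulated
```

After this, the whole data set is summarized by the vector of counts, whose length depends on the number of bins and not on the number of observations.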

Then use mle() (in the stats4 package) to fit a multinomial likelihood
whose probabilities are the masses assigned to each bin by a mixture of
normals.
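
As a rough sketch of that fit: the bin probabilities are differences of the mixture CDF at the bin edges, and the multinomial log-likelihood (up to a constant) is the sum of counts times log probabilities. The data, bin edges, starting values and the parameterization (logit of the mixing proportion, log standard deviations) are all assumptions made for this example, not part of the original suggestion.

```r
library(stats4)

# Simulated stand-in for the binned data (assumption: in practice
# 'counts' would come from tabulating the real data block by block).
set.seed(1)
x <- c(rnorm(6000, 0, 1), rnorm(4000, 4, 1.5))
breaks <- c(-Inf, seq(-6, 12, length.out = 501), Inf)
counts <- tabulate(findInterval(x, breaks), nbins = length(breaks) - 1)

# Negative multinomial log-likelihood, dropping the constant term.
# lp = logit of the mixing proportion; ls1, ls2 = log standard deviations,
# so the optimizer works on an unconstrained scale.
nll <- function(lp, mu1, mu2, ls1, ls2) {
  p <- plogis(lp)
  cdf <- p * pnorm(breaks, mu1, exp(ls1)) +
    (1 - p) * pnorm(breaks, mu2, exp(ls2))
  mass <- pmax(diff(cdf), 1e-300)   # bin probabilities, floored to avoid log(0)
  -sum(counts * log(mass))
}

# Starting values are rough guesses; poor ones can lead to a local optimum.
fit <- mle(nll, start = list(lp = 0, mu1 = -1, mu2 = 5, ls1 = 0, ls2 = 0))
est <- coef(fit)

# Back-transform to the natural scale
c(p = plogis(est[["lp"]]), mu1 = est[["mu1"]], mu2 = est[["mu2"]],
  sd1 = exp(est[["ls1"]]), sd2 = exp(est[["ls2"]]))
```

The cost of each likelihood evaluation depends only on the number of bins, so the same code applies whether the counts summarize ten thousand or 110 million observations.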

Chuck

>
> Thanks so much,
>
> Tim

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


