[R] Mixture of Normals with Large Data
rvaradhan at jhmi.edu
Sun Aug 5 21:12:13 CEST 2007
Another possibility is to use "data squashing" methods. Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).
Ravi Varadhan, Ph.D.
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
----- Original Message -----
From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Date: Saturday, August 4, 2007 8:01 pm
Subject: Re: [R] Mixture of Normals with Large Data
To: tvictor at dolphin.upenn.edu
Cc: r-help at stat.math.ethz.ch
> On Sat, 4 Aug 2007, Tim Victor wrote:
> > All:
> > I am trying to fit a mixture of 2 normals with more than 110 million
> > observations. I am running R 2.5.1 on a box with 1 GB of RAM running
> > 32-bit Windows and continue to run out of memory. Does anyone have any
> > suggestions?
> If the first few million observations can be regarded as an SRS (simple
> random sample) of the rest, then just use them. Or read in blocks of a
> convenient size and sample some observations from each block. You can
> repeat this process a few times to see if the results are sufficiently
> accurate.
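The block-sampling idea above can be sketched roughly as follows. This is a minimal Python illustration, not code from the thread; it assumes the observations arrive as an iterable of floats (for example, one value per line of a text file, read lazily), and the names `blocks` and `sample_from_blocks` and the `fraction` parameter are hypothetical. Only one block is held in memory at a time, and roughly the same fraction is kept from every block.

```python
import itertools
import random
from typing import Iterable, Iterator, List


def blocks(values: Iterable[float], block_size: int) -> Iterator[List[float]]:
    """Yield consecutive blocks of at most block_size values; only one block is in memory."""
    it = iter(values)
    while True:
        block = list(itertools.islice(it, block_size))
        if not block:
            return
        yield block


def sample_from_blocks(values: Iterable[float], block_size: int,
                       fraction: float, seed: int = 0) -> List[float]:
    """Keep roughly `fraction` of each block, drawn at random without replacement."""
    rng = random.Random(seed)
    sample: List[float] = []
    for block in blocks(values, block_size):
        k = min(len(block), round(fraction * len(block)))
        sample.extend(rng.sample(block, k))
    return sample
```

With a hypothetical file `obs.txt`, a call might look like `sample_from_blocks((float(line) for line in open("obs.txt")), 1_000_000, 0.01, seed=s)`. Repeating it with a few different `seed` values and comparing the fitted mixtures is one way to check whether the results are sufficiently accurate.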
> Otherwise, read in blocks of a convenient size (perhaps 1 million
> observations at a time), quantize the data into a manageable number of
> intervals (maybe a few thousand), and tabulate it. Add the counts across
> all the blocks.
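The quantize-and-tabulate step above might look something like this Python sketch. It assumes the interval edges are fixed in advance (for instance, chosen from a pilot sample), and the function name `tabulate_in_blocks` is hypothetical. Two open-ended tail bins are added so that no observation is dropped, which keeps the total count equal to the number of observations.

```python
import bisect
import itertools
from typing import Iterable, List, Sequence


def tabulate_in_blocks(values: Iterable[float], edges: Sequence[float],
                       block_size: int = 1_000_000) -> List[int]:
    """Bin a stream of observations block by block and return the summed counts.

    With sorted interior edges e[0] < ... < e[m-1] there are m + 1 bins:
    (-inf, e[0]), [e[0], e[1]), ..., [e[m-1], +inf).
    """
    totals = [0] * (len(edges) + 1)
    it = iter(values)
    while True:
        block = list(itertools.islice(it, block_size))  # only this block is in memory
        if not block:
            break
        block_counts = [0] * len(totals)
        for x in block:
            # bisect_right returns the index of the half-open bin that contains x
            block_counts[bisect.bisect_right(edges, x)] += 1
        # Add this block's counts to the running totals.
        totals = [t + c for t, c in zip(totals, block_counts)]
    return totals
```

After the pass, only the count vector (a few thousand integers) needs to be kept, regardless of how many observations were read.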
> Then use mle() (from the stats4 package) to fit a multinomial likelihood
> whose probabilities are the masses assigned to each bin by a mixture of
> normals law.
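mle() in R minimizes a user-supplied negative log-likelihood. As a rough Python analogue of the function one might hand it, the sketch below computes the multinomial negative log-likelihood (dropping the constant term) of binned counts under a two-component normal mixture. The bins are assumed to be (-inf, e[0]), [e[0], e[1]), ..., [e[m-1], +inf), matching a tabulation with open tail bins. The parameterization (logit of the mixing weight, log standard deviations) and all function names are assumptions chosen so a generic optimizer can search without constraints.

```python
import math
from typing import List, Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_bin_probs(edges: Sequence[float], p: float, mu1: float, s1: float,
                      mu2: float, s2: float) -> List[float]:
    """Mass of each bin under the mixture p*N(mu1, s1^2) + (1-p)*N(mu2, s2^2)."""
    cdf = [p * normal_cdf(e, mu1, s1) + (1.0 - p) * normal_cdf(e, mu2, s2) for e in edges]
    upper = cdf + [1.0]   # the last bin runs to +inf
    lower = [0.0] + cdf   # the first bin starts at -inf
    return [u - l for u, l in zip(upper, lower)]


def binned_nll(theta: Sequence[float], edges: Sequence[float],
               counts: Sequence[float]) -> float:
    """Negative multinomial log-likelihood (up to a constant) of the binned counts.

    theta = (logit of p, mu1, log sigma1, mu2, log sigma2).
    """
    logit_p, mu1, log_s1, mu2, log_s2 = theta
    p = 1.0 / (1.0 + math.exp(-logit_p))
    probs = mixture_bin_probs(edges, p, mu1, math.exp(log_s1), mu2, math.exp(log_s2))
    nll = 0.0
    for n, prob in zip(counts, probs):
        if n == 0:
            continue  # empty bins contribute nothing
        if prob <= 0.0:
            return math.inf  # observations fell in a bin the model gives zero mass
        nll -= n * math.log(prob)
    return nll
```

To fit, one would pass `binned_nll` to a numerical optimizer; if SciPy is available, something like `scipy.optimize.minimize(binned_nll, start, args=(edges, counts), method="Nelder-Mead")` with a sensible starting `theta`. Because the likelihood depends only on a few thousand counts, each evaluation is cheap no matter how many raw observations there were.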
> > Thanks so much,
> > Tim
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > PLEASE do read the posting guide
> > and provide commented, minimal, self-contained, reproducible code.
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive Medicine
> UC San Diego
> La Jolla, San Diego 92093-0901