[R] Mixture of Normals with Large Data
Tim Victor
statsdoc at gmail.com
Wed Aug 8 00:02:12 CEST 2007
I wasn't aware of this literature. Thanks for the references.
On 8/5/07, RAVI VARADHAN <rvaradhan at jhmi.edu> wrote:
> Another possibility is to use "data squashing" methods. Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).
>
> Ravi.
> ____________________________________________________________________
>
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
>
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
>
>
> ----- Original Message -----
> From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
> Date: Saturday, August 4, 2007 8:01 pm
> Subject: Re: [R] Mixture of Normals with Large Data
> To: tvictor at dolphin.upenn.edu
> Cc: r-help at stat.math.ethz.ch
>
>
> > On Sat, 4 Aug 2007, Tim Victor wrote:
> >
> > > All:
> > >
> > > I am trying to fit a mixture of 2 normals with > 110 million
> > > observations. I am running R 2.5.1 on a box with 1 GB RAM running
> > > 32-bit Windows and I continue to run out of memory. Does anyone have
> > > any suggestions?
> >
> >
> > If the first few million observations can be regarded as a SRS of the
> > rest, then just use them. Or read in blocks of a convenient size and
> > sample some observations from each block. You can repeat this process
> > a few times to see if the results are sufficiently accurate.
> >
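A minimal sketch of the block-sampling idea in R, assuming the data sit in a plain text file with one numeric value per line. The names `sample_in_blocks`, `path`, `block_size`, and `frac` are made up for illustration and do not come from the original message.

```r
# Sketch: draw a random subsample from a large one-column text file,
# reading it block by block so the full data never sits in memory.
# Assumes one numeric value per line; all names here are hypothetical.
sample_in_blocks <- function(path, block_size = 1e6, frac = 0.01) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    # readLines() on an open connection continues where the last read stopped
    lines <- readLines(con, n = block_size)
    if (length(lines) == 0) break
    x <- as.numeric(lines)
    keep <- runif(length(x)) < frac   # keep each value with probability frac
    kept[[length(kept) + 1]] <- x[keep]
  }
  unlist(kept)
}

# Small demonstration on simulated data written to a temporary file
set.seed(1)
tmp <- tempfile()
writeLines(as.character(c(rnorm(7000, 0, 1), rnorm(3000, 4, 1))), tmp)
sub <- sample_in_blocks(tmp, block_size = 1000, frac = 0.1)
length(sub)
```

Rerunning with a different seed and comparing the fitted parameters is one way to check whether the subsample is large enough.
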
> > Otherwise, read in blocks of a convenient size (perhaps 1 million
> > observations at a time), quantize the data to a manageable number of
> > intervals - maybe a few thousand - and tabulate it. Add the counts
> > over all the blocks.
> >
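The quantize-and-tabulate step could be sketched roughly as follows, again assuming one numeric value per line in a text file. The function name `bin_in_blocks` and the grid `breaks` are illustrative assumptions. The grid is padded with -Inf and Inf so that no observation falls outside it.

```r
# Sketch: accumulate bin counts over blocks of a large one-column text file.
# `breaks` is an assumed, user-chosen grid; all names here are hypothetical.
bin_in_blocks <- function(path, breaks, block_size = 1e6) {
  nbins <- length(breaks) - 1
  counts <- numeric(nbins)            # doubles, so large totals are safe
  con <- file(path, open = "r")
  on.exit(close(con))
  repeat {
    lines <- readLines(con, n = block_size)
    if (length(lines) == 0) break
    x <- as.numeric(lines)
    # bin index of each value; tabulate() silently drops NAs and
    # indices outside 1..nbins
    idx <- findInterval(x, breaks, rightmost.closed = TRUE)
    counts <- counts + tabulate(idx, nbins = nbins)
  }
  counts
}

# Small demonstration on simulated data written to a temporary file
set.seed(1)
tmp <- tempfile()
writeLines(as.character(c(rnorm(7000, 0, 1), rnorm(3000, 4, 1))), tmp)
breaks <- c(-Inf, seq(-4, 8, length.out = 201), Inf)
counts <- bin_in_blocks(tmp, breaks, block_size = 1000)
sum(counts)
```

Only the vector of counts and the break points need to be kept, so memory use depends on the number of bins rather than the number of observations.
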
> > Then use mle() to fit a multinomial likelihood whose probabilities
> > are the masses associated with each bin under a mixture of normals law.
> >
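One way the binned fit might look, using mle() from the stats4 package. This is only a sketch under stated assumptions: the counts are simulated here so the example is self-contained, the parameterization (logit of the mixing weight, log of the standard deviations) is a choice made for this illustration, and the starting values are assumed to be roughly sensible.

```r
library(stats4)

# Simulated stand-in for the accumulated bin counts (illustration only)
set.seed(1)
x <- c(rnorm(70000, 0, 1), rnorm(30000, 4, 1.5))
breaks <- c(-Inf, seq(-5, 10, length.out = 301), Inf)
counts <- tabulate(findInterval(x, breaks), nbins = length(breaks) - 1)

# Negative multinomial log-likelihood, up to an additive constant.
# Parameters are transformed so the optimizer is unconstrained:
# logit_p for the mixing weight, log_s1 / log_s2 for the standard deviations.
nll <- function(logit_p, mu1, mu2, log_s1, log_s2) {
  p <- plogis(logit_p)
  cdf <- p * pnorm(breaks, mu1, exp(log_s1)) +
    (1 - p) * pnorm(breaks, mu2, exp(log_s2))
  prob <- pmax(diff(cdf), 1e-300)   # bin masses, guarded against log(0)
  -sum(counts * log(prob))
}

fit <- mle(nll, start = list(logit_p = 0, mu1 = -1, mu2 = 3,
                             log_s1 = 0, log_s2 = 0))
est <- coef(fit)

# Back-transform to the natural scale
c(p  = plogis(est[["logit_p"]]),
  mu1 = est[["mu1"]], mu2 = est[["mu2"]],
  s1 = exp(est[["log_s1"]]), s2 = exp(est[["log_s2"]]))
```

Each likelihood evaluation costs only as much as the number of bins, so the fit itself is cheap even with > 110 million observations. Mixture likelihoods can have several local optima, so trying a few different starting values is prudent.
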
> > Chuck
> >
> > >
> > > Thanks so much,
> > >
> > > Tim
> > >
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > >
> > > PLEASE do read the posting guide
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > Charles C. Berry (858) 534-2098
> > Dept of Family/Preventive Medicine
> > UC San Diego
> > La Jolla, San Diego 92093-0901
> >
>