# [R] Mixture of Normals with Large Data

Tim Victor statsdoc at gmail.com
Wed Aug 8 00:02:12 CEST 2007

```
I wasn't aware of this literature. Thanks for the references.

> Another possibility is to use "data squashing" methods.  Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).
>
> Ravi.
> ____________________________________________________________________
>
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
>
> Ph. (410) 502-2619
>
>
> ----- Original Message -----
> From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
> Date: Saturday, August 4, 2007 8:01 pm
> Subject: Re: [R] Mixture of Normals with Large Data
> To: tvictor at dolphin.upenn.edu
> Cc: r-help at stat.math.ethz.ch
>
>
> > On Sat, 4 Aug 2007, Tim Victor wrote:
> >
> >  > All:
> >  >
> >  > I am trying to fit a mixture of 2 normals with more than 110 million
> >  > observations. I am running R 2.5.1 on a box with 1 GB RAM running
> >  > 32-bit Windows and I continue to run out of memory. Does anyone have
> >  > any suggestions?
> >
> >
> >  If the first few million observations can be regarded as a simple
> >  random sample (SRS) of the rest, then just use them. Or read in blocks
> >  of a convenient size and sample some observations from each block. You
> >  can repeat this process a few times to see if the results are
> >  sufficiently accurate.
> >
> >  Otherwise, read in blocks of a convenient size (perhaps 1 million
> >  observations at a time), quantize the data to a manageable number of
> >  intervals - maybe a few thousand - and tabulate it. Add the counts over
> >  all the blocks.
> >
> >  Then use mle() to fit a multinomial likelihood whose probabilities are
> >  the masses associated with each bin under a mixture of normals law.
> >
> >  Chuck
> >
> >  >
> >  > Thanks so much,
> >  >
> >  > Tim
> >  >
> >  >    [[alternative HTML version deleted]]
> >  >
> >  > ______________________________________________
> >  > R-help at stat.math.ethz.ch mailing list
> >  >
> >  > and provide commented, minimal, self-contained, reproducible code.
> >  >
> >
> >  Charles C. Berry                            (858) 534-2098
> >                                              Dept of Family/Preventive Medicine
> >                                              UC San Diego
> >                                              La Jolla, San Diego 92093-0901
> >
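
# A minimal sketch of the binned approach Chuck describes above, not a tested
# recipe for the real data. Assumptions (none of these come from the thread):
# the blocks are simulated here instead of being read from a file, the bin
# edges, starting values and all variable names are illustrative, and the
# parameters are put on logit/log scales so the optimizer is unconstrained.
library(stats4)

set.seed(1)
n_blocks   <- 20
block_size <- 50000

# 2000 bins; the two outer bins are open-ended so every observation is counted
breaks <- c(-Inf, seq(-8, 12, length.out = 1999), Inf)
counts <- numeric(length(breaks) - 1)

for (b in seq_len(n_blocks)) {
  # Stand-in for reading one block from disk, e.g. with scan() on an open
  # connection. Simulated truth: 70% N(0, 1) and 30% N(4, 1.5).
  from_second <- rbinom(block_size, 1, 0.3) == 1
  x <- ifelse(from_second, rnorm(block_size, 4, 1.5), rnorm(block_size, 0, 1))
  # Quantize the block and add its bin counts to the running totals
  counts <- counts + tabulate(findInterval(x, breaks), nbins = length(counts))
}

# Minus log-likelihood of the multinomial counts: each bin probability is the
# mass a two-component normal mixture assigns to that bin
nll <- function(logit_p, mu1, mu2, log_s1, log_s2) {
  p   <- plogis(logit_p)
  cdf <- p * pnorm(breaks, mu1, exp(log_s1)) +
    (1 - p) * pnorm(breaks, mu2, exp(log_s2))
  -sum(counts * log(pmax(diff(cdf), 1e-300)))
}

# Crude starting values: quartiles of the binned data as the two means
bin_quantile <- function(pr) {
  breaks[which(cumsum(counts) >= pr * sum(counts))[1] + 1]
}

# fnscale only rescales the objective inside optim() to keep step sizes sane
fit <- mle(nll,
           start = list(logit_p = 0,
                        mu1 = bin_quantile(0.25), mu2 = bin_quantile(0.75),
                        log_s1 = 0, log_s2 = 0),
           method = "BFGS",
           control = list(fnscale = sum(counts), maxit = 500))

# Back-transform to the natural scale: mixing weight, means, standard deviations
est <- coef(fit)
c(p = plogis(est[["logit_p"]]), mu1 = est[["mu1"]], mu2 = est[["mu2"]],
  s1 = exp(est[["log_s1"]]), s2 = exp(est[["log_s2"]]))

# Only the vector of bin counts is held in memory across blocks, so the cost
# of the fit depends on the number of bins, not on the number of observations.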