[R] Mixture of Normals with Large Data
RAVI VARADHAN
rvaradhan at jhmi.edu
Sun Aug 5 21:12:13 CEST 2007
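Another possibility is to use "data squashing" methods, which replace the full data set with a much smaller set of weighted pseudo-observations chosen to give approximately the same fit. Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).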
Ravi.
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
----- Original Message -----
From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Date: Saturday, August 4, 2007 8:01 pm
Subject: Re: [R] Mixture of Normals with Large Data
To: tvictor at dolphin.upenn.edu
Cc: r-help at stat.math.ethz.ch
> On Sat, 4 Aug 2007, Tim Victor wrote:
>
> > All:
> >
> > I am trying to fit a mixture of 2 normals with > 110 million observations. I
> > am running R 2.5.1 on a box with 1 GB RAM running 32-bit Windows and I
> > continue to run out of memory. Does anyone have any suggestions?
>
>
> If the first few million observations can be regarded as a simple random
> sample (SRS) of the rest, then just use them. Or read in blocks of a
> convenient size and sample some observations from each block. You can
> repeat this process a few times to see if the results are sufficiently
> accurate.
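
The block-sampling suggestion above might look roughly like the following in R. This is only a sketch: the function name sample_blocks, the one-value-per-line file layout, and the block size and sampling fraction are assumptions for illustration, and a small simulated file stands in for the real data.

```r
# Sketch: draw a random subsample while reading a large file block by block.
# Assumption: the file is plain text with one numeric observation per line.
sample_blocks <- function(path, block_size = 1e6, frac = 0.01) {
  con <- file(path, open = "r")
  on.exit(close(con))
  keep <- list()
  repeat {
    # scan() on an open connection continues where the last read stopped
    x <- scan(con, what = numeric(), nmax = block_size, quiet = TRUE)
    if (length(x) == 0L) break
    n_keep <- max(1L, round(frac * length(x)))
    keep[[length(keep) + 1L]] <- x[sample.int(length(x), n_keep)]
  }
  unlist(keep)
}

# Small simulated stand-in for the real data file (hypothetical values)
set.seed(1)
tmp <- tempfile()
write(c(rnorm(6000, 0, 1), rnorm(4000, 4, 1)), file = tmp, ncolumns = 1)

sub <- sample_blocks(tmp, block_size = 2500, frac = 0.1)
length(sub)
```

Only one block plus the retained subsample is in memory at a time. Rerunning with a different seed and comparing the fitted mixtures gives a rough check on whether the subsample is large enough.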
>
> Otherwise, read in blocks of a convenient size (perhaps 1 million
> observations at a time), quantize the data to a manageable number of
> intervals - maybe a few thousand - and tabulate it. Add the counts over
> all the blocks.
>
> Then use mle() to fit a multinomial likelihood whose probabilities are the
> masses associated with each bin under a mixture of normals law.
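
A minimal sketch of the binning approach described above, using mle() from the stats4 package. The helper names (tabulate_block, nll), the bin edges, the starting values, and the logit/log parameterization are all assumptions made for illustration; simulated data split into blocks stands in for reading the real file piece by piece.

```r
library(stats4)

# Bin edges: a fine grid plus open-ended outer bins so no observation is lost
breaks <- c(-Inf, seq(-6, 12, length.out = 2001), Inf)

# Count how many observations of one block fall in each bin
tabulate_block <- function(x, breaks) {
  tabulate(findInterval(x, breaks), nbins = length(breaks) - 1L)
}

# Simulated stand-in for the data, processed in blocks of 10,000
set.seed(42)
x <- sample(c(rnorm(60000, 0, 1), rnorm(40000, 4, 1.5)))
blocks <- split(x, ceiling(seq_along(x) / 10000))
counts <- Reduce(`+`, lapply(blocks, tabulate_block, breaks = breaks))

# Multinomial negative log-likelihood: bin probabilities are differences of
# the mixture CDF at the bin edges. The mixing weight is on the logit scale
# and the standard deviations on the log scale to keep them in range.
nll <- function(logit_p, mu1, mu2, log_s1, log_s2) {
  p <- plogis(logit_p)
  cdf <- p * pnorm(breaks, mu1, exp(log_s1)) +
    (1 - p) * pnorm(breaks, mu2, exp(log_s2))
  prob <- pmax(diff(cdf), 1e-300)  # guard against log(0) in empty tails
  -sum(counts * log(prob))
}

fit <- mle(nll, start = list(logit_p = 0, mu1 = -1, mu2 = 5,
                             log_s1 = 0, log_s2 = 0))
est <- coef(fit)
est
```

After tabulation the optimizer only ever sees the vector of bin counts, so memory use depends on the number of bins rather than the number of observations. Reasonable starting values still matter, since mixture likelihoods can have several local optima.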
>
> Chuck
>
> >
> > Thanks so much,
> >
> > Tim
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> >
> > PLEASE do read the posting guide
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive Medicine
> UC San Diego
> La Jolla, San Diego 92093-0901