[R] Mixture of Normals with Large Data

RAVI VARADHAN rvaradhan at jhmi.edu
Sun Aug 5 21:12:13 CEST 2007


Another possibility is to use "data squashing" methods, which replace the full data set with a much smaller weighted summary that fits in memory.  Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).

Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu


----- Original Message -----
From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Date: Saturday, August 4, 2007 8:01 pm
Subject: Re: [R] Mixture of Normals with Large Data
To: tvictor at dolphin.upenn.edu
Cc: r-help at stat.math.ethz.ch


> On Sat, 4 Aug 2007, Tim Victor wrote:
>  
>  > All:
>  >
>  > I am trying to fit a mixture of 2 normals with > 110 million observations. I
>  > am running R 2.5.1 on a box with 1 GB RAM running 32-bit Windows and I
>  > continue to run out of memory. Does anyone have any suggestions?
>  
>  
>  If the first few million observations can be regarded as a SRS of the
>  rest, then just use them. Or read in blocks of a convenient size and
>  sample some observations from each block. You can repeat this process a
>  few times to see if the results are sufficiently accurate.
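
A minimal sketch of the block-sampling idea quoted above, in R. The file layout (one numeric value per line), the block size, and the per-block sample size are all assumptions made for illustration, and a small simulated file stands in for the real data:

```r
# Sketch only: file layout, block_size and n_per_block are assumptions,
# not details given in the thread.
set.seed(1)

# Small simulated stand-in for the real 110-million-row file.
path  <- tempfile(fileext = ".txt")
x_all <- c(rnorm(60000, mean = 0, sd = 1), rnorm(40000, mean = 4, sd = 1.5))
writeLines(as.character(sample(x_all)), path)

block_size  <- 10000  # observations read per block
n_per_block <- 500    # observations kept from each block

con  <- file(path, open = "r")
kept <- list()
repeat {
  # scan() on an open connection continues where the previous call stopped
  block <- scan(con, what = numeric(), nmax = block_size, quiet = TRUE)
  if (length(block) == 0) break
  n_keep <- min(n_per_block, length(block))
  kept[[length(kept) + 1]] <- block[sample.int(length(block), n_keep)]
}
close(con)

sub <- unlist(kept)  # subsample small enough to fit in memory
length(sub)
```

Rerunning with a different seed and comparing the fitted mixture parameters across subsamples gives a rough check on whether the subsample is large enough.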
>  
>  Otherwise, read in blocks of a convenient size (perhaps 1 million
>  observations at a time), quantize the data to a manageable number of
>  intervals - maybe a few thousand - and tabulate it. Add the counts over
>  all the blocks.
>  
>  Then use mle() to fit a multinomial likelihood whose probabilities are the
>  masses associated with each bin under a mixture of normals law.
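
The binning approach quoted above might look roughly like the following in R. This is a sketch under stated assumptions, not a tested recipe for the real data: the cut points, block size, starting values, and the logit/log parametrization are illustrative choices, and a simulated vector stands in for blocks read from disk.

```r
library(stats4)  # provides mle()
set.seed(1)

# Simulated stand-in for the data; in practice each block would be read from disk.
x_all <- c(rnorm(60000, mean = 0, sd = 1), rnorm(40000, mean = 4, sd = 1.5))

# Assumed cut points; the outer bins run to -Inf/Inf so every value lands in a bin.
breaks <- c(-Inf, seq(-6, 12, length.out = 1999), Inf)
n_bins <- length(breaks) - 1

# Tabulate block by block and accumulate the counts.
counts     <- numeric(n_bins)
block_size <- 10000
for (start in seq(1, length(x_all), by = block_size)) {
  block  <- x_all[start:min(start + block_size - 1, length(x_all))]
  counts <- counts + tabulate(cut(block, breaks, labels = FALSE), nbins = n_bins)
}

# Negative multinomial log-likelihood (up to a constant): bin probabilities are
# the masses of each interval under a two-component normal mixture.
# lp is the logit of the mixing weight; lsd1, lsd2 are log standard deviations.
nll <- function(lp, mu1, mu2, lsd1, lsd2) {
  p    <- plogis(lp)
  cdf  <- p * pnorm(breaks, mu1, exp(lsd1)) +
          (1 - p) * pnorm(breaks, mu2, exp(lsd2))
  mass <- pmax(diff(cdf), 1e-300)  # guard against log(0)
  -sum(counts * log(mass))
}

fit <- mle(nll,
           start  = list(lp = 0, mu1 = -1, mu2 = 5, lsd1 = 0, lsd2 = 0),
           method = "BFGS")
est <- coef(fit)
c(p = plogis(est[["lp"]]), mu1 = est[["mu1"]], mu2 = est[["mu2"]],
  sd1 = exp(est[["lsd1"]]), sd2 = exp(est[["lsd2"]]))
```

Only the vector of bin counts is kept in memory, so the cost of each likelihood evaluation depends on the number of bins rather than the number of observations. Mixture likelihoods can have several local optima, so the starting values matter.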
>  
>  Chuck
>  
>  >
>  > Thanks so much,
>  >
>  > Tim
>  >
>  > 	[[alternative HTML version deleted]]
>  >
>  > ______________________________________________
>  > R-help at stat.math.ethz.ch mailing list
>  > 
>  > PLEASE do read the posting guide 
>  > and provide commented, minimal, self-contained, reproducible code.
>  >
>  
>  Charles C. Berry                            (858) 534-2098
>  Dept of Family/Preventive Medicine
>  UC San Diego
>  La Jolla, San Diego 92093-0901
>  


