[R] Subsample points for mclust

Mario Valle mvalle at cscs.ch
Tue Jul 21 17:03:11 CEST 2009


Hi all!

I have an ordered vector of values. The distribution of these values can 
be modeled by a sum of Gaussians.
So I'm using the package 'mclust' to get the Gaussians' parameters for 
this 1D distribution. It works very well, but for input sizes above 
100,000 values the run time becomes impractical. Unfortunately my dataset 
has around 4.6M values...

My question: is it valid to subsample my dataset by taking every Nth 
value, so that mclust finishes in a reasonable time? Or do I have no 
alternative to using the complete dataset?
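
A minimal sketch of the subsampling idea, for illustration only. The 
synthetic data, the subsample size n_sub and the range G = 1:5 are 
assumptions, not values from the real dataset. Because the vector is 
ordered, taking every Nth value picks evenly spaced empirical quantiles; 
a simple random subsample is shown next to it for comparison. Whether 
either subsample is adequate for the real data is exactly the open 
question, so the fit should be checked for stability across several 
subsample sizes.

```r
# Synthetic stand-in for the real data: an ordered vector drawn from a
# two-component Gaussian mixture (sizes and parameters are made up).
set.seed(1)
x <- sort(c(rnorm(60000, mean = 0, sd = 1),
            rnorm(40000, mean = 5, sd = 0.5)))

n_sub <- 5000  # hypothetical subsample size

# Every-Nth (systematic) subsample of the ordered vector
step  <- floor(length(x) / n_sub)
x_sys <- x[seq(1, length(x), by = step)]

# Simple random subsample of the same size, for comparison
x_rand <- sample(x, n_sub)

# Fit a 1D Gaussian mixture to the subsample if mclust is installed
if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)
  fit <- Mclust(x_sys, G = 1:5)                 # BIC chooses the number of components
  print(fit$parameters$pro)                     # mixing proportions
  print(fit$parameters$mean)                    # component means
  print(sqrt(fit$parameters$variance$sigmasq))  # component standard deviations
}
```

Refitting on x_rand, or on a larger n_sub, and comparing the printed 
parameters gives a rough idea of how sensitive the estimates are to the 
subsampling.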

Excuse my profound ignorance and thanks for your help!
                                                                         
                     mario

-- 
Ing. Mario Valle
Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)      | Tel:  +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82



