[R] Subsample points for mclust
Mario Valle
mvalle at cscs.ch
Tue Jul 21 17:03:11 CEST 2009
Hi all!
I have an ordered vector of values. The distribution of these values can
be modeled as a mixture (weighted sum) of Gaussians.
So I'm using the package 'mclust' to get the Gaussians' parameters for
this 1D distribution. It works very well, but for input sizes above
100,000 values it becomes extremely slow. Unfortunately, my dataset
has around 4.6M values...
My question: is it valid to subsample my dataset by taking one value every
N, so that mclust can cope? Or do I have no alternative to using the
complete dataset?
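One way to check this empirically is to fit the same model to the full vector and to a one-in-N subsample, then compare the estimates. The sketch below does that in plain Python rather than R, with a small hand-written EM fit standing in for mclust. It is not mclust's algorithm: it fixes two components with separate variances instead of choosing the model by BIC. The data are hypothetical: 10,000 sorted draws from 0.6 N(0, 1) + 0.4 N(5, 1). Because the vector is sorted, taking every Nth value keeps roughly the same share of points in every part of the range, so the fitted weights, means and SDs should land close to the full-data fit. A contiguous block of the same size (here the first 1,000 values) covers only the lower tail.

```python
import math
import random
import statistics


def em_1d(x, k=2, iters=100, tol=1e-6):
    """Plain EM for a k-component 1D Gaussian mixture (a stand-in, not mclust)."""
    n = len(x)
    xs = sorted(x)
    # Start the means at evenly spaced quantiles, with a common variance and equal weights.
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    var = [statistics.pvariance(xs)] * k
    w = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each point, plus log-likelihood.
        resp, ll = [], 0.0
        for xi in xs:
            dens = [w[j] * math.exp(-(xi - mu[j]) ** 2 / (2 * var[j]))
                    / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            w[j] = nj / n
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, xs)) / nj
            var[j] = sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, xs)) / nj
        # Stop once the log-likelihood improvement becomes negligible.
        if ll - prev < tol * abs(ll):
            break
        prev = ll
    # Return (weight, mean, sd) triples ordered by mean.
    return sorted(zip(w, mu, (math.sqrt(v) for v in var)), key=lambda c: c[1])


random.seed(42)
# Hypothetical data: 60% N(0, 1) and 40% N(5, 1), stored as an ordered vector.
full = sorted(random.gauss(0, 1) if random.random() < 0.6 else random.gauss(5, 1)
              for _ in range(10000))
N = 10
every_nth = full[::N]                 # one value every N: still spans the whole range
first_block = full[:len(every_nth)]   # contiguous block: only the smallest values

fit_full = em_1d(full)
fit_sub = em_1d(every_nth)
for (wf, mf, sf), (ws, ms, ss) in zip(fit_full, fit_sub):
    print(f"full: w={wf:.3f} mean={mf:.3f} sd={sf:.3f} | "
          f"every {N}th: w={ws:.3f} mean={ms:.3f} sd={ss:.3f}")
print("mean of first block:", round(statistics.mean(first_block), 3))
```

The same reasoning carries over to 4.6M values: a one-in-N systematic sample of a sorted vector behaves like a quantile-stratified sample. What it cannot do is reliably recover a component that contributes only a few points, since thinning by N shrinks that component's sample size by the same factor.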
Excuse my profound ignorance, and thanks for your help!
mario
--
Ing. Mario Valle
Data Analysis and Visualization Group | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS) | Tel: +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82