[R] model-based question

eewwaaww at interia.pl eewwaaww at interia.pl
Thu Jul 5 11:32:58 CEST 2007


This is probably an easy question for you. I have recently become interested in model-based clustering.
Adrian E. Raftery, in “Recent Advances in Model-Based Clustering: Image Segmentation and Variable Selection” (www.stat.washington.edu/Raftery), showed that we can compare different classification methods using the BIC statistic. For the “diabetes” dataset the best model is the VVV model with 3 classes: for this model the BIC curve reaches its highest value and the error rate is 12%.
The BIC curve for the EII model (≈ k-means) lies well below the VVV curve and its error rate is 18%, so k-means (EII) is worse than VVV, which is clear to me.

I would like to apply model-based clustering to an economic data set (GDP and life expectancy
data for EU countries), because I am a PhD student at a University of Economics in Poland.
Using these data (standardized), the best model according to BIC is EEV with 2 classes. The EII (k-means) curve lies below the EEV curve, which suggests that k-means is worse than EEV, yet the classification error is 0 for EII and 6% for EEV (and higher for other variables). Why?

Even with the “iris” data we get a lower classification error for the EII model (10%) than for VEV (33%) with 2 classes, even though the curves of the other models lie above the EII curve in the BIC plot.
For these data BIC does not choose the right number of clusters: it chooses VEV with 2 clusters, while the right number of classes, given in column five, is 3.
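
A minimal sketch of how the iris comparison described above could be reproduced, assuming the mclust package is installed and using its mclustBIC, Mclust and classError functions. The variable names (X, truth, G, fit_eii, fit_vev) are illustrative only, and the fixed choice G = 2 is an assumption taken from the "2 classes" wording above. It is not necessarily the exact code behind the percentages quoted.

```r
# Sketch only: assumes the mclust package is installed.
library(mclust)

X     <- iris[, 1:4]   # the four measurements
truth <- iris[, 5]     # species labels (column five)

# BIC for the default set of models and numbers of components
bic <- mclustBIC(X)
plot(bic)              # one BIC curve per model
print(summary(bic))    # best models according to BIC

# Fit the two models discussed above with a fixed number of components.
# G = 2 is an assumption taken from the text; change it to compare other cases.
G <- 2
fit_eii <- Mclust(X, G = G, modelNames = "EII")
fit_vev <- Mclust(X, G = G, modelNames = "VEV")

# Classification error of each fit relative to the known species
err_eii <- classError(fit_eii$classification, truth)$errorRate
err_vev <- classError(fit_vev$classification, truth)$errorRate
print(c(EII = err_eii, VEV = err_vev))
```

Changing G or the modelNames argument repeats the same comparison for other models or numbers of clusters.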

When is model-based clustering better than k-means (kmeans) or hierarchical clustering (hclust)? For which data sets does it work better, and are there any special types of data for which it is preferable?

I’m looking forward to hearing from you.

Best regards, 
              Ewa

