[R] Setting a minimum number of observations within an individual cluster

Norm.Good at csiro.au Norm.Good at csiro.au
Wed Jun 13 02:27:47 CEST 2007


Hi

I'm trying to cluster a continuous dataset into a varying number of clusters, with the restriction that each cluster must contain more than 'x' observations.

I have tried the clara function, using silhouette to find the neighbouring cluster medoid of each observation, and then merging each observation from a cluster with fewer than 'x' obs. into its neighbour. This comes unstuck when the neighbouring clusters also have fewer than 'x' obs.
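To make the merging idea concrete, here is a minimal sketch of one possible workaround for the small-neighbour problem: merge one cluster at a time, always the smallest, and recompute after every merge so that a merged neighbour can itself grow past the threshold. It assumes Euclidean distance between cluster centroids rather than the clara medoids and silhouette neighbours, and merge_small_clusters and min_size are made-up names (min_size = x + 1 corresponds to "more than 'x'").

```r
# Hypothetical helper (not part of the cluster package): repeatedly merge the
# smallest undersized cluster into the cluster with the nearest centroid.
merge_small_clusters <- function(x, cl, min_size) {
  x  <- as.matrix(x)
  cl <- as.integer(factor(cl))                 # relabel clusters as 1..k
  repeat {
    sizes <- table(cl)
    if (length(sizes) < 2 || min(sizes) >= min_size) break
    ids <- as.integer(names(sizes))
    i   <- unname(which.min(sizes))            # position of the smallest cluster
    # Centroids are recomputed on every pass, so earlier merges are reflected
    cent <- do.call(rbind, lapply(ids, function(k)
      colMeans(x[cl == k, , drop = FALSE])))
    d <- sqrt(rowSums(sweep(cent, 2, cent[i, ])^2))
    d[i] <- Inf                                # never merge a cluster into itself
    cl[cl == ids[i]] <- ids[which.min(d)]
  }
  as.integer(factor(cl))                       # compact the labels again
}

# Toy one-dimensional data with two undersized clusters (labels 1 and 3)
x  <- matrix(c(0, 1, 2, 3, 10, 11, 12, 13, 14, 15), ncol = 1)
cl <- c(1, 2, 2, 3, 4, 4, 4, 4, 4, 4)
merged <- merge_small_clusters(x, cl, min_size = 3)
table(merged)
```

With a clara fit, the cl argument would be the fit's clustering component. Each pass removes one cluster, so the loop always terminates, at worst with a single cluster.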

So I'm now fiddling with dendrogram objects. Is there any way of using the 'members' attribute to cut a dendrogram so that it only includes branches with more than 'x' members?
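The kind of cut I have in mind could be sketched roughly as follows. It walks down from the root and refuses to split a branch if any of its children would have too few members. cut_min_members and min_size are made-up names, average linkage is just an assumption for the toy data, and min_size = x + 1 corresponds to "more than 'x'".

```r
# Hypothetical helper: descend from the root and stop splitting as soon as
# any child branch would have fewer than min_size members.
cut_min_members <- function(node, min_size) {
  if (is.leaf(node)) return(list(labels(node)))
  child_sizes <- vapply(seq_along(node),
                        function(i) attr(node[[i]], "members"), numeric(1))
  # Splitting here would create an undersized branch: keep the node whole
  if (any(child_sizes < min_size)) return(list(labels(node)))
  do.call(c, lapply(seq_along(node),
                    function(i) cut_min_members(node[[i]], min_size)))
}

# Toy data: three well-separated groups of 3, 4 and 3 observations
x <- matrix(c(0, 0.1, 0.2, 5, 5.1, 5.2, 5.3, 9, 9.1, 9.2), ncol = 1,
            dimnames = list(letters[1:10], NULL))
dend   <- as.dendrogram(hclust(dist(x), method = "average"))
groups <- cut_min_members(dend, min_size = 3)

# Flatten the list of branches into a named membership vector
membership <- rep(seq_along(groups), lengths(groups))
names(membership) <- unlist(groups)
```

This is a greedy top-down cut, so a single outlier that splits off near the root stops any further splitting on that branch, even where the other child could be divided into several large enough clusters.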

Here is example output from clara for a data set of 1000 obs. and 82 clusters:

> cl$clusinfo
      size   max_diss    av_diss isolation
 [1,]    1 0.00000000 0.00000000 0.0000000
 [2,]    3 1.19840221 0.40837142 5.0938561
 [3,]    4 0.16867940 0.07284916 0.5830662
 [4,]    2 0.13380551 0.06690276 0.5687456
 [5,]    3 0.21862177 0.13428115 1.0371933
 [6,]    5 0.10384573 0.05270335 0.5887887
 [7,]    2 0.08547020 0.04273510 0.4846024
 [8,]    4 0.18615254 0.09545067 0.7396865
 [9,]    7 0.15688781 0.08572887 0.6234016
.
.
.
[75,]   11 0.26963387 0.13985980 1.1447836
[76,]    6 0.21439705 0.11953365 0.5754212
[77,]    5 0.21131875 0.12920395 0.5567024
[78,]    3 0.17126227 0.09685930 0.7160261
[79,]    2 0.22622024 0.11311012 0.9457984
[80,]    2 0.10268536 0.05134268 0.5167766
[81,]    1 0.00000000 0.00000000 0.0000000
[82,]    2 0.10018837 0.05009419 0.2474480

Note that not all observations in cluster 1 are necessarily closest to cluster 2.

Cheers

Norm   

Norm Good
Statistician
CMIS/e-Health Research Centre
A joint venture between CSIRO and the Queensland Government
Lvl 20, 300 Adelaide Street BRISBANE QLD 4000
PO Box 10842 Adelaide Street BRISBANE QLD 4000
Ph: 07 3024 1640 Fx: 07 3024 1690 
Em: norm.good at csiro.au  Web: http://e-hrc.net/


