[R] Setting a minimum number of observations within an individual cluster
Norm.Good at csiro.au
Wed Jun 13 02:27:47 CEST 2007
Hi
I'm trying to cluster a continuous dataset into a varying number of clusters, with the restriction that each cluster must contain more than 'x' observations.
I have tried the clara function, using silhouette to give me the neighbouring cluster medoid of each observation, then merging each observation from a cluster with fewer than 'x' obs. into its neighbour, but this comes unstuck if the neighbouring cluster also has fewer than 'x' obs.
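One way around the small-neighbour problem might be to merge iteratively rather than in a single pass: fold the currently smallest cluster into its nearest neighbour, recount, and repeat until every cluster reaches the minimum. A minimal sketch, assuming Euclidean distance between cluster centroids as the "nearest" criterion (a simplification swapped in for the silhouette neighbour) and "at least min_size" as the rule; merge_small, x and min_size are hypothetical names and the data are simulated:

```r
library(cluster)

# Repeatedly merge the smallest cluster into the cluster with the
# nearest centroid until all clusters have at least min_size members.
# ASSUMPTION: x is a numeric matrix, cluster an integer label vector.
merge_small <- function(x, cluster, min_size) {
  repeat {
    sizes <- table(cluster)
    if (length(sizes) == 1 || min(sizes) >= min_size) break
    # Centroids: per-cluster column sums divided by cluster sizes
    centres <- rowsum(x, cluster) / as.vector(sizes)
    small <- unname(which.min(sizes))
    # Euclidean distance from the smallest cluster's centroid to all others
    d <- sqrt(colSums((t(centres) - centres[small, ])^2))
    d[small] <- Inf
    target <- unname(which.min(d))
    ids <- as.integer(names(sizes))
    cluster[cluster == ids[small]] <- ids[target]
  }
  cluster
}

set.seed(1)
x <- matrix(rnorm(400), ncol = 2)   # simulated data, 200 obs.
cl <- clara(x, k = 20)
merged <- merge_small(x, cl$clustering, min_size = 10)
table(merged)
```

Because each pass removes one cluster, the loop always terminates, at worst with a single cluster holding everything. The merged clusters are no longer clara's medoid-based partition, so av_diss and the other clusinfo columns would need recomputing.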
So I'm fiddling with dendrogram objects. Is there any way of using the 'members' attribute to cut a dendrogram to only include branches with more than 'x' members?
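A sketch of how the 'members' attribute could drive such a cut: walk down from the root and split a node only when every child branch still has at least the minimum number of members, otherwise keep the node whole as one cluster. This assumes the dendrogram comes from hclust (average linkage chosen arbitrarily) and uses "at least min_size" as the rule; cut_min_size is a hypothetical helper, not a function from stats or cluster, and the data are simulated:

```r
# Split a dendrogram recursively, but only where all child branches
# keep at least min_size members. Returns a list of sub-dendrograms.
cut_min_size <- function(node, min_size) {
  if (is.leaf(node)) return(list(node))
  child_sizes <- vapply(seq_along(node),
                        function(i) attr(node[[i]], "members"),
                        numeric(1))
  # A split here would create an undersized branch, so stop
  if (any(child_sizes < min_size)) return(list(node))
  do.call(c, lapply(seq_along(node),
                    function(i) cut_min_size(node[[i]], min_size)))
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)   # simulated data, 100 obs.
dend <- as.dendrogram(hclust(dist(x), method = "average"))
branches <- cut_min_size(dend, min_size = 10)

# Map each branch back to the original observation indices
cluster <- integer(nrow(x))
for (k in seq_along(branches)) {
  cluster[order.dendrogram(branches[[k]])] <- k
}
table(cluster)
```

The rule is conservative: a node with one tiny child is kept whole even if its larger child could have been split further, so the result may be coarser than strictly necessary. Unlike cutree, the cut is not at a single height.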
An example output from clara with a data set of 1000 obs. and 82 clusters
> cl$clusinfo
      size   max_diss    av_diss isolation
 [1,]    1 0.00000000 0.00000000 0.0000000
 [2,]    3 1.19840221 0.40837142 5.0938561
 [3,]    4 0.16867940 0.07284916 0.5830662
 [4,]    2 0.13380551 0.06690276 0.5687456
 [5,]    3 0.21862177 0.13428115 1.0371933
 [6,]    5 0.10384573 0.05270335 0.5887887
 [7,]    2 0.08547020 0.04273510 0.4846024
 [8,]    4 0.18615254 0.09545067 0.7396865
 [9,]    7 0.15688781 0.08572887 0.6234016
.
.
.
[75,]   11 0.26963387 0.13985980 1.1447836
[76,]    6 0.21439705 0.11953365 0.5754212
[77,]    5 0.21131875 0.12920395 0.5567024
[78,]    3 0.17126227 0.09685930 0.7160261
[79,]    2 0.22622024 0.11311012 0.9457984
[80,]    2 0.10268536 0.05134268 0.5167766
[81,]    1 0.00000000 0.00000000 0.0000000
[82,]    2 0.10018837 0.05009419 0.2474480
Note that not all observations from cluster 1 are necessarily closest to cluster 2.
Cheers
Norm
Norm Good
Statistician
CMIS/e-Health Research Centre
A joint venture between CSIRO and the Queensland Government
Lvl 20, 300 Adelaide Street BRISBANE QLD 4000
PO Box 10842 Adelaide Street BRISBANE QLD 4000
Ph: 07 3024 1640 Fx: 07 3024 1690
Em: norm.good at csiro.au Web: http://e-hrc.net/