[R] cluster analysis

paulandpen paulandpen at optusnet.com.au
Fri Nov 2 13:10:53 CET 2007


AMINA SHAHZADI,

The eternal question.

What I do is generate a range of solutions, profile each one on the variables
used to cluster the data and on any other information I have about the cases,
and then present the solutions to a group of colleagues who assess how
meaningful they are, debate them and try to reach a consensus.
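A minimal sketch of the "generate a range of solutions and profile them" step, assuming plain k-means with Euclidean distance and a deterministic farthest-first start (so the run is repeatable). In R you would normally use kmeans() from the stats package; this is written in pure Python only to stay self-contained. The helper names (kmeans, profile, sq_dist) and the nine-case data set are hypothetical illustrations, not anything from the original post.

```python
from statistics import mean


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-first start (illustration only)."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point whose distance to its nearest chosen centre is largest.
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the solution has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members
        # (an emptied cluster keeps its previous centre).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [mean(col) for col in zip(*members)]
    return labels


def profile(points, labels):
    """Size and per-variable mean of each cluster: {label: (n, [means])}."""
    result = {}
    for j in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == j]
        result[j] = (len(members), [round(mean(col), 2) for col in zip(*members)])
    return result


# Hypothetical toy data: two clustering variables for nine cases.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
        (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]

# Generate and profile a range of solutions (k = 2, 3, 4).
for k in range(2, 5):
    print(k, profile(data, kmeans(data, k)))
```

The same profile() call can be pointed at variables that were not used for clustering (pass those columns as points with the same labels), which covers the "any other information I have" part of the profiling.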

In many cases, e.g. with algorithms based on k-means or hierarchical
clustering, you are using an exploratory technique, so there are no
right or wrong answers.

Having used cluster analysis for years, I would look at the following,
because there is no way to settle this purely statistically (unless you are
using a latent class type model with goodness-of-fit measures):

1. What is the minimum size you believe to be robust for a single cluster
(e.g. n=30 or n=100)? The more clusters you generate relative to your sample
size, the smaller they become, so you need to define a cut-off below which
you are not prepared to go.
2. If you run the data through different algorithms, how comparable are
the results (cluster stability)?
3. What differences emerge between the 2-, 3- and 4-cluster solutions, and so
on? As you move to larger numbers of clusters, does the solution remain
meaningful, with clusters that are distinct and unique, or are you just
cutting larger clusters into smaller ones without generating unique, usable
information? Examine the solutions through a series of cross-tabs (going
from 2 to 3 to 4 clusters): what happens to the members of each cluster,
are they redistributed differently, etc.
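The three checks above can be sketched in pure Python, assuming you already have a label vector per solution. The post does not name a specific agreement measure for point 2; the adjusted Rand index used here is one common choice (in R, mclust::adjustedRandIndex computes it, and base R's table() gives the cross-tabs). All label vectors below are hypothetical, and the cut-off of 3 is scaled down from the n=30/n=100 examples only because the toy data has ten cases.

```python
from collections import Counter
from math import comb


def small_clusters(labels, min_size):
    """Clusters whose membership falls below the chosen robustness cut-off."""
    return {lab: n for lab, n in Counter(labels).items() if n < min_size}


def crosstab(a, b):
    """Rows = clusters in solution a, columns = clusters in solution b."""
    counts = Counter(zip(a, b))
    return [[counts[(r, c)] for c in sorted(set(b))] for r in sorted(set(a))]


def adjusted_rand_index(a, b):
    """Agreement between two partitions, corrected for chance (1.0 = identical up to relabelling)."""
    n_pairs = comb(len(a), 2)
    same_both = sum(comb(n, 2) for n in Counter(zip(a, b)).values())
    same_a = sum(comb(n, 2) for n in Counter(a).values())
    same_b = sum(comb(n, 2) for n in Counter(b).values())
    expected = same_a * same_b / n_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (same_both - expected) / (max_index - expected)


# Hypothetical cluster labels for ten cases (not real results).
kmeans_2 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
kmeans_3 = [0, 0, 0, 2, 2, 1, 1, 1, 1, 1]
hclust_3 = [0, 0, 0, 2, 2, 1, 1, 1, 2, 1]

print(small_clusters(kmeans_3, min_size=3))    # check 1: clusters below the cut-off
print(adjusted_rand_index(kmeans_3, hclust_3))  # check 2: agreement across algorithms
for row in crosstab(kmeans_2, kmeans_3):        # check 3: how the 2-cluster solution splits
    print(row)
```

In this toy example the cross-tab shows cluster 0 of the 2-cluster solution being cut into two pieces while cluster 1 passes through unchanged, which is exactly the "cutting larger clusters into smaller clusters" pattern point 3 asks you to judge for usefulness.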

Thanks,
Paul

----- Original Message ----- 
From: "amna khan" <amnakhan493 at gmail.com>
To: <R-help at stat.math.ethz.ch>
Sent: Friday, November 02, 2007 2:19 AM
Subject: [R] cluster analysis


> Hi Sir
>
> How can we select the optimum number of clusters?
>
> Best Regards
>
> -- 
> AMINA SHAHZADI
> Department of Statistics
> GC University Lahore, Pakistan.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
