# [R] how to calculate the consistency of different clusterings

Michael Bedward michael.bedward at gmail.com
Mon Jan 17 01:57:31 CET 2011

```
Hello,

I've been waiting to see if anyone else would answer this.

I've previously used random reallocation of objects to groups
(clusters) as a Monte Carlo test of the informativeness of groups, as
described here:

http://lastresortsoftware.blogspot.com/2010/09/monte-carlo-testing-of-classification.html
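
# A minimal sketch of this kind of test in R (an illustration, not the
# code from the post above). Assumptions: numeric data in `x`, Euclidean
# distance, and mean within-group distance as the test statistic.
# The name mc_group_test and its arguments are hypothetical.
mc_group_test <- function(x, groups, nperm = 999) {
  d <- as.matrix(dist(x))
  upper <- upper.tri(d)
  # mean distance between pairs of objects that share a group
  within_mean <- function(g) {
    g <- as.character(g)
    same <- outer(g, g, "==") & upper
    mean(d[same])
  }
  observed <- within_mean(groups)
  # reallocate objects to groups at random, keeping group sizes fixed
  permuted <- replicate(nperm, within_mean(sample(groups)))
  # one-sided p-value: how often random groups are at least as tight
  p <- (sum(permuted <= observed) + 1) / (nperm + 1)
  list(observed = observed, p.value = p)
}
# A small p.value suggests the groups are tighter than random
# reallocation would produce, i.e. that they are informative.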

However, in your case it sounds like you want to investigate the
influence of particular attributes (traits) or groups of attributes on
the classification - is that correct?  If so, I can probably help
with some R code but I'd need to know the clustering method you are
using (e.g. hclust).
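
# In the meantime, a rough sketch of the resampling idea, under stated
# assumptions: hclust with its default settings (complete linkage on
# Euclidean distances), clusters cut to the number of known populations,
# and "consistency" measured as purity, i.e. the percentage of
# individuals falling in the majority population of their cluster.
# The names purity and resample_consistency are hypothetical.
purity <- function(known, cluster) {
  tab <- table(cluster, known)
  # for each cluster, count the individuals from its commonest population
  100 * sum(apply(tab, 1, max)) / length(known)
}

resample_consistency <- function(traits, known, n_traits, n_rep = 100,
                                 k = length(unique(known))) {
  replicate(n_rep, {
    # draw a random subset of trait columns and cluster on those only
    cols <- sample(ncol(traits), n_traits)
    cl <- cutree(hclust(dist(traits[, cols, drop = FALSE])), k = k)
    purity(known, cl)
  })
}

# Purity does not depend on how the clusters are numbered:
known <- c(rep("popA", 5), rep("popB", 7))
purity(known, c(rep(2, 4), rep(1, 8)))   # 11 of 12 individuals agree
# For a numeric trait matrix (individuals in rows), something like
#   mean(resample_consistency(traits, known, n_traits = 100))
# would give the average consistency over 100 random trait subsets.
# Note that purity rises as the number of clusters grows, so it is only
# comparable between clusterings cut to the same k.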

Michael

On 14 January 2011 02:36, Mao Jianfeng <jianfeng.mao at gmail.com> wrote:
> Dear R-listers,
>
> I am clustering tens of individuals using thousands of traits. I
> already know the population assignment of each individual. I want to
> classify the individuals using randomly resampled subsets of the
> traits: for example, 100 random traits resampled 100 times, then 200
> traits 100 times, then 300 traits 100 times, and so on. For each
> subset of traits, I cluster the same individuals.
>
> In the end, I want to get the consistency (as a percentage) of each of
> these clusterings (for example "cluster.1", "cluster.2" and
> "cluster.3" in the dummy data) with the assignment that is already
> known ("populations" in the dummy data). I would like to know how
> this can be implemented, possibly in R.
>
> # dummy data
>
> clus.data <- data.frame(
>   individual  = paste("ind", 1:12, sep = ""),
>   populations = c(rep("popA", 5), rep("popB", 7)),
>   cluster.1   = c(rep(1, 5), rep(2, 7)),
>   cluster.2   = c(rep(2, 4), rep(1, 8)),
>   cluster.3   = c(rep(4, 7), rep(5, 5))
> )
>
> clus.data
>
> Thanks.
>
>
> --
> Jian-Feng, Mao
>
> the Institute of Botany,