[R] Examining how cases are similar by cluster, in cluster analysis

Sun Nov 18 21:44:47 CET 2012

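If you just want the mean of each variable within each cluster, this
will get you there. With 0/1 variables the mean is the proportion of
cases in the cluster that have the attribute. The example uses random
data (the output shown is from a single random draw, so your numbers
will differ):

> set.seed(42)
> FS1 <- data.frame(matrix(sample(c(0, 1), 12*63, replace=TRUE),
+     nrow=63, ncol=12))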
> dmat <- dist(FS1, method="binary")
> cl.test <- hclust(dmat, method="average")
> plot(cl.test, hang=-1)
> hcli8 <- cutree(cl.test, k=8)
> tbl <- aggregate(FS1, by=list(Group=hcli8), mean)
> print(tbl, digits=4)
  Group     X1     X2     X3     X4     X5     X6     X7     X8     X9
1     1 0.5122 0.6829 0.6829 0.6341 0.5854 0.5854 0.6829 0.6341 0.5366
2     2 0.0000 0.0000 0.0000 1.0000 0.6667 0.6667 0.0000 0.6667 0.0000
3     3 0.9286 0.1429 0.1429 0.1429 0.2857 0.5714 0.7857 0.3571 0.8571
4     4 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
5     5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
6     6 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000
7     7 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
8     8 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
     X10    X11   X12
1 0.4146 0.4634 0.561
2 0.6667 0.0000 0.000
3 0.8571 0.6429 0.500
4 1.0000 0.0000 0.000
5 0.0000 1.0000 0.000
6 0.0000 0.0000 1.000
7 0.0000 0.0000 0.000
8 0.0000 0.0000 0.000
>
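
If what you are really after is the set of variables on which every
case in a cluster agrees, a cluster mean of exactly 0 or 1 tells you
that directly. Below is a minimal sketch under the same assumptions as
above (simulated 0/1 data standing in for your file). The names
"sizes", "means", and "shared" are my own, not anything from your code.
It is worth checking cluster sizes first, because a mean of 0 or 1 is
not very informative in a cluster of only one or two cases.

```r
# Simulated stand-in for the real data: 63 cases, 12 binary variables
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))
hcli8 <- cutree(hclust(dist(FS1, method = "binary"), method = "average"),
                k = 8)

# Number of cases in each cluster
sizes <- table(Group = hcli8)
print(sizes)

# Proportion of 1s per variable within each cluster (rows = clusters)
means <- as.matrix(aggregate(FS1, by = list(Group = hcli8), mean)[, -1])
rownames(means) <- names(sizes)

# For each cluster, the variables on which all of its cases share a value
shared <- lapply(seq_len(nrow(means)), function(g) {
  list(all_present = colnames(means)[means[g, ] == 1],
       all_absent  = colnames(means)[means[g, ] == 0])
})
names(shared) <- rownames(means)
str(shared)
```

With your own data you would replace the simulated FS1 with the data
frame you read from the csv file; shared[["2"]] would then list the
variables that are uniformly present or absent in cluster 2.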
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Bob Green
> Sent: Sunday, November 18, 2012 5:00 AM
> To: r-help at r-project.org
> Subject: [R] Examining how cases are similar by cluster, in cluster
> analysis
> 
> Hello,
> 
> I used the following code to perform a cluster analysis on a
> dataframe consisting of 12 variables (coded as 1,0) and 63
> cases.
> 
> 
> 
> FS1 <- read.csv("D://Arsontest2.csv",header=T,row.names=1)
> 
> str(FS1)
> 
> dmat <- dist(FS1,  method="binary")
> 
> cl.test <- hclust (dist(FS1, method ="binary"), "ave")
> 
> plot(cl.test, hang = -1)
> 
> 
> 
> Each case has an id, and the dendrogram identifies the cases that
> constitute each cluster. What I am seeking advice on is how to
> examine the variables on which the cases are similar within each
> cluster.
> 
> 
> 
> sort(hcli8 <- cutree(cl.test, k=8)) shows that cluster 2 comprises
> the following cases:
> 
> 1641 2295 2594 2654 2799 3213 3510 3513 2958 3294
>    2    2    2    2    2    2    2    2    2    2
> 
> 
> 
> The following code provides means for the variables by cluster. In
> relation to cluster 2, it appears the cases should have no clear
> motive and be depressed:
> 
> round(sapply(x, function(i) colMeans(FS1[i,])),2)
> 
>            [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
> depressed  0.00 0.33 0.00  0.0    0  0.6 0.00 0.08
> unclear    0.33 1.00 1.00  1.0    0  0.0 0.07 0.12
> 
> 
> 
> I can manually examine this variable by variable and look at how
> the cases in cluster 2 are similar on the variables. I am looking
> for a more efficient and quicker way to do this.
> 
> Bob
> 