[R] Examining how cases are similar by cluster, in cluster analysis
David L Carlson
dcarlson at tamu.edu
Sun Nov 18 22:52:34 CET 2012
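Something like this? split() divides the data frame by cluster
membership, so each list element holds the cases in one cluster: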
> split(FS1, hcli8)
$`1`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
1 1 1 0 1 0 0 1 1 0 1 1 1
3 1 0 1 0 0 1 1 0 0 1 0 1
4 1 1 0 0 0 0 1 1 1 1 1 1
7 0 1 0 1 0 0 1 1 0 1 0 1
9 1 1 1 1 0 1 1 0 1 1 1 0
12 1 0 0 0 0 1 1 1 1 1 0 1
13 0 1 1 1 1 0 0 0 1 1 0 1
15 1 0 1 1 0 0 1 0 0 1 0 1
16 1 0 1 0 0 1 1 0 1 0 1 1
19 0 1 0 0 0 0 1 0 0 1 0 1
20 0 1 1 1 0 0 0 1 1 0 0 1
24 1 1 0 1 0 0 1 0 1 1 1 0
26 1 1 1 1 1 1 0 1 0 1 0 1
28 1 0 1 0 1 0 1 1 0 1 1 1
33 1 1 0 1 0 0 0 0 1 1 0 0
38 1 1 1 0 0 0 0 0 1 1 0 0
40 1 0 1 0 0 0 1 0 0 1 1 1
41 1 1 0 0 0 0 0 0 1 1 1 1
43 0 0 1 0 0 0 1 0 1 1 0 1
52 1 1 1 1 0 0 0 1 1 1 0 1
53 1 1 0 0 1 0 0 1 1 1 0 1
56 1 0 1 0 0 1 1 0 1 0 0 0
60 1 1 1 0 1 1 0 1 1 1 0 1
$`2`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
2 0 1 1 1 1 1 1 0 0 1 1 0
5 0 1 0 1 1 1 0 0 0 1 1 1
6 0 0 0 0 1 0 1 0 0 1 1 1
10 1 1 1 1 1 0 1 1 0 1 0 0
11 0 1 0 1 1 0 1 0 1 1 1 1
14 0 0 1 1 1 1 1 1 0 1 1 1
17 0 1 0 0 1 0 0 0 0 0 1 1
18 1 0 0 1 1 1 1 1 0 0 1 1
29 1 1 0 1 0 1 1 1 0 0 1 1
37 1 0 0 1 1 0 1 1 0 1 0 0
42 1 1 0 1 1 1 1 0 0 0 0 0
46 1 1 0 1 0 1 1 0 0 1 0 1
48 0 1 0 0 1 0 1 0 0 1 1 0
50 0 1 0 1 1 1 1 1 0 0 1 0
51 0 0 0 1 1 1 1 0 0 0 1 1
54 0 0 0 1 1 1 1 0 0 1 1 0
58 0 1 0 1 1 1 1 1 1 1 1 0
61 1 0 1 0 1 1 1 1 0 1 0 0
$`3`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
8 0 1 1 0 0 1 0 1 1 1 1 0
21 0 1 0 0 1 1 0 1 0 1 1 0
22 1 1 0 0 0 1 1 1 0 0 1 0
25 0 1 0 0 0 1 0 1 0 1 1 0
27 1 1 0 0 1 1 0 1 1 0 0 0
32 1 1 1 0 1 1 0 1 0 0 1 0
36 1 1 0 0 0 1 0 1 0 0 0 0
44 1 1 1 1 1 1 0 1 0 0 0 0
63 0 1 1 0 1 1 0 0 1 1 1 0
$`4`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
23 0 0 1 1 0 0 0 0 0 1 0 0
34 0 1 1 1 0 0 0 1 0 1 0 0
$`5`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
30 0 0 0 0 1 1 0 0 1 1 0 1
31 0 1 1 0 1 0 0 0 1 0 1 1
35 0 0 1 0 1 1 0 0 1 1 0 1
47 0 0 1 0 1 0 0 0 1 0 0 1
49 1 0 0 0 1 1 0 0 1 1 1 0
55 1 0 1 0 1 0 0 0 0 1 1 0
59 0 0 1 0 1 0 0 0 1 0 1 1
$`6`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
39 0 0 0 0 1 0 1 1 0 0 0 0
62 0 0 0 0 1 0 1 1 0 0 0 1
$`7`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
45 1 1 0 0 0 0 0 0 0 0 1 0
$`8`
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
57 0 0 1 0 0 1 0 1 0 0 1 1
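If you also want a quick summary within each cluster, here is a minimal
sketch. It uses simulated data in place of your file, and the object names
by.cluster, props, and agree are only illustrative. It splits the cases by
cluster, computes the proportion of 1s per variable, and lists the variables
on which every case in a cluster agrees:

```r
# Simulated stand-in for the real data: 63 cases, 12 binary variables
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))
hcli8 <- cutree(hclust(dist(FS1, method = "binary"), method = "average"),
                k = 8)

# One data frame per cluster, as in the split() output above
by.cluster <- split(FS1, hcli8)

# Proportion of 1s per variable within each cluster
# (variables in rows, clusters in columns)
props <- sapply(by.cluster, colMeans)
print(round(props, 2))

# Variables on which every case in a cluster agrees (all 0 or all 1)
agree <- lapply(by.cluster, function(d) {
  m <- colMeans(d)
  names(m)[m == 0 | m == 1]
})
print(agree)
```

Single-case clusters will trivially agree on every variable, so the larger
clusters are the ones worth reading.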
-------
David
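
P.S. To repeat this while trying different clustering methods, the steps
could be wrapped in a small function. This is only a sketch under stated
assumptions: cases.by.cluster() is a hypothetical helper, not part of any
package, and the data are simulated rather than read from your file.

```r
# Hypothetical helper: cluster, cut the tree, and split the cases in one step
cases.by.cluster <- function(data, k, dist.method = "binary",
                             hclust.method = "average") {
  hc  <- hclust(dist(data, method = dist.method), method = hclust.method)
  grp <- cutree(hc, k = k)
  list(groups = grp,                       # cluster id for each case
       cases  = split(data, grp),          # one data frame per cluster
       means  = aggregate(data, by = list(Group = grp), FUN = mean))
}

# Simulated stand-in for the real data: 63 cases, 12 binary variables
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))

res.ave  <- cases.by.cluster(FS1, k = 8)
res.comp <- cases.by.cluster(FS1, k = 8, hclust.method = "complete")

print(res.ave$cases[["2"]])          # cases in cluster 2, average linkage
print(res.comp$means, digits = 4)    # variable means, complete linkage
```

Changing hclust.method or k then gives a comparable listing for each run.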
> -----Original Message-----
> From: Bob Green [mailto:bgreen at dyson.brisnet.org.au]
> Sent: Sunday, November 18, 2012 3:22 PM
> To: dcarlson at tamu.edu; r-help at r-project.org
> Subject: RE: [R] Examining how cases are similar by cluster, in cluster
> analysis
>
> David,
>
>
> Many thanks, I'm sure this will be helpful. What would also be
> helpful is if I can extract each cluster and examine id by variable,
> within the respective cluster. I could index the variables for each
> cluster and run such an analysis but there must be a more efficient
> way of doing this (especially as I experiment with different
> clustering methods).
>
> Thanks again,
>
> Bob
>
> At 06:44 AM 19/11/2012, David L Carlson wrote:
> >If you just want a summary of the mean for each variable in each
> >cluster, this will get you there:
> >
> > > set.seed(42)
> > > FS1 <- data.frame(matrix(sample(c(0, 1), 12*63, replace=TRUE),
> >+ nrow=63, ncol=12))
> > > dmat <- dist(FS1, method="binary")
> > > cl.test <- hclust(dmat, method="average")
> > > plot(cl.test, hang=-1)
> > > hcli8 <- cutree(cl.test, k=8)
> > > tbl <- aggregate(FS1, by=list(Group=hcli8), mean)
> > > print(tbl, digits=4)
> >  Group     X1     X2     X3     X4     X5     X6     X7     X8     X9
> >1     1 0.5122 0.6829 0.6829 0.6341 0.5854 0.5854 0.6829 0.6341 0.5366
> >2     2 0.0000 0.0000 0.0000 1.0000 0.6667 0.6667 0.0000 0.6667 0.0000
> >3     3 0.9286 0.1429 0.1429 0.1429 0.2857 0.5714 0.7857 0.3571 0.8571
> >4     4 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> >5     5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
> >6     6 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000
> >7     7 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
> >8     8 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
> >     X10    X11   X12
> >1 0.4146 0.4634 0.561
> >2 0.6667 0.0000 0.000
> >3 0.8571 0.6429 0.500
> >4 1.0000 0.0000 0.000
> >5 0.0000 1.0000 0.000
> >6 0.0000 0.0000 1.000
> >7 0.0000 0.0000 0.000
> >8 0.0000 0.0000 0.000
> > >
> >----------------------------------------------
> >David L Carlson
> >Associate Professor of Anthropology
> >Texas A&M University
> >College Station, TX 77843-4352
> >
> > > -----Original Message-----
> > > > From: r-help-bounces at r-project.org
> > > > [mailto:r-help-bounces at r-project.org] On Behalf Of Bob Green
> > > Sent: Sunday, November 18, 2012 5:00 AM
> > > To: r-help at r-project.org
> > > Subject: [R] Examining how cases are similar by cluster, in
> > > cluster analysis
> > >
> > > Hello,
> > >
> > > I used the following code to perform a cluster analysis on a
> > > dataframe consisting of 12 variables (coded as 1,0) and 63
> > > cases.
> > >
> > >
> > >
> > > FS1 <- read.csv("D://Arsontest2.csv",header=T,row.names=1)
> > >
> > > str(FS1)
> > >
> > > dmat <- dist(FS1, method="binary")
> > >
> > > cl.test <- hclust (dist(FS1, method ="binary"), "ave")
> > >
> > > plot(cl.test, hang = -1)
> > >
> > >
> > >
> > > > Each case has an id and the dendrogram identifies the respective
> > > > cases which constitute each cluster. What I am seeking advice on is
> > > > how to examine the variables on which the cases are similar, within
> > > > each cluster.
> > >
> > >
> > >
> > > > sort(hcli8 <- cutree(cl.test, k=8)) shows that cluster 2
> > > > comprises the following cases:
> > > >
> > > > 1641 2295 2594 2654 2799 3213 3510 3513 2958 3294
> > > >    2    2    2    2    2    2    2    2    2    2
> > >
> > >
> > >
> > > > This code provides means for the variables by cluster. In
> > > > relation to cluster 2, it appears the cases should have no clear
> > > > motive and be depressed:
> > >
> > > round(sapply(x, function(i) colMeans(FS1[i,])),2)
> > >
> > > [,1] [,2] [,3] [ ,4] [,5]
> > > [,6] [,7] [,8]
> > >
> > > depressed 0.00 0.33 0.00 0.0 0 0.6 0.00 0.08
> > >
> > > unclear 0.33 1.00 1.00 1.0 0 0.0 0.07 0.12
> > >
> > >
> > >
> > > > I can manually examine this variable by variable and look at how
> > > > each of the cases in cluster 2 is similar on the variables. I am
> > > > looking for a more efficient and quicker way to do this.
> > >
> > > Bob
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.