[R] Examining how cases are similar by cluster, in cluster analysis
bgreen at dyson.brisnet.org.au
Mon Nov 19 00:52:37 CET 2012
Hello David,
Many thanks - this does exactly what I want, and it lets me see whether the
clusters make sense in terms of the pattern of values and where they join a
cluster.
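
For anyone following the thread, here is a minimal sketch of how the same
split() idea could be wrapped up so it is easy to rerun with different
clustering methods. It uses simulated data like David's example rather than
my real file, and the helper names split_by_cluster and shared_vars are my
own inventions for illustration, not functions from any package.

```r
# Simulated stand-in for the real data: 63 cases, 12 binary variables.
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))

# Hypothetical helper: cluster with a chosen linkage method, cut the tree
# into k groups, and return the cases split by cluster membership.
split_by_cluster <- function(data, method = "average", k = 8) {
  tree   <- hclust(dist(data, method = "binary"), method = method)
  groups <- cutree(tree, k = k)
  split(data, groups)
}

# Hypothetical helper: for one cluster, name the variables on which every
# case has the same value (all 0 or all 1).
shared_vars <- function(cluster) {
  m <- colMeans(cluster)
  names(m)[m == 0 | m == 1]
}

clusters <- split_by_cluster(FS1, method = "average", k = 8)
sapply(clusters, nrow)         # cluster sizes
lapply(clusters, shared_vars)  # variables shared by all cases in a cluster

# Same inspection under a different linkage method.
lapply(split_by_cluster(FS1, method = "complete", k = 8), shared_vars)
```

A cluster with a single case will trivially list all 12 variables as shared,
so the output is most informative for the larger clusters.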
Regards
Bob
> Something like this?
>
>> split(FS1, hcli8)
> $`1`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 1 1 1 0 1 0 0 1 1 0 1 1 1
> 3 1 0 1 0 0 1 1 0 0 1 0 1
> 4 1 1 0 0 0 0 1 1 1 1 1 1
> 7 0 1 0 1 0 0 1 1 0 1 0 1
> 9 1 1 1 1 0 1 1 0 1 1 1 0
> 12 1 0 0 0 0 1 1 1 1 1 0 1
> 13 0 1 1 1 1 0 0 0 1 1 0 1
> 15 1 0 1 1 0 0 1 0 0 1 0 1
> 16 1 0 1 0 0 1 1 0 1 0 1 1
> 19 0 1 0 0 0 0 1 0 0 1 0 1
> 20 0 1 1 1 0 0 0 1 1 0 0 1
> 24 1 1 0 1 0 0 1 0 1 1 1 0
> 26 1 1 1 1 1 1 0 1 0 1 0 1
> 28 1 0 1 0 1 0 1 1 0 1 1 1
> 33 1 1 0 1 0 0 0 0 1 1 0 0
> 38 1 1 1 0 0 0 0 0 1 1 0 0
> 40 1 0 1 0 0 0 1 0 0 1 1 1
> 41 1 1 0 0 0 0 0 0 1 1 1 1
> 43 0 0 1 0 0 0 1 0 1 1 0 1
> 52 1 1 1 1 0 0 0 1 1 1 0 1
> 53 1 1 0 0 1 0 0 1 1 1 0 1
> 56 1 0 1 0 0 1 1 0 1 0 0 0
> 60 1 1 1 0 1 1 0 1 1 1 0 1
>
> $`2`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 2 0 1 1 1 1 1 1 0 0 1 1 0
> 5 0 1 0 1 1 1 0 0 0 1 1 1
> 6 0 0 0 0 1 0 1 0 0 1 1 1
> 10 1 1 1 1 1 0 1 1 0 1 0 0
> 11 0 1 0 1 1 0 1 0 1 1 1 1
> 14 0 0 1 1 1 1 1 1 0 1 1 1
> 17 0 1 0 0 1 0 0 0 0 0 1 1
> 18 1 0 0 1 1 1 1 1 0 0 1 1
> 29 1 1 0 1 0 1 1 1 0 0 1 1
> 37 1 0 0 1 1 0 1 1 0 1 0 0
> 42 1 1 0 1 1 1 1 0 0 0 0 0
> 46 1 1 0 1 0 1 1 0 0 1 0 1
> 48 0 1 0 0 1 0 1 0 0 1 1 0
> 50 0 1 0 1 1 1 1 1 0 0 1 0
> 51 0 0 0 1 1 1 1 0 0 0 1 1
> 54 0 0 0 1 1 1 1 0 0 1 1 0
> 58 0 1 0 1 1 1 1 1 1 1 1 0
> 61 1 0 1 0 1 1 1 1 0 1 0 0
>
> $`3`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 8 0 1 1 0 0 1 0 1 1 1 1 0
> 21 0 1 0 0 1 1 0 1 0 1 1 0
> 22 1 1 0 0 0 1 1 1 0 0 1 0
> 25 0 1 0 0 0 1 0 1 0 1 1 0
> 27 1 1 0 0 1 1 0 1 1 0 0 0
> 32 1 1 1 0 1 1 0 1 0 0 1 0
> 36 1 1 0 0 0 1 0 1 0 0 0 0
> 44 1 1 1 1 1 1 0 1 0 0 0 0
> 63 0 1 1 0 1 1 0 0 1 1 1 0
>
> $`4`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 23 0 0 1 1 0 0 0 0 0 1 0 0
> 34 0 1 1 1 0 0 0 1 0 1 0 0
>
> $`5`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 30 0 0 0 0 1 1 0 0 1 1 0 1
> 31 0 1 1 0 1 0 0 0 1 0 1 1
> 35 0 0 1 0 1 1 0 0 1 1 0 1
> 47 0 0 1 0 1 0 0 0 1 0 0 1
> 49 1 0 0 0 1 1 0 0 1 1 1 0
> 55 1 0 1 0 1 0 0 0 0 1 1 0
> 59 0 0 1 0 1 0 0 0 1 0 1 1
>
> $`6`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 39 0 0 0 0 1 0 1 1 0 0 0 0
> 62 0 0 0 0 1 0 1 1 0 0 0 1
>
> $`7`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 45 1 1 0 0 0 0 0 0 0 0 1 0
>
> $`8`
> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
> 57 0 0 1 0 0 1 0 1 0 0 1 1
>
> -------
> David
>
>> -----Original Message-----
>> From: Bob Green [mailto:bgreen at dyson.brisnet.org.au]
>> Sent: Sunday, November 18, 2012 3:22 PM
>> To: dcarlson at tamu.edu; r-help at r-project.org
>> Subject: RE: [R] Examining how cases are similar by cluster, in cluster
>> analysis
>>
>> David,
>>
>>
>> Many thanks, I'm sure this will be helpful. What would also be
>> helpful is if I can extract each cluster and examine id by variable,
>> within the respective cluster. I could index the variables for each
>> cluster and run such an analysis, but there must be a more efficient
>> way of doing this (especially as I experiment with different
>> clustering methods).
>>
>> Thanks again,
>>
>> Bob
>>
>> At 06:44 AM 19/11/2012, David L Carlson wrote:
>> >If you just want a summary of the mean for each variable in each
>> >cluster, this will get you there:
>> >
>> > > set.seed(42)
>> > > FS1 <- data.frame(matrix(sample(c(0, 1), 12*63, replace=TRUE),
>> >+ nrow=63, ncol=12))
>> > > dmat <- dist(FS1, method="binary")
>> > > cl.test <- hclust(dmat, method="average")
>> > > plot(cl.test, hang=-1)
>> > > hcli8 <- cutree(cl.test, k=8)
>> > > tbl <- aggregate(FS1, by=list(Group=hcli8), mean)
>> > > print(tbl, digits=4)
>> >  Group     X1     X2     X3     X4     X5     X6     X7     X8     X9    X10    X11   X12
>> >1     1 0.5122 0.6829 0.6829 0.6341 0.5854 0.5854 0.6829 0.6341 0.5366 0.4146 0.4634 0.561
>> >2     2 0.0000 0.0000 0.0000 1.0000 0.6667 0.6667 0.0000 0.6667 0.0000 0.6667 0.0000 0.000
>> >3     3 0.9286 0.1429 0.1429 0.1429 0.2857 0.5714 0.7857 0.3571 0.8571 0.8571 0.6429 0.500
>> >4     4 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.000
>> >5     5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.000
>> >6     6 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000 0.0000 1.000
>> >7     7 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000
>> >8     8 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000
>> > >
>> >----------------------------------------------
>> >David L Carlson
>> >Associate Professor of Anthropology
>> >Texas A&M University
>> >College Station, TX 77843-4352
>> >
>> > > -----Original Message-----
>> > > From: r-help-bounces at r-project.org
>> > > [mailto:r-help-bounces at r-project.org] On Behalf Of Bob Green
>> > > Sent: Sunday, November 18, 2012 5:00 AM
>> > > To: r-help at r-project.org
>> > > Subject: [R] Examining how cases are similar by cluster, in
>> > > cluster analysis
>> > >
>> > > Hello,
>> > >
>> > > I used the following code to perform a cluster analysis on a
>> > > dataframe consisting of 12 variables (coded as 1,0) and 63
>> > > cases.
>> > >
>> > >
>> > >
>> > > FS1 <- read.csv("D://Arsontest2.csv",header=T,row.names=1)
>> > >
>> > > str(FS1)
>> > >
>> > > dmat <- dist(FS1, method="binary")
>> > >
>> > > cl.test <- hclust (dist(FS1, method ="binary"), "ave")
>> > >
>> > > plot(cl.test, hang = -1)
>> > >
>> > >
>> > >
>> > > Each case has an id and the dendrogram identifies the respective cases
>> > > which constitute each cluster. What I am seeking advice on is how to
>> > > examine the variables on which the cases are similar, within each
>> > > cluster.
>> > >
>> > >
>> > >
>> > > sort(hcli8 <- cutree(cl.test, k=8)) identifies that cluster 2
>> > > comprises the following cases:
>> > >
>> > > 1641 2295 2594 2654 2799 3213 3510 3513 2958 3294
>> > >    2    2    2    2    2    2    2    2    2    2
>> > >
>> > >
>> > >
>> > > This code provides means for the variables by cluster. In relation
>> > > to cluster 2, it appears the cases should have no clear motive and
>> > > be depressed:
>> > >
>> > > round(sapply(x, function(i) colMeans(FS1[i,])),2)
>> > >
>> > >           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
>> > > depressed 0.00 0.33 0.00  0.0    0  0.6 0.00 0.08
>> > > unclear   0.33 1.00 1.00  1.0    0  0.0 0.07 0.12
>> > >
>> > >
>> > >
>> > > I can manually examine this variable by variable and look at how
>> > > the cases in cluster 2 are similar on the variables, but I am
>> > > looking for a more efficient and quicker way to do this.
>> > >
>> > > Bob
>> > >