[R] CLUSTER Package
Martin Maechler
maechler at stat.math.ethz.ch
Fri Mar 30 15:39:46 CEST 2007
It seems nobody else was willing to help here
(when the original poster did not at all follow the posting
guide).
In the meantime, someone else has asked me about part of this,
so let me answer in public:
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Mon, 12 Mar 2007 17:23:30 +0100 writes:
MM> Hi Vallejo, I'm pretty busy currently, and feel your
MM> question has much more to do with how to use R more
MM> generally than with using the functions from the cluster
MM> package.
MM> So you may get help from other R-help readers, but maybe
MM> only after you have followed the posting-guide and give
MM> a reproducible example as you're asked there.
MM> Regards, Martin Maechler
>>>>> "VallejoR" == Vallejo, Roger <Roger.Vallejo at ARS.USDA.GOV>
>>>>> on Mon, 12 Mar 2007 10:28:01 -0400 writes:
VallejoR> Hi Martin, In using the Cluster Package, I have
VallejoR> results for PAM and DIANA clustering algorithms
VallejoR> (below "part" and "hier" objects):
VallejoR> part <- pam(trout, bestk) # PAM results
VallejoR> hier <- diana(trout) # DIANA results
VallejoR> GeneNames <- show(RG$genes) # Gene Names are in this object
(What is RG?)
VallejoR> But I would like also to know what genes (NAMES)
VallejoR> are included in each cluster. I tried
VallejoR> unsuccessfully to send these results to output
VallejoR> files (clusters with gene Names). This must be an
VallejoR> easy task for a good R programmer. I will
VallejoR> appreciate very much directions or R code on how
VallejoR> to send the PAM and DIANA results to output files
VallejoR> including information on genes (Names) per each
VallejoR> cluster.
For diana(), a *hierarchical* clustering method {like agnes()}, you
need to decide on the number of clusters yourself.
Then, as the example in help(diana.object) shows,
you can use cutree() to get the grouping vector.
Here's a reproducible example:
library(cluster)
data(votes.repub)
dv <- diana(votes.repub, metric = "manhattan", stand = TRUE)
print(dv)
plot(dv)
## Cut into 2 groups:
dv2 <- cutree(as.hclust(dv), k = 2)
table(dv2) # 8 and 42 group members
rownames(votes.repub)[dv2 == 1]
## For two groups, does the metric matter ?
dv0 <- diana(votes.repub, stand = TRUE) # default: Euclidean
dv.2 <- cutree(as.hclust(dv0), k = 2)
table(dv2 == dv.2)## identical group assignments
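To list the member names of every diana() cluster at once, rather than one group at a time, the grouping vector from cutree() can be passed to split(). This is only a sketch on the votes.repub data used above; the name `groups` and the choice k = 2 are illustrative assumptions, and with your own data the row names would be your gene names.

```r
library(cluster)
data(votes.repub)

dv  <- diana(votes.repub, metric = "manhattan", stand = TRUE)
dv2 <- cutree(as.hclust(dv), k = 2)   # k = 2 is an illustrative choice

## one character vector of row names (here: states) per cluster
groups <- split(rownames(votes.repub), dv2)
str(groups)
```

Each list component then holds the labels of one cluster, in the same form that split() gives for pam() results.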
----------------
For pam(), it's even simpler:
data(ruspini)
pr <- pam(ruspini, 4)
plot(pr)
# ....Hit <Return> to see next plot:
str(pr)
## or
summary(pr)
## .. shows you that there's a component 'clustering' :
pr$clustering
## a grouping vector with case-labels {your Gene names}; here "1", "2", ..., "75":
## and to get them ``visually'':
split(rownames(ruspini), pr$clustering)
## $`1`
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
## [16] "16" "17" "18" "19" "20"
## $`2`
## [1] "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35"
## [16] "36" "37" "38" "39" "40" "41" "42" "43"
## $`3`
## [1] "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58"
## [16] "59" "60"
## $`4`
## [1] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72" "73" "74" "75"
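Since the original question was about sending the clusters, with names, to output files, here is one possible way to do that with base R. It is a sketch, not the only approach: the file names and the objects `res`, `byclus`, `lines`, `outfile` and `outfile2` are made up for illustration, and tempdir() is used only so that the example does not write into your working directory.

```r
library(cluster)
data(ruspini)
pr <- pam(ruspini, 4)

## (a) two-column table: case label and cluster number, tab-separated
res <- data.frame(name = rownames(ruspini), cluster = pr$clustering)
outfile <- file.path(tempdir(), "pam_clusters.txt")   # hypothetical file name
write.table(res, file = outfile, sep = "\t", quote = FALSE, row.names = FALSE)

## (b) one text line per cluster, listing all its members
byclus <- split(rownames(ruspini), pr$clustering)
members <- vapply(byclus, paste, FUN.VALUE = character(1), collapse = " ")
lines <- paste0("cluster ", names(byclus), ": ", members)
outfile2 <- file.path(tempdir(), "pam_clusters_by_group.txt")  # hypothetical
writeLines(lines, outfile2)
```

The same two steps should work for a diana() result if pr$clustering is replaced by the grouping vector returned by cutree().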