[R-sig-eco] help: Cluster analysis: obtaining average distances per cluster
Barnabas Daru
darunabas at gmail.com
Mon Jun 10 13:54:10 CEST 2013
Hi Marcelino,
Many thanks for your suggestion and the code.
I tried running the code, but the line "diag(mydata2) <- 0" gave the following error:
Error in '[<-.data.frame'('*tmp*', cbind(i, i), value = 0) :
only logical matrix subscripts are allowed in replacement
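
A likely cause of this error is that mydata comes from read.csv() and is therefore a data.frame, while `diag<-` and the block assignment in the loop expect a numeric matrix. The sketch below assumes that coercing with as.matrix() is acceptable; the toy data, the 3 groups (instead of 9), and the names pts, m, and groups are made up for illustration only.

```r
# Toy stand-in for the real data: a small symmetric distance table stored
# as a data.frame, as read.csv() would return it (values are made up).
set.seed(1)
pts    <- matrix(runif(12), ncol = 2)
mydata <- as.data.frame(as.matrix(dist(pts)))

# 3 groups here instead of 9, only to keep the example small.
groups <- cutree(hclust(as.dist(mydata)), k = 3)

# Coerce to a numeric matrix first, so that the block assignment and
# `diag<-` operate on a matrix rather than on a data.frame.
m       <- as.matrix(mydata)
mydata2 <- m
for (i in unique(groups)) {
  idx <- groups == i
  if (sum(idx) > 1) {                      # singletons have no within-group pairs
    block <- m[idx, idx, drop = FALSE]
    mydata2[idx, idx] <- mean(block[lower.tri(block)])
  }
}
diag(mydata2) <- 0                         # works now: mydata2 is a matrix
```

Note that mydata2 still has one row and column per grid cell, so a dendrogram built from it keeps all of the original tips.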
Also, when I used the plot code you suggested, I still got a large dendrogram with all of the original tips rather than a summarized one with just the 9 tips I want.
Attached is a screenshot of the type of plot I hope to get.
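
One possible way to get a dendrogram with one tip per cluster is to collapse the full distance matrix into a k x k matrix of mean between-cluster distances and cluster that instead. This is only a sketch under assumptions: the data are simulated, k is 3 rather than 9, and the names d, cluster_means, summary_dist, and summary_tree are hypothetical. Note also that hclust() defaults to complete linkage; method = "average" is what corresponds to UPGMA.

```r
# Simulated points in three loose groups, standing in for the grid cells.
set.seed(42)
pts <- rbind(matrix(rnorm(20, mean = 0),  ncol = 2),
             matrix(rnorm(20, mean = 5),  ncol = 2),
             matrix(rnorm(20, mean = 10), ncol = 2))
d <- as.matrix(dist(pts))                  # full pairwise distance matrix

# Cluster and cut into k groups (k = 9 in the original question).
groups <- cutree(hclust(as.dist(d), method = "average"), k = 3)
k      <- length(unique(groups))

# k x k matrix: entry [a, b] is the mean of all pairwise distances
# between members of cluster a and members of cluster b.
labs          <- paste0("cluster", seq_len(k))
cluster_means <- matrix(0, k, k, dimnames = list(labs, labs))
for (a in seq_len(k)) {
  for (b in seq_len(k)) {
    if (a != b) cluster_means[a, b] <- mean(d[groups == a, groups == b])
  }
}

# Summary distance object and a dendrogram with exactly k tips.
summary_dist <- as.dist(cluster_means)
summary_tree <- hclust(summary_dist, method = "average")
plot(summary_tree)
```

The summary_dist object is an ordinary "dist" object, so it could in principle be passed on to an ordination routine such as MASS::isoMDS() or vegan::metaMDS(). The resulting tree is not guaranteed to match the upper branches of the original dendrogram, since each cluster is treated here as a single unit regardless of its size.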
Thanks and kind regards
Barnabas
\-/
/\
/--|
/---/ Barnabas Daru
|--/ PhD Candidate,
\-/ African Centre for DNA Barcoding,
/\ University of Johannesburg,
/--\ PO Box 524, Auckland Park, 2006,
|---\ Johannesburg, South Africa.
\---\ Lab: +27 11 559 3477
\--| Mobile: +277 3818 9583
\-/ My homepage
/\
/--\
#…if you can think it, you can do it.
On 10 Jun 2013, at 12:04 PM, Marcelino de la Cruz wrote:
> Hi,
>
> try this:
>
>
> mydata2 <- mydata
> for (i in 1:9) mydata2[groups.9==i, groups.9==i] <- mean(as.dist(mydata[groups.9==i,groups.9==i]))
>
> diag(mydata2) <-0
>
> plot( hclust(as.dist(mydata2)))
>
>
>
> and please, send messages only to r-sig-ecology at r-project.org
>
> HTH,
>
> Marcelino
>
>
> On 09/06/2013 21:36, Barnabas Daru wrote:
>> Dear all,
>> I am using R to generate clusters for a pairwise distance matrix of a large dataset (over 3000 grid cells) and to group the cells based on similarity into 9 or fewer clusters.
>>
>> I have successfully used the "cutree" function (from the stats package) to get 9 clusters, but I got stuck on how to create a new matrix based on the means of all pairwise grid cell values for the clusters defined by cutree. My code so far is as follows:
>>
>> mydata <- read.csv("my_distance_matrix.csv", header=T, row.names=1, sep=",")
>>
>> mydata_dist <- as.dist(mydata)
>> UPGMA <- hclust(mydata_dist)
>> plot(UPGMA) # Gives a large dendrogram with tip.labels difficult to read!
>> # I only want a summary dendrogram
>> groups.9 <- cutree(UPGMA, 9)
>>
>> groups.9.mean <- aggregate(UPGMA,list(groups.9),median)
>> # I got the following error:
>> Error in as.data.frame.default(x):
>> cannot coerce class '"hclust"' into a data.frame
>>
>> What I am interested in is to obtain the following:
>> (a) A dendrogram showing only the summary branches, i.e. a dendrogram with only nine branches, with the tips labelled by the mean pairwise distance connecting each group.
>> (b) To be able to convert the "summary dendrogram" described in (a) into a new distance matrix for further analysis, e.g. NMDS.
>>
>> Any help especially in the form of R code will be highly appreciated.
>> Thanks and kind regards
>> Barnabas
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>