[R-sig-eco] help: Cluster analysis: obtaining average distances per cluster

Barnabas Daru darunabas at gmail.com
Mon Jun 10 13:54:10 CEST 2013


Hi Marcelino,
Many thanks for your suggestion and the code.

I tried running the code, but the line "diag(mydata2) <- 0" gave the following error:

Error in '[<-.data.frame'('*tmp*', cbind(i, i), value = 0) :
	only logical matrix subscripts are allowed in replacement
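
A possible explanation, offered as a sketch rather than a confirmed diagnosis: read.csv() returns a data frame, and the replacement form of diag() expects a matrix, so converting with as.matrix() first may avoid the error. The small `mydata` built below is a hypothetical stand-in for the real distance table, not the actual data.

```r
# Hypothetical stand-in for the data frame returned by read.csv():
# a small symmetric table of pairwise distances.
set.seed(1)
m <- as.matrix(dist(matrix(rnorm(12), nrow = 6)))
mydata <- as.data.frame(m)

# mydata is a data frame here, which is the situation in which the
# error above was reported. Convert to a numeric matrix before
# assigning to the diagonal.
mydata2 <- as.matrix(mydata)
diag(mydata2) <- 0

is.matrix(mydata2)        # the object is now a plain matrix
all(diag(mydata2) == 0)   # and its diagonal has been set to zero
```

With the real data, the same conversion would be applied once, directly after reading the CSV file, so that every later step works on a matrix.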

Also, when I used the plot code you suggested, I still got a large dendrogram with so many tips rather than a summarized one with just the 9 tips I desired.
Attached is a screenshot of the type of plot I hope to get.
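
One route to a nine-tip dendrogram, sketched under stated assumptions: collapse the full distance matrix into a 9 x 9 matrix whose entries are the mean pairwise distances between the members of each pair of clusters, then cluster that small matrix. The random points `pts` and the names `d`, `between`, `summary_dist` and `summary_tree` are hypothetical and used only for illustration. The use of method = "average" (UPGMA) is also an assumption; hclust() uses complete linkage when no method is given.

```r
# Hypothetical example data: 30 random points standing in for grid cells.
set.seed(42)
pts <- matrix(rnorm(60), ncol = 2)
d   <- as.matrix(dist(pts))

k      <- 9
tree   <- hclust(as.dist(d), method = "average")  # average linkage = UPGMA
groups <- cutree(tree, k)

# k x k matrix: entry [a, b] is the mean of all pairwise distances
# between members of cluster a and members of cluster b.
labs    <- as.character(seq_len(k))
between <- matrix(0, k, k, dimnames = list(labs, labs))
for (a in seq_len(k)) {
  for (b in seq_len(k)) {
    if (a != b) between[a, b] <- mean(d[groups == a, groups == b])
  }
}

# Cluster the summary matrix; the resulting dendrogram has k tips.
summary_dist <- as.dist(between)
summary_tree <- hclust(summary_dist, method = "average")
plot(summary_tree)
```

Because `summary_dist` is an ordinary "dist" object, it could in principle be passed on to other functions that accept distance objects.
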
Thanks and kind regards
Barnabas

 \-/
   /\
  /--|    
 /---/ Barnabas Daru
 |--/   PhD Candidate,
 \-/    African Centre for DNA Barcoding,
 /\      University of Johannesburg,
/--\   PO Box 524, Auckland Park, 2006,
|---\  Johannesburg, South Africa.
 \---\ Lab: +27 11 559 3477        
  \--|  Mobile: +277 3818 9583
   \-/  My homepage
   /\        
  /--\ 
 
#…if you can think it, you can do it.
 

On 10 Jun 2013, at 12:04 PM, Marcelino de la Cruz wrote:

> Hi,
> 
> try this:
> 
> 
> mydata2 <- mydata
> for (i in 1:9) mydata2[groups.9==i, groups.9==i] <- mean(as.dist(mydata[groups.9==i,groups.9==i]))
> 
> diag(mydata2) <-0
> 
> plot( hclust(as.dist(mydata2)))
> 
> 
> 
> and please, send messages only to r-sig-ecology at r-project.org
> 
> HTH,
> 
> Marcelino
> 
> 
> On 09/06/2013 21:36, Barnabas Daru wrote:
>> Dear all,
>> I am using R to generate clusters for a pairwise distance matrix of a large dataset (over 3000 grid cells) and to group the cells based on similarity into 9 or fewer clusters.
>> 
>> I have successfully used the "cutree" function (in the stats package that ships with R) to get 9 clusters, but I am stuck on how to create a new matrix of the means of all pairwise grid-cell values for the clusters defined by cutree. My code is as follows:
>> 
>> mydata <- read.csv("my_distance_matrix.csv", header=T, row.names=1, sep=",")
>> 
>> mydata_dist <- as.dist(mydata)
>> UPGMA <- hclust(mydata_dist)
>> plot(UPGMA) # Gives a large dendrogram with tip.labels difficult to read!
>> # I only want a summary dendrogram
>> groups.9 <- cutree(UPGMA, 9)
>> 
>> groups.9.mean <- aggregate(UPGMA,list(groups.9),median)
>> # I got the following error:
>> Error in as.data.frame.default(x):
>> 	cannot coerce class '"hclust"' into a data.frame
>> 
>> What I am interested in is to obtain the following:
>> (a) A dendrogram showing only the summary branches, i.e. a dendrogram with only nine branches, with the mean pairwise distance for each cluster as the tip labels.
>> (b) To be able to use the "summary dendrogram" from (a), converted to a new distance matrix, for further analysis, e.g. NMDS.
>> 
>> Any help especially in the form of R code will be highly appreciated.
>> Thanks and kind regards
>> Barnabas
>> 
>> 
>> 
>> 
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>> 
> 


