[R-sig-eco] help: Cluster analysis: obtaining average distances per cluster

Marcelino de la Cruz marcelino.delacruz at upm.es
Mon Jun 10 17:26:22 CEST 2013


Hi Barnabas,

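maybe now? The diag() error came from mydata being a data.frame (which is what read.csv() returns), so the distances are converted to a matrix first: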


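mydata2 <- as.matrix(mydata)

average.dist <- numeric(9)
for (i in 1:9) {
	in.i <- groups.9 == i
	# mean of the pairwise distances among the cells of cluster i
	average.dist[i] <- mean(as.dist(mydata2[in.i, in.i, drop = FALSE]))
	# replace every within-cluster distance by that mean
	mydata2[in.i, in.i] <- average.dist[i]
}
diag(mydata2) <- 0

plot(hclust(as.dist(mydata2)))

# keep one row/column per cluster: the first cell of each cluster
mydata3 <- mydata2[match(1:9, groups.9), match(1:9, groups.9)]

# 9-tip dendrogram, tips labelled with the mean within-cluster distance
plot(hclust(as.dist(mydata3)), labels = round(average.dist, 2))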
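A possible alternative for the between-cluster part, sketched under assumptions and not part of the original message: the code above takes the distance between two clusters from one representative cell per cluster (the first cell found by match()). If each between-cluster distance should instead be the mean of all pairwise distances between the cells of the two clusters, closer to the "matrix based on means of all pairwise grid cell values" asked for further down, it could be computed as below. cluster.mean.dist() is a hypothetical helper written for this example, and the five-cell data set is made up.

```r
# Hypothetical helper: k x k matrix of mean distances between clusters
# (off-diagonal) and within clusters (diagonal: each pair counted once,
# self-distances excluded). 'd' is a dist object, matrix or data.frame;
# 'groups' is a membership vector such as the one returned by cutree().
cluster.mean.dist <- function(d, groups) {
  d <- as.matrix(d)
  k <- sort(unique(groups))
  labs <- as.character(k)
  out <- matrix(NA_real_, length(k), length(k), dimnames = list(labs, labs))
  for (a in seq_along(k)) {
    for (b in seq_along(k)) {
      block <- d[groups == k[a], groups == k[b], drop = FALSE]
      out[a, b] <- if (a == b) mean(block[lower.tri(block)]) else mean(block)
    }
  }
  out  # a single-cell cluster gets NaN on the diagonal
}

# Toy example with made-up positions: five cells on a line, two clusters
d <- dist(c(0, 1, 2, 10, 12))
groups <- cutree(hclust(d, method = "average"), 2)
m <- cluster.mean.dist(d, groups)
print(round(m, 2))

# Dendrogram of the clusters; as.dist() ignores the diagonal, so the
# within-cluster means are only used as tip labels
summary.tree <- hclust(as.dist(m), method = "average")
```

With the objects from this thread, m <- cluster.mean.dist(mydata, groups.9) would give a 9 x 9 matrix, plot(hclust(as.dist(m), method = "average"), labels = round(diag(m), 2)) a nine-tip dendrogram, and as.dist(m) a cluster-level distance matrix of the kind asked for in (b).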

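For point (b) of the original question, a cluster-level distance matrix can be passed directly to an ordination. A minimal sketch, assuming the MASS package (a recommended package included with most R installations) and using a made-up 4 x 4 matrix in place of a real cluster-level one; vegan::metaMDS() is a common alternative in community ecology. Note that isoMDS() stops with an error when two distinct objects are at distance zero.

```r
library(MASS)  # provides isoMDS()

# Made-up symmetric distance matrix standing in for a cluster-level matrix
m <- matrix(c(0, 3, 6, 9,
              3, 0, 4, 7,
              6, 4, 0, 5,
              9, 7, 5, 0),
            nrow = 4, dimnames = list(paste0("C", 1:4), paste0("C", 1:4)))

# Kruskal's non-metric MDS in two dimensions; by default isoMDS() starts
# from the classical (metric) MDS configuration given by cmdscale()
nmds <- isoMDS(as.dist(m), k = 2, trace = FALSE)

nmds$stress  # final stress, in percent
plot(nmds$points, type = "n", xlab = "NMDS1", ylab = "NMDS2")
text(nmds$points, labels = rownames(m))
```

The same call should work on as.dist(mydata3) from the code above, as long as no two clusters end up at distance zero.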



On 10/06/2013 13:54, Barnabas Daru wrote:
> Hi Marcelino,
> Many thanks for your suggestion and the code.
>
> I tried running the code, and at the line "diag(mydata2) <- 0" I got
> this error:
>
> Error in '[<-.data.frame'('*tmp*', cbind(i, i), value = 0) :
> only logical matrix subscripts are allowed in replacement
>
> Also, when I used the plot code you suggested, I still got a large
> dendrogram with so many tips rather than a summarized one with just the
> 9 tips I desired.
> Attached is a screenshot of the type of plot I hope to get.
> Thanks and kind regards
> Barnabas
>
>   \-/
>     /\
>    /--|
>   /---/ Barnabas Daru
>   |--/   PhD Candidate,
>   \-/    African Centre for DNA Barcoding,
>   /\      University of Johannesburg,
> /--\   PO Box 524, Auckland Park, 2006,
> |---\  Johannesburg, South Africa.
>   \---\ Lab: +27 11 559 3477
>    \--|  Mobile: +277 3818 9583
>     \-/ My homepage <http://barnabasdaru.wordpress.com>
>     /\
>    /--\
>
> #…if you can think it, you can do it.
>
> On 10 Jun 2013, at 12:04 PM, Marcelino de la Cruz wrote:
>
>> Hi,
>>
>> try this:
>>
>>
>> mydata2 <- mydata
>> for (i in 1:9) mydata2[groups.9==i, groups.9==i] <-
>> mean(as.dist(mydata[groups.9==i,groups.9==i]))
>>
>> diag(mydata2) <-0
>>
>> plot( hclust(as.dist(mydata2)))
>>
>>
>>
>> and please, send messages only to r-sig-ecology at r-project.org
>>
>> HTH,
>>
>> Marcelino
>>
>>
>> On 09/06/2013 21:36, Barnabas Daru wrote:
>>> Dear all,
>>> I am using R to generate clusters for a pairwise distance matrix of a
>>> large dataset (over 3000 grid cells) and to group the cells based on
>>> similarity into 9 or fewer clusters.
>>>
>>> I have successfully used the "cutree" function in R (stats package)
>>> to get 9 clusters, but I got stuck on how to create a new matrix of
>>> the mean pairwise distances among grid cells for the clusters defined
>>> by cutree. Here is what I did:
>>>
>>> mydata <- read.csv("my_distance_matrix.csv", header = TRUE, row.names = 1)
>>>
>>> mydata_dist <- as.dist(mydata)
>>> # UPGMA is average linkage; hclust() defaults to complete linkage
>>> UPGMA <- hclust(mydata_dist, method = "average")
>>> plot(UPGMA) # Gives a large dendrogram with tip.labels difficult to read!
>>> # I only want a summary dendrogram
>>> groups.9 <- cutree(UPGMA, 9)
>>>
>>> groups.9.mean <- aggregate(UPGMA,list(groups.9),median)
>>> # I got the following error:
>>> Error in as.data.frame.default(x):
>>> cannot coerce class '"hclust"' into a data.frame
>>>
>>> What I am interested in obtaining is the following:
>>> (a) A dendrogram showing only the summary branches, i.e. a dendrogram
>>> with only nine branches, with the mean pairwise distance within each
>>> cluster as its tip label.
>>> (b) The ability to use the "summary dendrogram" described in (a),
>>> converted into a new distance matrix, for further analysis, e.g. NMDS.
>>>
>>> Any help especially in the form of R code will be highly appreciated.
>>> Thanks and kind regards
>>> Barnabas
>>>
>>> _______________________________________________
>>> R-sig-ecology mailing list
>>> R-sig-ecology at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>
>>
>


