[R] Advice on exploration of sub-clusters in hierarchical dendrogram

ilai keren at math.montana.edu
Thu Feb 23 18:48:24 CET 2012


See inline

On Thu, Feb 23, 2012 at 8:54 AM, kosmo7 <dnicolgr at hotmail.com> wrote:
> Dear R user,

> In other words, I am trying to obtain/read the sub-clusters of a specific
> cluster in the dendrogram, by isolating a specific node and exploring
> locally its lower hierarchy.

To explore or "zoom in" on elements of z you had the first step right:
create x<-as.dendrogram(z) but then you didn't use x anymore (except
for the plot which could have been done on z). Maybe you wanted:

> df=read.table('mydata.txt', head=T, row.names=1) #read file with distance matrix
> d=as.dist(df) #format table as distance matrix
> z<-hclust(d,method="complete", members=NULL)
> x<-as.dendrogram(z)
> plot(x, xlab="mydata complete-LINKAGE", ylim=c(0,4)) #visualization of the dendrogram

From this point:

clusters<-cut(x, h=1.6) #obtain clusters at cutoff height=1.6

# clusters is now (after cut() on x, not cutree() on z) a list with two
components: upper and lower. upper is a single dendrogram, the
structure above 1.6; lower is a list of dendrograms, one for each
local cluster below 1.6:

plot(clusters$upper)  # the structure above 1.6
plot(clusters$lower[[1]])  # cluster 1

# To print the details of cluster 1 (this output may be very long,
depending on how many members it has):

str(clusters$lower[[1]])

To extract specific details from the list, and to automate this for all
or some of the clusters, ?dendrapply is your friend.
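
A minimal sketch of that kind of automation, on made-up data (the matrix
m, its labels and the cutoff h are assumptions for illustration, not
taken from your file). lapply() over clusters$lower with labels() gives
the member names of every cluster, and dendrapply() visits every node of
a single branch:

```r
set.seed(1)
m <- matrix(rnorm(40), nrow = 10,
            dimnames = list(paste0("s", 1:10), NULL))  # toy data, 10 items
z <- hclust(dist(m), method = "complete")
x <- as.dendrogram(z)

h <- mean(z$height)           # illustrative cutoff, not the 1.6 used above
clusters <- cut(x, h = h)

# member labels of every cluster below the cutoff
members <- lapply(clusters$lower, labels)

# number of members per cluster, read from the "members" attribute
sizes <- sapply(clusters$lower, function(node) attr(node, "members"))

# dendrapply() applies a function to every node of a dendrogram;
# here it rounds the node heights of the first cluster
rounded <- dendrapply(clusters$lower[[1]], function(node) {
  attr(node, "height") <- round(attr(node, "height"), 1)
  node
})
```

members[[1]] then plays the role of clstructure[[1]] in your code.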

I'm assuming your attempts at reclustering locally later in your post
are no longer necessary, unless I'm missing something on what exactly
you are trying to do.
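
If what you want is the sub-clusters of one particular cluster at a lower
cutoff, one option is to cut that branch again instead of reclustering.
A sketch on toy data (the data and both cutoff heights are assumptions
chosen only so the example runs):

```r
set.seed(2)
m <- matrix(rnorm(60), nrow = 20,
            dimnames = list(paste0("s", 1:20), NULL))  # toy data, 20 items
x <- as.dendrogram(hclust(dist(m), method = "complete"))

# first cut: clusters below 60% of the total tree height
first <- cut(x, h = 0.6 * attr(x, "height"))$lower

# pick the largest of those clusters
sizes <- sapply(first, function(node) attr(node, "members"))
big <- first[[which.max(sizes)]]

# second cut, on that branch only, at half of its own height
sub <- cut(big, h = 0.5 * attr(big, "height"))$lower
sub_members <- lapply(sub, labels)
```

Each element of sub is again a dendrogram, so it can be plotted or cut
further in the same way.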

Hope this helps

Elai



> ord<-cmdscale(d, k=2) #Multidimensional scaling of the data down to 2 dimensions
> clusplot(ord,clusters, color=TRUE, shade=TRUE,labels=4, lines=0) #visualization of the clusters in 2D map
> var1<-var(clusters==1) #variance of cluster 1
>
> #extract cluster memberships:
> clids = as.data.frame(clusters)
> names(clids) = c("id")
> clids$cdr = row.names(clids)
> row.names(clids) = c(1:dim(clids)[1])
> clstructure = lapply(unique(clids$id), function(x){clids[clids$id == x,'cdr']})
>
> clstructure[[1]] #get memberships of cluster 1
>
>
>
> From this point, eventually, I could recreate a distance matrix with only
> the members of a specific cluster and then re-apply hierarchical clustering
> and start all over again.
> But this would take me ages to perform individually for hundreds of clusters.
> So, I was hoping someone could point me in a direction as to how to take
> advantage of the initial dendrogram and focus on specific clusters from
> which to derive the sub-clusters at a new given cutoff height.
>
> I recently found in this page
> http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual
>
> the following code:
> # Select sub-cluster number (here: clid=c(1,2)) and generate corresponding dendrogram.
> clid <- c(1,2)
> ysub <- y[names(mycl[mycl%in%clid]),]
> hrsub <- hclust(as.dist(1-cor(t(ysub), method="pearson")), method="complete")
>
> Even with this given example I am afraid I can't work my way around.
> So I guess in my case I could grab all the members of a specific cluster
> using my existing code and try to reformat the distance matrix in one that
> only contains the distances of those members:
> cluster1members<-clstructure[[1]]
>
> Then I need to reformat the distance matrix into a new one, say d1, which I
> can feed to a new -local- hierarchical clustering:
> hrsub<-hclust(d1, method="complete")
>
> Any ideas on how I can obtain a new distance matrix with just the distances
> of the members in that cluster, with names contained in vector
> "cluster1members" ?
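
If a sub-matrix of distances is still wanted, one way is to index the
full matrix by name and convert back with as.dist(). A sketch on toy
data (the data and the contents of cluster1members are hypothetical):

```r
set.seed(3)
m <- matrix(rnorm(24), nrow = 8,
            dimnames = list(paste0("s", 1:8), NULL))  # toy data, 8 items
d <- dist(m)

cluster1members <- c("s2", "s5", "s7")  # hypothetical member names

# as.matrix() gives a square matrix with row/column names,
# which can be subset by name and turned back into a "dist" object
d1 <- as.dist(as.matrix(d)[cluster1members, cluster1members])

hrsub <- hclust(d1, method = "complete")
```
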
>
> Apologies if this seems trivial, but I really can't find the correct
> functions to use for this task.
> Thank you very much in advance - as I am really a novice with R, small
> chunks of code as example would be of great help.
>
> Take care all -
>


