[BioC] Extracting dendogram information from Heatmaps

'Thomas Girke' thomas.girke at ucr.edu
Thu Dec 13 19:21:25 CET 2007


The best way to answer these questions is to subset your data set to
a test matrix with only a few rows. This way you can see the labels
in the plot and things become intuitive.
For example:
	myma <- MA$M[fitp,]
	myma <- myma[1:20,]

You can always assign row names yourself with
	rownames(myma) <- mynames

If your data set has a label column, then it would be
	rownames(myma) <- myname$label

To make sure your data set is a matrix, run:
	myma <- as.matrix(myma)
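
As a minimal sketch of why the row names matter, the following uses a
simulated matrix rather than real MAList data (the gene names are
hypothetical, and myma/mynames simply mirror the placeholders above). It
suggests that hclust() takes its labels from the row names of the matrix
passed to dist(), so a matrix without row names yields NULL labels:

```r
# Simulated stand-in for a small test matrix (20 rows, 6 arrays); not real data
set.seed(1)
myma <- matrix(rnorm(120), nrow = 20, ncol = 6)

# Without row names, the resulting hclust object carries no labels
hc_nolab <- hclust(dist(myma))
is.null(hc_nolab$labels)

# Assign (hypothetical) gene names as row names, then cluster again
mynames <- paste("gene", 1:20, sep = "")
rownames(myma) <- mynames
myma <- as.matrix(myma)   # no-op here, but guards against data frames
hc_lab <- hclust(dist(myma))

# The labels are now the row names, stored in the original row order
head(hc_lab$labels)
```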

Continue with hclust ...
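
A sketch of that next step, again on simulated data with hypothetical gene
names; the choice of k = 4 clusters and the output file name are arbitrary
assumptions. As far as I know, cutree() numbers the clusters by the order in
which their first member appears in the rows of the input matrix, not by
their position in the dendrogram, so indexing with hr$order is what lines
the cluster numbers up with the tree as it is plotted:

```r
# Simulated labelled matrix (20 genes x 6 arrays); not real data
set.seed(1)
myma <- matrix(rnorm(120), nrow = 20, ncol = 6,
               dimnames = list(paste("gene", 1:20, sep = ""),
                               paste("array", 1:6, sep = "")))

# Cluster the rows and cut the tree into k groups (k = 4 is arbitrary)
hr <- hclust(dist(myma))
mycl <- cutree(hr, k = 4)   # named vector: gene name -> cluster number

# Cluster membership listed in the order the leaves appear in the dendrogram
tree_order <- data.frame(gene = hr$labels[hr$order],
                         cluster = mycl[hr$order])
tree_order

# Save the listing to a tab-delimited file (written to a temp directory here)
outfile <- file.path(tempdir(), "row_clusters.txt")
write.table(tree_order, file = outfile, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Reading the cluster column from top to bottom then gives the cluster
numbers in tree order, which may well not be 1, 2, 3, 4.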

Thomas

On Thu 12/13/07 12:50, alison waller wrote:
> Thanks everyone, these are great suggestions.
> 
> I had trouble with the identify, as the plot moved when I clicked the mouse
> and I got error messages.  
> 
> The cutree worked well - however, I see a matrix which has values
> corresponding to clusters, but is cluster one the leftmost or rightmost
> cluster? I.e., how are they ordered?
> 
> The $labels method seems the best but my matrix doesn't seem to have labels.
> I made my matrix from the M values from an MAList, is there a way to carry
> through the gene names?
> 
> Myclust<-hclust(dist(MA$M[fitp,]))
> Myclust$labels gives NULL
> 
> Thanks again,
> 
> alison
> 
> 
> -----Original Message-----
> From: Thomas Girke [mailto:thomas.girke at ucr.edu] 
> Sent: Thursday, December 13, 2007 12:00 PM
> To: James W. MacDonald
> Cc: alison waller; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Extracting dendogram information from Heatmaps
> 
> Alison,
> 
> In addition to James' suggestions, you may want to become familiar with how
> to access the different data components of the resulting hclust object
> (e.g. labels, order) and the cutree() function. If you can't read the labels
> in the plots, then you can always extract them as plain text in the
> corresponding tree order (see below: hr$labels[hr$order]) from the hclust
> object.
> 
> Here is a short example to illustrate a possible hclust-heatmap/heatmap.2
> routine:
> 
> # Generate a sample matrix
> y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""),
> paste("t", 1:5, sep=""))) 
> 
> # Cluster rows and columns by correlation distance
> hr <- hclust(as.dist(1-cor(t(y), method="pearson"))) 
> hc <- hclust(as.dist(1-cor(y, method="spearman"))) 
> 
> # Obtain discrete clusters with cutree
> mycl <- cutree(hr, h=max(hr$height)/1.5)
> 
> # Prints the row labels in the order they appear in the tree.
> hr$labels[hr$order]
> # Prints the row labels and cluster assignments
> sort(mycl) 
> 
> # Some color selection steps
> mycolhc <- sample(rainbow(256))
> mycolhc <- mycolhc[as.vector(mycl)]
> 
> # Plot the data matrix as a heatmap and the cluster results as dendrograms
> # with heatmap or heatmap.2, and show the cutree() results in a color bar.
> heatmap(y, Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc), scale="row",
> RowSideColors=mycolhc) 
> 
> library("gplots") 
> heatmap.2(y, Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc),
> col=redgreen(75), scale="row", 
> ColSideColors=heat.colors(length(hc$labels)), RowSideColors=mycolhc,
> trace="none", key=TRUE, cellnote=round(t(scale(t(y))),1))
> 
> 
> Best, 
> Thomas
> 
> On Thu 12/13/07 09:58, James W. MacDonald wrote:
> > Hi Alison,
> > 
> > alison waller wrote:
> > > Hello Everyone,
> > > 
> > >  
> > > 
> > > I've been using heatmap and heatmap.2 to draw heatmaps for my
> > > experiments.
> > > 
> > >  
> > > 
> > > I have a heatmap of the M values of 6 arrays for the spots whose p-values
> > > were < 0.005 (from eBayes).
> > > 
> > > However, I would like to see which spots it has grouped together in the
> > > row dendrogram. Is there a way I can extract the information about the
> > > spots that are clustered together? I cannot read the row names, and even
> > > if I could, I was hoping there would be some way to list the clusters and
> > > save them to a file.
> > 
> > There are two ways to do this that I know of. And either can be a pain, 
> > depending on how big the dendrogram is.
> > 
> > Both methods require you to construct your dendrogram first. You can 
> > then choose the clusters with the mouse. This might be more difficult if 
> > you have some gigantic dendrogram and have ingested too much coffee ;-D.
> > 
> > Normally, one would simply do
> > 
> > heatmap(mymatrix, otherargs)
> > 
> > and accept the default clustering method. However, you can always 
> > pre-construct the dendrograms and then feed those to heatmap().
> > 
> > Rowv <- as.dendrogram(hclust(dist(mymatrix)))
> > Colv <- as.dendrogram(hclust(dist(t(mymatrix))))
> > 
> > heatmap(mymatrix, Rowv=Rowv, Colv=Colv, otherargs)
> > 
> > Now if you do something like that, then you can try
> > 
> > plot(Rowv)
> > a.cluster <- identify(Rowv)
> > 
> > and then use your mouse to choose the upper left corner of a rectangle 
> > that encompasses the cluster you are interested in. Here is where the 
> > size of the dendrogram and the amount of coffee come in. If the 
> > dendrogram is really large then identify() may not be able to figure out 
> > what you are trying to select, or may decide you are choosing the upper 
> > right corner.
> > 
> > You can choose as many clusters as you want, and they will be in the 
> > list a.cluster, in the order you selected.
> > 
> > A more programmatic method is to use rect.hclust() and either choose the 
> > height at which to make the cuts, or the number of clusters, etc. Again, 
> > depending on the size of your dendrogram, this may work well or it may 
> > be painful.
> > 
> > Best,
> > 
> > Jim
> > 
> > 
> > > 
> > >  
> > > 
> > > Thanks,
> > > 
> > >  
> > > 
> > > Alison  
> > > 
> > >  
> > > 
> > > ******************************************
> > > Alison S. Waller  M.A.Sc.
> > > Doctoral Candidate
> > > awaller at chem-eng.utoronto.ca
> > > 416-978-4222 (lab)
> > > Department of Chemical Engineering
> > > Wallberg Building
> > > 200 College st.
> > > Toronto, ON
> > > M5S 3E5
> > > 
> > >   
> > > 
> > >  
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > 
> > -- 
> > James W. MacDonald, M.S.
> > Biostatistician
> > Affymetrix and cDNA Microarray Core
> > University of Michigan Cancer Center
> > 1500 E. Medical Center Drive
> > 7410 CCGC
> > Ann Arbor MI 48109
> > 734-647-5623
> > 
> > 
> 
> -- 
> Thomas Girke
> Assistant Professor of Bioinformatics
> Director, IIGB Bioinformatic Facility
> Center for Plant Cell Biology (CEPCEB)
> Institute for Integrative Genome Biology (IIGB)
> Department of Botany and Plant Sciences
> 1008 Noel T. Keen Hall
> University of California
> Riverside, CA 92521
> 
> E-mail: thomas.girke at ucr.edu
> Website: http://faculty.ucr.edu/~tgirke
> Ph: 951-827-2469
> Fax: 951-827-4437
> 
> 



