[BioC] Display scale on hclust heatmap
Warnes, Gregory R
gregory_r_warnes at groton.pfizer.com
Tue Dec 2 17:13:44 MET 2003
Hi Anthony,
I'm attaching a revised heatmap function that will be migrating into the
standard R code. If you set 'scale="none"' you will automatically get a
color key. [If scaling is on, different rows/columns have different scales
so a color key doesn't make any sense.] See the text version of the help
page I'm attaching as well for details on how to select better colors and to
control the break points.
-Greg
> -----Original Message-----
> From: Anthony Bosco [mailto:anthonyb at ichr.uwa.edu.au]
> Sent: Tuesday, December 02, 2003 5:34 AM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Display scale on hclust heatmap
>
>
> Hi.
>
> I have figured out how to hclust and label plot, but I am having
> trouble displaying the legend for the heat colours on the plot, and
> altering the colours.
>
>
> Can anyone help?
>
>
> Regards
>
>
> Anthony
> --
> ______________________________________________
>
> Anthony Bosco - Cell Biology Research Assistant
>
> Institute for Child Health Research
> (Company Limited by Guarantee ACN 009 278 755)
> Subiaco, Western Australia, 6008
>
> Ph 61 8 9489 , Fax 61 8 9489 7700
> email anthonyb at ichr.uwa.edu.au
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: heatmap.2.R
Type: application/octet-stream
Size: 11920 bytes
Desc: not available
Url : https://www.stat.math.ethz.ch/pipermail/bioconductor/attachments/20031202/0f7a1ccc/heatmap.2-0001.obj
-------------- next part --------------
Draw a Heat Map
Description:
A heat map is a false color image (basically 'image(t(x))') with a
dendrogram added to the left side and/or to the top. Typically,
reordering of the rows and columns according to some set of values
(row or column means) within the restrictions imposed by the
dendrogram is carried out.
Usage:
heatmap <- function (x,
#-- dendrogram control --#
Rowv=NULL,
Colv=if(symm)"Rowv" else NULL,
distfun = dist,
hclustfun = hclust,
dendogram = c("both","row","column","none"),
symm = FALSE,
#-- data scaling --#
scale = c("none","row", "column"),
na.rm=TRUE,
#-- image plot --#
revC = identical(Colv, "Rowv"),
add.expr,
breaks,
col=heat.colors(length(breaks)-1),
#-- block separation --#
colsep,
rowsep,
sepcolor="white",
#-- cell labeling --#
cellnote,
notecex=1.0,
notecol="cyan",
#-- level trace --#
trace=c("column","row","both","none"),
tracecol="yellow",
hline=median(breaks),
vline=median(breaks),
linecol=tracecol,
#-- Row/Column Labeling --#
margins = c(5, 5),
ColSideColors,
RowSideColors,
cexRow = 0.2 + 1/log10(nr),
cexCol = 0.2 + 1/log10(nc),
labRow = NULL,
labCol = NULL,
#-- color key + density info --#
key = TRUE,
density.info=c("histogram","density","none"),
denscol="yellow",
#-- plot labels --#
main = NULL,
xlab = NULL,
ylab = NULL,
#-- extras --#
...
)
Arguments:
x: numeric matrix of the values to be plotted.
Rowv: determines if and how the _row_ dendrogram should be
reordered. Either a 'dendrogram' or a vector of values used
to reorder the row dendrogram or 'FALSE' to suppress
reordering or by default, 'NULL', see _Details_ below.
Colv: determines if and how the _column_ dendrogram should be
reordered. Has the same options as the 'Rowv' argument above and
_additionally_ when 'x' is a square matrix, 'Colv = "Rowv"'
means that columns should be treated identically to the rows.
distfun: function used to compute the distance (dissimilarity) between
both rows and columns. Defaults to 'dist'.
hclustfun: function used to compute the hierarchical clustering when
'Rowv' or 'Colv' are not dendrograms. Defaults to 'hclust'.
dendogram: character string indicating whether to draw 'none', 'row',
'column' or 'both' dendrograms. Defaults to 'both'.
symm: logical indicating if 'x' should be treated *symm*etrically;
can only be true when 'x' is a square matrix.
scale: character indicating if the values should be centered and
scaled in either the row direction or the column direction,
or none. The default is '"row"' if 'symm' false, and
'"none"' otherwise.
na.rm: logical indicating whether 'NA''s should be removed.
revC: logical indicating if the column order should be 'rev'ersed
for plotting, such that e.g., for the symmetric case, the
symmetry axis is as usual.
add.expr: expression that will be evaluated after the call to 'image'.
Can be used to add components to the plot.
breaks: (optional) Either a numeric vector indicating the splitting
points for binning 'x' into colors, or an integer number of
break points to be used, in which case the break points will
be spaced equally between 'min(x)' and 'max(x)'.
col: colors used for the image. Defaults to heat colors
('heat.colors').
colsep,rowsep,sepcolor: (optional) vector of integers indicating which
columns or rows should be separated from the preceding
columns or rows by a narrow space of color 'sepcolor'.
cellnote: (optional) matrix of character strings which will be placed
within each color cell, e.g. p-value symbols.
notecex: (optional) numeric scaling factor for 'cellnote' items.
notecol: (optional) character string specifying the color for
'cellnote' text. Defaults to "cyan".
trace: character string indicating whether a solid "trace" line
should be drawn across 'row's or down 'column's, 'both' or
'none'. The distance of the line from the center of each
color-cell is proportional to the size of the measurement.
Defaults to 'column'.
tracecol: character string giving the color for "trace" line. Defaults
to "yellow".
hline,vline,linecol: Vector of values within cells where a horizontal
or vertical dotted line should be drawn. The color of the
line is controlled by 'linecol'. Horizontal lines are only
plotted if 'trace' is 'row' or 'both'. Vertical lines are
only drawn if 'trace' is 'column' or 'both'. 'hline' and
'vline' default to the median of the breaks, 'linecol'
defaults to the value of 'tracecol'.
margins: numeric vector of length 2 containing the margins (see
'par(mar= *)') for column and row names, respectively.
ColSideColors: (optional) character vector of length 'ncol(x)'
containing the color names for a horizontal side bar that may
be used to annotate the columns of 'x'.
RowSideColors: (optional) character vector of length 'nrow(x)'
containing the color names for a vertical side bar that may
be used to annotate the rows of 'x'.
cexRow, cexCol: positive numbers, used as 'cex.axis' for the row or
column axis labeling. The defaults currently only use number
of rows or columns, respectively.
labRow, labCol: character vectors with row and column labels to use;
these default to 'rownames(x)' or 'colnames(x)',
respectively.
key: logical indicating whether a color-key should be shown.
density.info: character string indicating whether to superimpose a
'histogram', a 'density' plot, or no plot ('none') on the
color-key.
denscol: character string giving the color for the density display
specified by 'density.info', defaults to the same value as
'tracecol'.
main, xlab, ylab: main, x- and y-axis titles; defaults to none.
...: additional arguments passed on to 'image'
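The relationship between 'breaks' and 'col' described above can be sketched in base R. This is only an illustration using 'cut' on the 'mtcars' data; the variable names ('n_breaks', 'bin', 'cell_col') are made up for the example and it is not the code in the attached function.

```r
# Illustration only: how an integer 'breaks' becomes break points and colors.
x <- as.matrix(mtcars)

n_breaks <- 9                                          # 'breaks' given as a count
breaks <- seq(min(x), max(x), length.out = n_breaks)   # equally spaced break points
col <- heat.colors(length(breaks) - 1)                 # one color per interval

# Assign each cell to an interval; include.lowest keeps min(x) in the first bin.
bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Color of every cell, arranged like the original matrix.
cell_col <- matrix(col[bin], nrow = nrow(x), dimnames = dimnames(x))
```

With 9 break points there are 8 intervals, which is why the default for 'col' is written as 'heat.colors(length(breaks)-1)'.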
Details:
If either 'Rowv' or 'Colv' are dendrograms they are honored (and
not reordered). Otherwise, dendrograms are computed as 'dd <-
as.dendrogram(hclustfun(distfun(X)))' where 'X' is either 'x' or
't(x)'.
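As a minimal sketch of the dendrogram computation just described, using the default 'dist' and 'hclust' on the 'mtcars' data (the correlation-based 'cor_dist' helper is a hypothetical example of a custom 'distfun', not part of the function):

```r
x <- as.matrix(mtcars)

distfun   <- dist      # default distance function
hclustfun <- hclust    # default clustering function

# Row dendrogram uses x, column dendrogram uses t(x).
ddr <- as.dendrogram(hclustfun(distfun(x)))
ddc <- as.dendrogram(hclustfun(distfun(t(x))))

# Hypothetical custom distance: 1 - correlation between rows.
cor_dist <- function(m) as.dist(1 - cor(t(m)))
ddr_cor  <- as.dendrogram(hclustfun(cor_dist(x)))
```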
If either is a vector (of "weights") then the appropriate
dendrogram is reordered according to the supplied values subject
to the constraints imposed by the dendrogram, by 'reorder(dd,
Rowv)', in the row case. If either is missing, as by default, then
the ordering of the corresponding dendrogram is by the mean value
of the rows/columns, i.e., in the case of rows, 'Rowv <-
rowMeans(x, na.rm=na.rm)'. If either is 'NULL', _no reordering_
will be done for the corresponding side.
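The weight-vector case can be sketched as follows, assuming row means as the weights. 'reorder' only flips branches, so the tree itself is unchanged and only the left-to-right order of the leaves differs.

```r
x <- as.matrix(mtcars)

ddr  <- as.dendrogram(hclust(dist(x)))   # row dendrogram
Rowv <- rowMeans(x, na.rm = TRUE)        # weights: one value per row

# Reorder within the constraints of the dendrogram.
ddr_re <- reorder(ddr, Rowv)

# Permutation that puts the rows of x in dendrogram order.
rowInd    <- order.dendrogram(ddr_re)
x_ordered <- x[rowInd, ]
```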
If 'scale = "row"' the rows are scaled to have mean zero and
standard deviation one. There is some empirical evidence from
genomic plotting that this is useful.
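A small sketch of row scaling with 'sweep', written here for illustration rather than copied from the function. It also shows why a single color key does not apply once scaling is on: the same scaled value corresponds to a different raw value in every row.

```r
x <- as.matrix(mtcars)

row_mean <- rowMeans(x, na.rm = TRUE)
row_sd   <- apply(x, 1, sd, na.rm = TRUE)

# Center each row, then divide each row by its standard deviation.
x_scaled <- sweep(sweep(x, 1, row_mean, "-"), 1, row_sd, "/")

# Raw value that maps to a scaled value of 1; it differs from row to row.
raw_at_one <- row_mean + row_sd
```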
The default colors range from red to white ('heat.colors') and are
not pretty. Consider using enhancements such as the
'RColorBrewer' package, <URL:
http://cran.r-project.org/src/contrib/PACKAGES.html#RColorBrewer>
to select better colors.
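One possible way to pick a palette, sketched under the assumption that a diverging blue-white-red scheme is wanted. It uses 'brewer.pal' from 'RColorBrewer' when that package is installed and otherwise falls back to 'colorRampPalette' from base R, which is a substitute and not the same palette.

```r
n_col <- 11   # "RdBu" provides at most 11 colors

if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # Reverse so that low values are blue and high values are red.
  pal <- rev(RColorBrewer::brewer.pal(n_col, "RdBu"))
} else {
  # Base R fallback with a similar look.
  pal <- colorRampPalette(c("blue", "white", "red"))(n_col)
}

# Matching break points, e.g. for a correlation matrix on [-1, 1].
breaks <- seq(-1, 1, length.out = n_col + 1)
```

The resulting 'pal' and 'breaks' can then be passed as the 'col' and 'breaks' arguments.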
Value:
Invisibly, a list with components
rowInd: *r*ow index permutation vector as returned by
'order.dendrogram'.
colInd: *c*olumn index permutation vector.
Note:
The original rows and columns are reordered _in any case_ to match
the dendrogram, e.g., the rows by 'order.dendrogram(Rowv)' where
'Rowv' is the (possibly 'reorder()'ed) row dendrogram.
'heatmap.2()' uses 'layout' and draws the 'image' in the lower
right corner of a 2x2 layout. Consequently, it can *not* be
used in a multi column/row layout, i.e., when 'par(mfrow= *)' or
'(mfcol= *)' has been called.
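To illustrate the 2x2 arrangement, here is a rough base R sketch that draws an unscaled image in the lower right panel and a color key in the upper left panel. The placement of the key, the panel sizes and the margins are assumptions made for this example; the dendrogram panels are left empty.

```r
x      <- as.matrix(mtcars)
breaks <- seq(min(x), max(x), length.out = 17)
col    <- heat.colors(length(breaks) - 1)

pdf(NULL)   # draw to a null device; remove this line to plot on screen

# Panel 1 = lower right (image), panel 2 = upper left (key), 0 = unused.
layout(matrix(c(2, 0,
                0, 1), nrow = 2, byrow = TRUE),
       widths = c(1.5, 4), heights = c(1.5, 4))

# Panel 1: the false color image, basically image(t(x)).
par(mar = c(5, 1, 1, 5))
image(x = seq_len(ncol(x)), y = seq_len(nrow(x)), z = t(x),
      col = col, breaks = breaks, axes = FALSE, xlab = "", ylab = "")

# Panel 2: the key, one strip per interval, colored by its midpoint.
key_z <- matrix(breaks[-1] - diff(breaks) / 2, ncol = 1)
par(mar = c(3, 1, 2, 1))
image(x = breaks, y = c(0, 1), z = key_z,
      col = col, breaks = breaks, yaxt = "n", xlab = "", ylab = "")
title("Color Key", cex.main = 0.9)

dev.off()
```

Because 'layout' takes over the whole device, this kind of drawing cannot be nested inside a 'par(mfrow= *)' grid, which is the restriction the note describes.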
Author(s):
Andy Liaw, original; R. Gentleman, M. Maechler, W. Huber, G.
Warnes, revisions.
See Also:
'image', 'hclust'
Examples:
data(mtcars)
x <- as.matrix(mtcars)
rc <- rainbow(nrow(x), start=0, end=.3)
cc <- rainbow(ncol(x), start=0, end=.3)
hv <- heatmap(x, col = cm.colors(256), scale="column",
RowSideColors = rc, ColSideColors = cc, margin=c(5,10),
xlab = "specification variables", ylab= "Car Models",
main = "heatmap(<Mtcars data>, ..., scale = \"column\")",
tracecol="green")
str(hv) # the two re-ordering index vectors
data(attitude)
round(Ca <- cor(attitude), 2)
symnum(Ca) # simple graphic
# with reorder
heatmap(Ca, symm = TRUE, margin=c(6,6), trace="none" )
# without reorder
heatmap(Ca, Rowv=FALSE, symm = TRUE, margin=c(6,6), trace="none" )
## For variable clustering, rather use distance based on cor():
data(USJudgeRatings)
symnum( cU <- cor(USJudgeRatings) )
hU <- heatmap(cU, Rowv = FALSE, symm = TRUE, col = topo.colors(16),
distfun = function(c) as.dist(1 - c), trace="none")
## The Correlation matrix with same reordering:
hM <- format(round(cU[hU[[1]], hU[[2]]],2))
hM
# now with the correlation matrix on the plot itself
heatmap(cU, Rowv = FALSE, symm = TRUE, col = rev(heat.colors(16)),
distfun = function(c) as.dist(1 - c), trace="none",
cellnote=hM)
## genechip data examples
## Don't run:
library(affy)
data(SpikeIn)
pms <- SpikeIn@pm
# just the data, scaled across rows
heatmap(pms, col=rev(heat.colors(16)), main="SpikeIn@pm",
xlab="Relative Concentration", ylab="Probeset",
scale="row")
# fold change vs "12.50" sample
data <- pms / pms[,"12.50"]
data <- ifelse(data>1,data,-1/data)
heatmap(data, breaks=8, col=redgreen, tracecol="blue",
main="SpikeIn@pm Fold Changes\nrelative to 12.50 sample",
xlab="Relative Concentration", ylab="Probeset")
## End Don't run