[BioC] Heatmaps and correlation

john herbert arraystruggles at gmail.com
Thu May 12 00:52:40 CEST 2011


Producing heatmaps from real data makes me wonder.
Code:

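library(gplots)   # provides heatmap.2()

a <- as.matrix(data)   # 'data' stands for the Delta CT table: genes in rows, samples in columns
rowv <- as.dendrogram(hclust(as.dist(1 - cor(t(a)))))
colv <- as.dendrogram(hclust(as.dist(1 - cor(a))))
# left map: correlation (1 - cor) as the distance
heatmap.2(a, scale = "row", Rowv = rowv, Colv = colv)

x11()   # open a second plot window (dev.new() is the portable alternative)
# right map: default settings (Euclidean distance via dist())
heatmap.2(a, scale = "row")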
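To compare the two maps, the column clusterings can be computed without drawing anything. This is a minimal sketch on simulated data, since the Delta CT matrix is not included here; it assumes heatmap.2's defaults of `distfun = dist` and `hclustfun = hclust` (Euclidean distance, complete linkage). The row-scaled variant is a hypothetical check, not part of the original code: `scale = "row"` changes the colours, but the dendrograms passed in through `Rowv`/`Colv` were computed on the unscaled values. Correlation between rows is unaffected by row scaling, but correlation between columns is not, so the column dendrogram and the colour pattern need not agree.

```r
# Simulated stand-in for the Delta CT matrix: 10 genes (rows) x 6 samples (columns).
# Each gene gets its own overall level, as genes in real data usually do.
set.seed(42)
a <- matrix(rnorm(60, sd = 1), ncol = 6) + rep(seq(15, 33, by = 2), times = 6)

# Column clustering for the left map: correlation distance on the unscaled values
hc_cor <- hclust(as.dist(1 - cor(a)))

# Column clustering heatmap.2 performs by default: Euclidean distance, complete linkage
hc_euc <- hclust(dist(t(a)))

# Hypothetical check: centre and scale each row, as scale = "row" does for the colours
a_rowscaled <- t(scale(t(a)))
hc_cor_scaled <- hclust(as.dist(1 - cor(a_rowscaled)))

# On the unscaled data, column correlations are driven by the gene-level differences
cc_raw    <- cor(a)
cc_scaled <- cor(a_rowscaled)
print(round(range(cc_raw[upper.tri(cc_raw)]), 3))
print(round(range(cc_scaled[upper.tri(cc_scaled)]), 3))

# Left-to-right column order produced by each clustering
print(hc_cor$order)
print(hc_euc$order)
print(hc_cor_scaled$order)
```

If the scaled and unscaled column orders differ on the real data, that would be one possible explanation for the first map looking "wrong"; it is a hypothesis to check, not a confirmed diagnosis.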

The attached heatmaps don't look that great. The first map (correlation
distance) does not seem to cluster correctly: at a glance, I would expect the
predominantly yellow/white columns to cluster together.

The second, using the default settings, looks a little better.

These data are Delta CT values, not array data.

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] HTqPCR_1.6.0       limma_3.8.1        RColorBrewer_1.0-2 Biobase_2.12.1     gplots_2.8.0       caTools_1.11
[7] bitops_1.0-4.1     gdata_2.8.2        gtools_2.6.2

loaded via a namespace (and not attached):
[1] affy_1.30.0           affyio_1.20.0         preprocessCore_1.14.0 tools_2.13.0




On Wed, May 11, 2011 at 11:27 PM, john herbert <arraystruggles at gmail.com> wrote:

> mmmm, possibly.
> Correlation of gene expression = comparing all genes vs. all genes.
>
> 1) genes A and B, both strongly up-regulated = high positive correlation, e.g. 0.9
> 2) genes C and D, both strongly down-regulated = high positive correlation, e.g. 0.9
> 3) gene E strongly up and gene F strongly down = high negative correlation, e.g. -0.9
> 4) gene G strongly down and gene H strongly up = high negative correlation, e.g. -0.9
>
> With 1-cor, genes A, B, C and D cluster together, and genes E, F, G and H
> cluster together but a long way off from ABCD.
>
> With 1-abs(cor), all of these genes cluster together, as they all produce an
> extreme correlation, either positive or negative.
>
> To me, 1-cor makes more biological sense; do you agree?
>
> Table:
> cor     1-cor   1-abs(cor)
>  1      0       0
>  0.9    0.1     0.1
>  0.8    0.2     0.2
>  0.7    0.3     0.3
>  0.6    0.4     0.4
>  0.5    0.5     0.5
>  0.4    0.6     0.6
>  0.3    0.7     0.7
>  0.2    0.8     0.8
>  0.1    0.9     0.9
>  0      1       1
> -0.1    1.1     0.9
> -0.2    1.2     0.8
> -0.3    1.3     0.7
> -0.4    1.4     0.6
> -0.5    1.5     0.5
> -0.6    1.6     0.4
> -0.7    1.7     0.3
> -0.8    1.8     0.2
> -0.9    1.9     0.1
> -1      2       0
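The table can be regenerated in R rather than typed by hand, which also confirms the arithmetic. A minimal sketch; the column names are illustrative labels, not anything from the thread:

```r
# Correlations from 1 down to -1 in steps of 0.1 (rounded to remove floating-point noise)
r <- round(seq(1, -1, by = -0.1), 1)

tab <- data.frame(
  cor               = r,
  one_minus_cor     = 1 - r,       # ranges from 0 (cor = 1) to 2 (cor = -1)
  one_minus_abs_cor = 1 - abs(r)   # 0 at both cor = 1 and cor = -1
)
print(tab, row.names = FALSE)
```

The two distance columns agree for non-negative correlations and diverge only for negative ones, which is exactly where the choice between 1-cor and 1-abs(cor) matters.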
>
> On Wed, May 11, 2011 at 10:48 PM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>
>>
>>
>> On 5/11/2011 4:54 PM, john herbert wrote:
>>
>>> A biological circumstance? Interesting. Nothing immediately pops to mind.
>>>
>>
>> You see the pattern, but think about the underlying biology. If you use
>> 1-cor, as you note, two genes that are both highly up-regulated or both
>> highly down-regulated will cluster together.
>>
>> But what if the first gene's product has a negative feedback effect on the
>> transcription of the second gene? That implies a relationship that won't be
>> captured if you use 1-cor, but will be captured if you use 1-abs(cor). This
>> is, I believe, what Kevin was getting at.
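Jim's feedback scenario can be made concrete with three made-up genes measured across six samples: A rises, B (hypothetically repressed by A) falls as A rises, and C is only weakly related to either. The values are invented for illustration only. With 1-cor, A and B are almost as far apart as possible, so A is joined with C first; with 1-abs(cor), A and B are joined first.

```r
# Hypothetical expression values: 3 genes (rows) x 6 samples (columns)
expr <- rbind(
  A = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),   # rises across samples
  B = c(6.1, 4.9, 4.0, 3.1, 1.8, 1.0),   # falls as A rises (repressed by A)
  C = c(2.0, 5.0, 1.0, 6.0, 3.0, 4.0)    # weakly related to either
)

r <- cor(t(expr))                          # gene-by-gene correlation matrix
print(round(r, 2))

hc_signed <- hclust(as.dist(1 - r))        # anti-correlated genes end up far apart
hc_abs    <- hclust(as.dist(1 - abs(r)))   # anti-correlated genes end up close

# First merge step; negative entries are row indices (1 = A, 2 = B, 3 = C)
print(hc_signed$merge[1, ])   # joins A and C
print(hc_abs$merge[1, ])      # joins A and B
```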
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>>> cor     1-cor   1-abs(cor)
>>>  1      0       0
>>>  0.9    0.1     0.1
>>>  0.8    0.2     0.2
>>>  0.7    0.3     0.3
>>>  0.6    0.4     0.4
>>>  0.5    0.5     0.5
>>>  0.4    0.6     0.6
>>>  0.3    0.7     0.7
>>>  0.2    0.8     0.8
>>>  0.1    0.9     0.9
>>>  0      1       1
>>> -0.1    1.1     0.9
>>> -0.2    1.2     0.8
>>> -0.3    1.3     0.7
>>> -0.4    1.4     0.6
>>> -0.5    1.5     0.5
>>> -0.6    1.6     0.4
>>> -0.7    1.7     0.3
>>> -0.8    1.8     0.2
>>> -0.9    1.9     0.1
>>> -1      2       0
>>>
>>> Looking at the numbers, with 1-abs(cor) anything that is highly correlated,
>>> whether positively or negatively, will be close in distance. The heatmaps
>>> look different, and obviously the clustering is different.
>>>
>>> Thanks.
>>>
>>> On Wed, May 11, 2011 at 9:19 PM, Kevin R. Coombes <kevin.r.coombes at gmail.com> wrote:
>>>
>>>> This is not really a Bioconductor question... but
>>>>
>>>> You need something that behaves like a "distance".
>>>>
>>>> By the definition of what you mean by "distance", two things are close if
>>>> and only if the distance is near zero. The bigger (more positive) the
>>>> distance, the further apart things are. And you cannot measure distances as
>>>> negative; only non-negative values need apply.
>>>>
>>>> Using 1-cor, you are taking two things to be close if the correlation is
>>>> close to 1. And the things that are furthest apart are the ones where the
>>>> correlation is close to -1.
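Kevin's requirements can be checked numerically on the toy matrix from the original example. A minimal sketch: because a correlation lies in [-1, 1], 1 - cor lies in [0, 2], is zero on the diagonal and is symmetric. Strictly, 1 - cor is a dissimilarity rather than a true metric (it need not satisfy the triangle inequality; correlations of 0.5, 0.5 and -0.5 give distances 0.5, 0.5 and 1.5), but a non-negative, symmetric dissimilarity is what hclust() works with.

```r
set.seed(1)
a <- matrix(rnorm(50), ncol = 5)   # same shape as the original toy example

d <- 1 - cor(a)                    # 5 x 5 matrix of correlation distances

print(range(d))                    # within [0, 2], up to rounding
print(all(abs(diag(d)) < 1e-12))   # each column is at distance 0 from itself
print(isSymmetric(d))              # d[i, j] equals d[j, i]
```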
>>>>
>>>> As an exercise, you might want to think about circumstances where the
>>>> preferred code would be
>>>>    1 - abs(cor(*))
>>>> instead of
>>>>    1 - cor(*)
>>>>
>>>>
>>>> On 5/11/2011 3:12 PM, john herbert wrote:
>>>>
>>>>> Dear bioconductors,
>>>>>
>>>>> From a Google search, I found the following code that confuses me a
>>>>> little. As usual, it is probably something really elementary, but reading
>>>>> around does not solve it.
>>>>>
>>>>> The code was written by James MacDonald
>>>>> (http://www.mail-archive.com/r-help@r-project.org/msg61514.html) and is
>>>>> meant to compute dendrograms based on correlation and plot the results on
>>>>> a heatmap, as follows:
>>>>>
>>>>> library(gplots)   # provides heatmap.2()
>>>>> a<- matrix(rnorm(50), ncol=5)
>>>>> rowv<- as.dendrogram(hclust(as.dist(1-cor(t(a)))))
>>>>> colv<- as.dendrogram(hclust(as.dist(1-cor(a))))
>>>>> heatmap.2(a, scale="row", Rowv=rowv, Colv=colv)
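The one-liners above chain several steps. Unpacked on the same kind of simulated data, they look like this; a sketch that stops at the dendrograms so it runs without gplots installed:

```r
set.seed(1)
a <- matrix(rnorm(50), ncol = 5)    # 10 rows ("genes") x 5 columns ("samples")

# Columns (Colv): correlations between columns, then 1 - cor as a dissimilarity
cc   <- cor(a)                      # 5 x 5, values in [-1, 1]
dd   <- as.dist(1 - cc)             # hclust() expects a "dist" object, not a plain matrix
hc   <- hclust(dd)                  # complete linkage by default
colv <- as.dendrogram(hc)           # dendrogram form accepted by heatmap.2(Colv = ...)

# Rows (Rowv): same idea, but cor(t(a)) correlates rows (genes) rather than columns
rowv <- as.dendrogram(hclust(as.dist(1 - cor(t(a)))))

print(class(dd))
print(attr(colv, "members"))        # number of leaves: one per column
print(attr(rowv, "members"))        # number of leaves: one per row
```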
>>>>>
>>>>>
>>>>> Why the *1*-cor(a)?
>>>>>
>>>>>
>>>>> Orig. cor   Adjusted cor (1 - cor)
>>>>>  1          0
>>>>>  0.9        0.1
>>>>>  0.8        0.2
>>>>>  0.7        0.3
>>>>>  0.6        0.4
>>>>>  0.5        0.5
>>>>>  0.4        0.6
>>>>>  0.3        0.7
>>>>>  0.2        0.8
>>>>>  0.1        0.9
>>>>>  0          1
>>>>> -0.1        1.1
>>>>> -0.2        1.2
>>>>> -0.3        1.3
>>>>> -0.4        1.4
>>>>> -0.5        1.5
>>>>> -0.6        1.6
>>>>> -0.7        1.7
>>>>> -0.8        1.8
>>>>> -0.9        1.9
>>>>> -1          2
>>>>>
>>>>>
>>>>> This removes negative numbers? What is the reason for doing this?
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
>>>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>>
>
>

