[BioC] heatmap.2 and makeContrasts

Thu Mar 3 21:58:34 CET 2011

Hi Supriya,

Please don't take things off-list. We hope that people can use the list 
archives to answer questions, and if you take questions off-list it 
subverts that function.

On 3/3/2011 11:55 AM, Supriya Munshaw wrote:
> Hi Jim, Thank you so much for your response. It was very helpful!
> Adding a little bit to question 2,
>
> If I have 2 diseased patients (A and B) and 2 non-diseased patients
> (C and D), I should get the same result for setting up my matrix as
> Disease-NonDisease and (A+B)-(C+D) if there are no within-group
> differences, which is an important assumption I make when I group
> them. But if I don't get the same result, it means that the
> within-group differences exist and cannot be ignored. In this case,
> can I find differences between disease and non disease by setting up
> the contrast as (A-B)-(C-D)? Does this make sense?

Let's set aside the fact that you can't do statistics without 
replication (so your example won't work), and assume you have replicates 
for A-D.

If so, then what you are asking about is usually called an interaction, 
and it is designed to detect exactly the situation you describe. There 
is more than one example of this type of analysis in the Limma User's 
Guide, so you should look there for more information. But long story 
short, yes that makes sense.
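
As a minimal sketch of what an interaction contrast looks like, assume a
two-factor layout (disease state by tissue) with four replicates per cell,
reusing the illustrative values from my earlier reply quoted below. This uses
base R and ordinary least squares rather than limma, and the names y, group
and int_contrast are made up for the example.

```r
# Illustrative values only (4 replicates per disease/tissue cell)
y <- c(4.5, 4.3, 4.7, 4.2,   # D.T1
       6.4, 5.8, 6.0, 5.8,   # D.T2
       6.5, 6.3, 6.1, 6.6,   # N.T1
       7.3, 7.2, 7.0, 7.5)   # N.T2
group <- factor(rep(c("D.T1", "D.T2", "N.T1", "N.T2"), each = 4))

# Cell-means model: one coefficient per disease/tissue combination
fit <- lm(y ~ 0 + group)

# Interaction: is the tissue effect different in diseased vs normal?
# (D.T2 - D.T1) - (N.T2 - N.T1), in the order of levels(group)
int_contrast <- c(-1, 1, 1, -1)
est <- sum(int_contrast * coef(fit))
se  <- sqrt(drop(t(int_contrast) %*% vcov(fit) %*% int_contrast))
t_stat <- est / se
print(c(estimate = est, se = se, t = t_stat))

# With limma, the same contrast would typically be written as
#   makeContrasts((D.T2 - D.T1) - (N.T2 - N.T1), levels = design)
```

An estimate near zero says the tissue difference is about the same in both
disease states; a large one says the two factors interact.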

>
> I'm new to microarray statistical analysis, so sorry for the dumb
> questions. But thank you for your responses!

There is no crime in ignorance. But there is danger, so it would be in 
your interest to (at the very least) read about linear modeling, 
especially ANOVA, so you have some theoretical understanding of what you 
are doing.

Best,

Jim

>
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Thursday, March 03, 2011 11:33 AM
> To: Supriya Munshaw
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] heatmap.2 and makeContrasts
>
> Hi Supriya,
>
> On 3/2/2011 10:44 AM, Supriya Munshaw wrote:
>> Hi all, I had 2 questions for you reg. using R and Bioconductor.
>>
>> Question 1: I'm using heatmap.2 to make a heatmap for my top
>> differentially expressed genes. I also create a dendrogram for my
>> columns that clusters by sample. However, is there a way to modify
>> these dendrograms? For example, if you look at the color coding in
>> the attached heatmap, I have clustered by 2 regions. But if you
>> look closely, there is no reason that the dendrogram can't be
>> flipped so that the green sections align i.e. the first blue
>> section from the left can be flipped with the second green section
>> from the left which would keep the same information but provide a
>> better visual representation of the clustering. Does anyone know
>> how I can do this?
>>
>
> I don't think it is easily done. You might be able to hack at the
> hclust() code or output to give what you want, but it won't be via a
> simple argument to hclust().
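
One way to work on the hclust() output, as a rough sketch: convert it to a
dendrogram and rotate the branches with reorder() from the stats package,
which changes the left-to-right order of the leaves without changing the
tree itself. The matrix mat and the region codes are made up for the
example, and how well the colors line up depends on the weights you choose.

```r
set.seed(1)
# Hypothetical expression matrix: 20 genes x 8 samples
mat <- matrix(rnorm(160), nrow = 20,
              dimnames = list(paste0("g", 1:20), paste0("s", 1:8)))
# Hypothetical region code for each sample (column), in column order
region <- rep(c(1, 2), times = 4)

# Cluster the samples and convert the result to a dendrogram
hc   <- hclust(dist(t(mat)))
dend <- as.dendrogram(hc)

# At every node, put the branch with the smaller mean weight first.
# Only the drawing order changes; merges and heights stay the same.
dend2 <- reorder(dend, wts = region, agglo.FUN = mean)

print(labels(dend))   # original left-to-right leaf order
print(labels(dend2))  # leaf order after rotating branches

# The reordered dendrogram can then be passed to heatmap.2(), e.g.
#   heatmap.2(mat, Colv = dend2)
```

Because only rotations are applied, two samples can end up next to each
other only if the tree allows it; this will not regroup samples that the
clustering has placed in different branches.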
>
>
>> Question 2:
>>
>> My phenotype data file looks like this
>>
>> Patient   Disease State   Tissue
>> A         D               T1
>> A         D               T2
>> B         D               T1
>> B         D               T2
>> C         N               T1
>> C         N               T2
>> D         N               T1
>> D         N               T2
>>
>>
>> So the first comparison I want to make is between disease and non
>> disease in all tissues. I can do that in 2 ways:
>>
>> Option 1:
>> desMat <- model.matrix(~0 + DiseaseState)
>> colnames(desMat) <- levels(DiseaseState)
>> contMat <- makeContrasts(D-N, levels = colnames(desMat))
>> # I'm assuming this groups all disease states in one group and all
>> # non disease states in another, without regard to patient, treating
>> # each sample independently, which is fine.
>>
>> Option 2:
>> Combine <- factor(paste(DiseaseState, Tissue, sep="."))
>> # So now my states are D.T1, D.T2, N.T1, N.T2
>> desMat <- model.matrix(~0 + Combine)
>> colnames(desMat) <- levels(Combine)
>> contMat <- makeContrasts(((D.T1+D.T2)/2) - ((N.T1+N.T2)/2),
>>                          levels = colnames(desMat))
>>
>> Shouldn't option 1 and 2 give me the same answer? In my case, it
>> does not and I'm not sure I understand why.
>
> No, they should not. You are asking two subtly different questions.
> In option 1 you are ignoring any differences between the tissues and
> asking if there is a difference between disease states. In option 2
> you are accounting for the tissue differences and then asking if
> there is a difference between the disease states.
>
> This comes from how the denominator of the t-statistic is
> constructed. Note that in simple terms the denominator is an average
> of the variability within groups being compared. In option 1, you are
> computing the variability within the diseased group and normal group
> separately and then averaging them. In option 2 you are computing
> variability within the D.T1, D.T2, N.T1, N.T2 groups separately and
> then averaging.
>
> So if the tissues are quite different in expression, but are
> consistent within each disease state/tissue type, then you will tend
> to get significance in option 2 but not option 1. As an example:
>
> D.T1 = c(4.5,4.3,4.7,4.2)
> D.T2 = c(6.4,5.8,6.0,5.8)
> N.T1 = c(6.5,6.3,6.1,6.6)
> N.T2 = c(7.3,7.2,7.0,7.5)
>
> Here you can see that the within-group variability is very small, but
> if you pool the diseased and normal samples across tissues, the
> variability will increase quite a bit, and the difference between
> disease states may well no longer be significant.
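
To make the denominator argument concrete, here is a sketch in base R that
computes an ordinary t-statistic for both options on the values above.
limma would use moderated t-statistics across all genes, so treat this as
an illustration of the idea only; contrast_t is a made-up helper.

```r
# Illustrative values from the example above (4 replicates per cell)
D.T1 <- c(4.5, 4.3, 4.7, 4.2)
D.T2 <- c(6.4, 5.8, 6.0, 5.8)
N.T1 <- c(6.5, 6.3, 6.1, 6.6)
N.T2 <- c(7.3, 7.2, 7.0, 7.5)
y <- c(D.T1, D.T2, N.T1, N.T2)

disease <- factor(rep(c("D", "N"), each = 8))
combine <- factor(rep(c("D.T1", "D.T2", "N.T1", "N.T2"), each = 4))

# Hypothetical helper: ordinary t-statistic for a contrast of group means
contrast_t <- function(fit, cvec) {
  est <- sum(cvec * coef(fit))
  se  <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))
  c(estimate = est, se = se, t = est / se)
}

# Option 1: tissue ignored, so tissue differences inflate the residual variance
opt1 <- contrast_t(lm(y ~ 0 + disease), c(1, -1))

# Option 2: one mean per disease/tissue cell, then average over tissues
opt2 <- contrast_t(lm(y ~ 0 + combine), c(0.5, 0.5, -0.5, -0.5))

print(rbind(option1 = opt1, option2 = opt2))
```

Because this layout is balanced, both options estimate the same D minus N
difference; only the standard error differs, and it is smaller under
option 2, so the t-statistic is larger in magnitude.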
>
> Best,
>
> Jim
>
>
>
>
>>
>> I would really appreciate any help. Thank you!
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues