[BioC] Odd contrast; does it make statistical sense?

Gordon K Smyth smyth at wehi.EDU.AU
Sun Jan 26 03:15:26 CET 2014

Dear Ryan and Aaron,

Given Aaron's reactions to my previous responses, I will make one more 
attempt to answer in slightly more detail.

The first thing to appreciate is that every statistical test is an answer 
to a particular question.  The contrast test that you mention certainly 
makes statistical sense, but this is not the issue.  The issue is 
scientific rather than statistical.  Whether or not this test is an 
appropriate answer to your scientific question depends on what your 
scientific question is.  You have not yet laid this out in sufficient
detail.

Here are some different scientific contexts that might or might not apply 
in your situation.

First, you might want to assert that C and D have higher expression than 
either A or B.  If you want to claim that, then clearly you must do 
individual contrasts C vs A, C vs B, D vs A and D vs B.  There is no 
shortcut.  The contrast C+D vs A+B is not sufficient.
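
As a rough sketch only, assuming the toy design from Ryan's original post 
and limma's makeContrasts(), the four individual contrasts could be set up 
like this.  The contrast names (CvsA and so on) are purely illustrative.

```r
library(limma)  # provides makeContrasts()

# Toy design from the original post: 4 groups, 3 samples each.
group <- factor(rep(LETTERS[1:4], 3))
design <- model.matrix(~0 + group)
colnames(design) <- LETTERS[1:4]

# One contrast per pairwise claim; each column is tested separately.
pairwise <- makeContrasts(
  CvsA = C - A,
  CvsB = C - B,
  DvsA = D - A,
  DvsB = D - B,
  levels = design
)
pairwise
```

Each column could then be tested on its own, for example through the 
contrast argument of edgeR's glmLRT(), and the claim would rest on all four 
tests pointing in the same direction.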

Or you might want to assert that the treatments cluster into two big 
groups, C and D vs A and B.  To establish this, you need to show that the
CD vs AB separation is large compared to C vs D and B vs A.  You could do all
pairwise comparisons, but a slightly more efficient method would be to test
three contrasts B-A, D-C and (C+D)/2-(A+B)/2.  You can make this assertion 
if the third contrast is far more significant than the first two.  Even if 
B-A and D-C are statistically significant, you could still establish the 
claim by showing that the fold changes for (C+D)/2-(A+B)/2 are much larger 
than those for B-A or D-C.
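
A minimal sketch of those three contrasts, again assuming limma's 
makeContrasts() and illustrative contrast names.  Three contrasts stand in 
for the six pairwise comparisons among four groups, and the cross-product 
check at the end shows that the three are mutually orthogonal.

```r
library(limma)  # provides makeContrasts()

# Two within-cluster contrasts and one between-cluster contrast.
clustering <- makeContrasts(
  BvsA   = B - A,
  DvsC   = D - C,
  CDvsAB = (C + D)/2 - (A + B)/2,
  levels = LETTERS[1:4]
)
clustering

# Off-diagonal zeros mean the contrast vectors are mutually orthogonal.
crossprod(clustering)
```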

Or you might want to assert that a population made up of equal parts C & D 
would have different expression to a population made of equal parts of A & 
B.  To assert that, you only need to test (C+D)/2-(A+B)/2.

The four groups might arise from two original factors.  Suppose that the 
groups A--D correspond to factors Big = c(1,1,2,2) and Sub =
c(1,2,1,2).  You might want to assert that Big high increases expression 
over Big low regardless of the level of Sub.  In that case you need to 
test the two contrasts C-A and D-B.  If both are significantly up, then 
you can make the assertion.
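
To make the mapping concrete, here is a small sketch under the same 
assumptions as the toy example (limma's makeContrasts(), illustrative 
names).  The table shows which group sits at which factor levels, and the 
two contrasts are the effect of Big within each level of Sub.

```r
library(limma)  # provides makeContrasts()

# Factor levels behind the four groups, as described above.
layout <- data.frame(group = LETTERS[1:4],
                     Big   = c(1, 1, 2, 2),
                     Sub   = c(1, 2, 1, 2))
layout

# Effect of Big (high vs low) separately at each level of Sub.
big_within_sub <- makeContrasts(
  BigAtSub1 = C - A,
  BigAtSub2 = D - B,
  levels = layout$group
)
big_within_sub
```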

Or you might want to assert that Big has the same effect on expression 
regardless of the Sub baseline.  In that case you need to show that 
(C+D)/2-(A+B)/2 is significant but (D-B)-(C-A) is not.
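
Sketched in the same hedged way (limma's makeContrasts(), illustrative 
names), the pair of contrasts for this question is the average effect of 
Big and the Big-by-Sub interaction.  The average effect is simply the mean 
of the two within-Sub effects C-A and D-B.

```r
library(limma)  # provides makeContrasts()

big_effect <- makeContrasts(
  BigAverage = (C + D)/2 - (A + B)/2,  # average effect of Big over Sub
  BigBySub   = (D - B) - (C - A),      # difference in the Big effect between Sub levels
  levels = LETTERS[1:4]
)
big_effect

# The average effect equals the mean of the two within-Sub effects.
simple <- makeContrasts(C - A, D - B, levels = LETTERS[1:4])
all.equal(as.numeric(big_effect[, "BigAverage"]),
          as.numeric(rowMeans(simple)))
```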

Finally, if you were confident in advance that A and B were not different 
and C and D were not different, then you could simply pool the A and B 
samples together and the C and D samples together and do a two group test. 
This produces a statistically valid test only if there is no systematic 
differential expression between A and B or between C and D.  But if you
knew that in advance, why did you classify the samples into four groups in 
the first place??
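
For comparison, the pooled two-group setup would look something like the 
following in base R.  This is only a sketch on the toy layout from the 
original post; the variable names pooled and design2 are made up here.

```r
# Toy layout from the original post: 4 groups, 3 samples each.
group <- factor(rep(LETTERS[1:4], 3))

# Collapse A and B into one group, C and D into the other.
pooled <- factor(ifelse(group %in% c("A", "B"), "AB", "CD"))
table(pooled)

# Two-column design: intercept (AB) and the CD vs AB difference.
design2 <- model.matrix(~pooled)
colnames(design2)
```

With this design any real A vs B or C vs D difference is absorbed into the 
residual variation, which is why the validity condition above matters.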

Best wishes

>> Date: Wed, 22 Jan 2014 16:17:35 -0800
>> From: "Ryan C. Thompson" <rct at thompsonclan.org>
>> To: bioconductor <Bioconductor at r-project.org>
>> Subject: [BioC] Odd contrast; does it make statistical sense?
>> Hi all,
>> I'm currently using edgeR to test a somewhat odd contrast. Basically, I
>> have multiple groups, and I want to combine them into just 2 big groups
>> and test whether the two big groups have significantly different
>> averages. I'll give a toy example that demonstrates the same concept. In
>> this example, there are 4 groups, A through D, each containing 3
>> samples, and I want to test whether the mean of all samples in A & B is
>> different from the mean of all samples in C & D:
>> group <- rep(LETTERS[1:4], 3)
>> design <- model.matrix(~0+group)
>> colnames(design) <- LETTERS[1:4]
>> cont <- makeContrasts((A+B)/2 - (C+D)/2, levels=design)
>> My worry is that with this contrast, I'm effectively just testing two 
>> groups against each other, and by having 4 groups in the design I will be 
>> estimating dispersions that are not appropriate for the test that I'm 
>> doing, and hence I will overstate my confidence.
>> Or, to put it another way, am I doing something equivalent to testing a 
>> main effect in a model where an interaction term is present?
>> Thank you,
>> -Ryan Thompson
