# [BioC] Odd contrast; does it make statistical sense?

Gordon K Smyth smyth at wehi.EDU.AU
Sun Jan 26 03:15:26 CET 2014

```
Dear Ryan and Aaron,

Given Aaron's reactions to my previous responses, I will make one more
attempt to answer in slightly more detail.

The first thing to appreciate is that every statistical test is an answer
to a particular question.  The contrast test that you mention certainly
makes statistical sense, but this is not the issue.  The issue is
scientific rather than statistical.  Whether or not this test is
appropriate depends on what your scientific question is.  You have not yet
laid this out in sufficient detail.

Here are some different scientific contexts that might or might not apply:

First, you might want to assert that C and D have higher expression than
either A or B.  If you want to claim that, then clearly you must do
individual contrasts C vs A, C vs B, D vs A and D vs B.  There is no
shortcut.  The contrast C+D vs A+B is not sufficient.

Or you might want to assert that the treatments cluster into two big
groups, C and D vs A and B.  To establish this, you need to show that the
CD vs AB separation is large compared to C vs D and B vs A.  You could do all
pairwise comparisons, but a slightly more efficient method would be to test
three contrasts B-A, D-C and (C+D)/2-(A+B)/2.  You can make this assertion
if the third contrast is far more significant than the first two.  Even if
B-A and D-C are statistically significant, you could still establish the
claim by showing that the fold changes for (C+D)/2-(A+B)/2 are much larger
than those for B-A or D-C.

Or you might want to assert that a population made up of equal parts C & D
would have different expression to a population made of equal parts of A &
B.  To assert that, you only need to test (C+D)/2-(A+B)/2.

The four groups might arise from two original factors.  Suppose that the
groups A--D correspond to factors Big = c(1,1,2,2) and Sub =
c(1,2,1,2).  You might want to assert that Big high increases expression
over Big low regardless of the level of Sub.  In that case you need to
test the two contrasts C-A and D-B.  If both are significantly up, then
you can make the assertion.

Or you might want to assert that Big has the same effect on expression
regardless of the Sub baseline.  In that case you need to show that
(C+D)/2-(A+B)/2 is significant but (D-B)-(C-A) is not.

Finally, if you were confident in advance that A and B were not different
and C and D were not different, then you could simply pool the A and B
samples together and the C and D samples together and do a two group test.
This produces a statistically valid test only if there is no systematic
differential expression between A and B or between C and D.  But if you
knew that in advance, why did you classify the samples into four groups in
the first place??

Best wishes
Gordon

>> Date: Wed, 22 Jan 2014 16:17:35 -0800
>> From: "Ryan C. Thompson" <rct at thompsonclan.org>
>> To: bioconductor <Bioconductor at r-project.org>
>> Subject: [BioC] Odd contrast; does it make statistical sense?
>>
>> Hi all,
>>
>> I'm currently using edgeR to test a somewhat odd contrast. Basically, I
>> have multiple groups, and I want to combine them into just 2 big groups
>> and test whether the two big groups have significantly different
>> averages. I'll give a toy example that demonstrates the same concept. In
>> this example, there are 4 groups, A through D, each containing 3
>> samples, and I want to test whether the mean of all samples in A & B is
>> different from the mean of all samples in C & D:
>>
>> group <- rep(LETTERS[1:4], 3)
>> design <- model.matrix(~0+group)
>> colnames(design) <- LETTERS[1:4]
>> cont <- makeContrasts((A+B)/2 - (C+D)/2, levels=design)
>>
>> My worry is that with this contrast, I'm effectively just testing two
>> groups against each other, and by having 4 groups in the design I will be
>> estimating dispersions that are not appropriate for the test that I'm
>> doing, and hence I will overstate my confidence.
>>
>> Or, to put it another way, am I doing something equivalent to testing a
>> main effect in a model where an interaction term is present?
>>
>> Thank you,
>>
>> -Ryan Thompson

```
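The contrasts named in the reply can be written out explicitly for the toy design in the quoted message. The following is a minimal sketch in base R, not code from the thread: the contrast names (`BvsA`, `DvsC`, `CDvsAB`, `CvsA`, `DvsB`, `Interaction`) are illustrative labels, and the matrix is built by hand so the example runs without any Bioconductor package installed.

```r
# Toy layout from the quoted message: four groups A-D, three samples each.
group <- factor(rep(LETTERS[1:4], 3))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One row per group coefficient, one column per contrast.
# Column names are hypothetical labels chosen for this sketch.
contrasts <- cbind(
  BvsA        = c(-1,    1,    0,   0),    # B - A
  DvsC        = c( 0,    0,   -1,   1),    # D - C
  CDvsAB      = c(-0.5, -0.5,  0.5, 0.5),  # (C+D)/2 - (A+B)/2
  CvsA        = c(-1,    0,    1,   0),    # Big effect when Sub = 1
  DvsB        = c( 0,   -1,    0,   1),    # Big effect when Sub = 2
  Interaction = c( 1,   -1,   -1,   1)     # (D-B) - (C-A)
)
rownames(contrasts) <- colnames(design)

print(contrasts)
```

Each column sums to zero, as a contrast should. `CDvsAB` is the average of `CvsA` and `DvsB`, and `Interaction` is their difference, which is why the "same effect regardless of Sub" claim needs the first to be significant and the second not. With limma loaded, `makeContrasts(BvsA = B - A, CDvsAB = (C+D)/2 - (A+B)/2, levels = design)` should build the same columns, and in edgeR a single column could presumably be passed as the `contrast` argument of `glmLRT`.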