[BioC] Limma: Contrasts comparing one factor to multiple others

Wed Feb 11 17:18:21 CET 2009

Hi Dan,

Daniel Brewer wrote:
> Hello,
> 
> I am trying to get to the bottom of how limma works and was thinking
> about what the best way is to compare one level of a variable to
> multiple other levels.  For example say we had three sample types A, B
> and C and I want to find genes that are differentially expressed between
> A and the other samples.  I can see that there would be two ways to set
> up the design matrix:
> 1) to have columns "A" and "Other"
> 2) to have columns "A", "B" and "C" and deal with the comparison in a
> contrast
> 
> I would have expected both of these approaches to produce the same
> results but they do not.  Is this correct? And if they are different
> which is the best approach to use and why are they different?

They are different, primarily due to what is in the denominator. The 
calculation of the standard error of the mean will be different, and the 
eBayes estimate might be different as well. But the SEM will likely have 
the largest effect.

What you have to remember is that the t-tests you construct all use the 
SEM as a yardstick to determine if the observed difference between the 
groups is large or not. In the first case you are pooling the B and C 
groups, so they both have to be pretty similar to each other in order to 
get a significant result. In the second case you are using a measure of 
the intra-group correlation for each individual group for the SEM, so 
the B and C groups don't have to be that similar to each other, just 
consistent within the groups.

So for instance if you have something like

Group	Mean	SD
A	5	1
B	12	1.3
C	5	1.1

This might be significant in the second situation, as the intra-group 
variance is low, and the difference between the mean of B and C and the 
A group is pretty large. However, in the first situation the variance of 
the combined B and C group will be really big, so the SEM will be much 
larger than in the second situation, likely resulting in an 
insignificant result.

Best,

Jim

> 
> A test example is posted below:
> 
> Library(limma)
> testexpr <- matrix(runif(100),ncol=10)
> 
> design2 <- matrix(c(rep(1,5),rep(0,5),rep(0,5),rep(1,5)),ncol=2)
>> design2
>       A Other
>  [1,] 1     0
>  [2,] 1     0
>  [3,] 1     0
>  [4,] 1     0
>  [5,] 1     0
>  [6,] 0     1
>  [7,] 0     1
>  [8,] 0     1
>  [9,] 0     1
> [10,] 0     1
> 
> design1 <- model.matrix(~as.factor(c(rep(1,5),2,2,3,3,3)))
>> design1
>    A B C
> 1  1 0 0
> 2  1 0 0
> 3  1 0 0
> 4  1 0 0
> 5  1 0 0
> 6  0 1 0
> 7  0 1 0
> 8  0 0 1
> 9  0 0 1
> 10 0 0 1
> 
> fit1 <- lmFit(testexpr,design=design1)
> contrasts.matrix1 <- makeContrasts("AvsOther"=A-(B+C)/2,levels=design1)
> fit1 <- eBayes(contrasts.fit(fit1,contrasts=contrasts.matrix1))
> 
> fit2 <- lmFit(testexpr,design=design2)
> contrasts.matrix2 <- makeContrasts("AvsOther"=A-Other,levels=design2)
> fit2 <- eBayes(contrasts.fit(fit2,contrasts=contrasts.matrix2))
> 
>> topTable(fit1,coef=1)
>         logFC   AveExpr          t    P.Value adj.P.Val         B
> 4   0.4990143 0.4692092  2.5585742 0.01051024 0.1051024 -2.791453
> 8  -0.3745471 0.6429097 -1.9203987 0.05480755 0.2231148 -4.069512
> 3  -0.3573284 0.4941828 -1.8321141 0.06693443 0.2231148 -4.217641
> 1  -0.3137072 0.4258817 -1.6084570 0.10773513 0.2693378 -4.561710
> 7   0.2588919 0.5142112  1.3274050 0.18437474 0.3687495 -4.930648
> 5  -0.2243146 0.5899142 -1.1501181 0.25009522 0.4167625 -5.127042
> 2   0.2056316 0.4184454  1.0543258 0.29173376 0.4167625 -5.221461
> 9   0.1813196 0.5025485  0.9296718 0.35254105 0.4406763 -5.332042
> 6  -0.1109509 0.4021577 -0.5688735 0.56944202 0.6327134 -5.573792
> 10  0.0722048 0.5218702  0.3702125 0.71122419 0.7112242 -5.657208
> 
>> topTable(fit2,coef=1)
>          logFC   AveExpr          t    P.Value adj.P.Val         B
> 4   0.47838295 0.4692092  2.5633453 0.01036689 0.1036689 -2.781949
> 8  -0.37026456 0.6429097 -1.9840087 0.04725487 0.1692887 -3.961164
> 3  -0.36452956 0.4941828 -1.9532785 0.05078660 0.1692887 -4.015323
> 1  -0.28931647 0.4258817 -1.5502601 0.12107910 0.3026977 -4.647349
> 7   0.25213156 0.5142112  1.3510102 0.17669216 0.3533843 -4.906105
> 5  -0.23271487 0.5899142 -1.2469687 0.21240897 0.3540150 -5.027093
> 2   0.19075798 0.4184454  1.0221488 0.30671047 0.4381578 -5.255440
> 9   0.15219110 0.5025485  0.8154938 0.41478969 0.5184871 -5.425425
> 10  0.08103956 0.5218702  0.4342387 0.66411512 0.6936484 -5.638698
> 6  -0.07351301 0.4021577 -0.3939088 0.69364840 0.6936484 -5.653648
> 
> The resulting fits using the two different approaches are different.
> 
> Many thanks
> 
> Dan
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662