[BioC] Limma: Contrasts comparing one factor to multiple others
James W. MacDonald
jmacdon at med.umich.edu
Wed Feb 11 17:18:21 CET 2009
Hi Dan,
Daniel Brewer wrote:
> Hello,
>
> I am trying to get to the bottom of how limma works and was thinking
> about what the best way is to compare one level of a variable to
> multiple other levels. For example say we had three sample types A, B
> and C and I want to find genes that are differentially expressed between
> A and the other samples. I can see that there would be two ways to set
> up the design matrix:
> 1) to have columns "A" and "Other"
> 2) to have columns "A", "B" and "C" and deal with the comparison in a
> contrast
>
> I would have expected both of these approaches to produce the same
> results but they do not. Is this correct? And if they are different
> which is the best approach to use and why are they different?
They are different, primarily due to what is in the denominator. The
calculation of the standard error of the mean will be different, and the
eBayes estimate might be different as well. But the SEM will likely have
the largest effect.
What you have to remember is that the t-tests you construct all use the
SEM as a yardstick to determine if the observed difference between the
groups is large or not. In the first case you are pooling the B and C
groups, so they both have to be pretty similar to each other in order to
get a significant result. In the second case you are using a measure of
the intra-group correlation for each individual group for the SEM, so
the B and C groups don't have to be that similar to each other, just
consistent within the groups.
So for instance if you have something like
Group Mean SD
A 5 1
B 12 1.3
C 5 1.1
This might be significant in the second situation, as the intra-group
variance is low, and the difference between the mean of B and C and the
A group is pretty large. However, in the first situation the variance of
the combined B and C group will be really big, so the SEM will be much
larger than in the second situation, likely resulting in an
insignificant result.
Best,
Jim
>
> A test example is posted below:
>
> Library(limma)
> testexpr <- matrix(runif(100),ncol=10)
>
> design2 <- matrix(c(rep(1,5),rep(0,5),rep(0,5),rep(1,5)),ncol=2)
>> design2
> A Other
> [1,] 1 0
> [2,] 1 0
> [3,] 1 0
> [4,] 1 0
> [5,] 1 0
> [6,] 0 1
> [7,] 0 1
> [8,] 0 1
> [9,] 0 1
> [10,] 0 1
>
> design1 <- model.matrix(~as.factor(c(rep(1,5),2,2,3,3,3)))
>> design1
> A B C
> 1 1 0 0
> 2 1 0 0
> 3 1 0 0
> 4 1 0 0
> 5 1 0 0
> 6 0 1 0
> 7 0 1 0
> 8 0 0 1
> 9 0 0 1
> 10 0 0 1
>
> fit1 <- lmFit(testexpr,design=design1)
> contrasts.matrix1 <- makeContrasts("AvsOther"=A-(B+C)/2,levels=design1)
> fit1 <- eBayes(contrasts.fit(fit1,contrasts=contrasts.matrix1))
>
> fit2 <- lmFit(testexpr,design=design2)
> contrasts.matrix2 <- makeContrasts("AvsOther"=A-Other,levels=design2)
> fit2 <- eBayes(contrasts.fit(fit2,contrasts=contrasts.matrix2))
>
>> topTable(fit1,coef=1)
> logFC AveExpr t P.Value adj.P.Val B
> 4 0.4990143 0.4692092 2.5585742 0.01051024 0.1051024 -2.791453
> 8 -0.3745471 0.6429097 -1.9203987 0.05480755 0.2231148 -4.069512
> 3 -0.3573284 0.4941828 -1.8321141 0.06693443 0.2231148 -4.217641
> 1 -0.3137072 0.4258817 -1.6084570 0.10773513 0.2693378 -4.561710
> 7 0.2588919 0.5142112 1.3274050 0.18437474 0.3687495 -4.930648
> 5 -0.2243146 0.5899142 -1.1501181 0.25009522 0.4167625 -5.127042
> 2 0.2056316 0.4184454 1.0543258 0.29173376 0.4167625 -5.221461
> 9 0.1813196 0.5025485 0.9296718 0.35254105 0.4406763 -5.332042
> 6 -0.1109509 0.4021577 -0.5688735 0.56944202 0.6327134 -5.573792
> 10 0.0722048 0.5218702 0.3702125 0.71122419 0.7112242 -5.657208
>
>> topTable(fit2,coef=1)
> logFC AveExpr t P.Value adj.P.Val B
> 4 0.47838295 0.4692092 2.5633453 0.01036689 0.1036689 -2.781949
> 8 -0.37026456 0.6429097 -1.9840087 0.04725487 0.1692887 -3.961164
> 3 -0.36452956 0.4941828 -1.9532785 0.05078660 0.1692887 -4.015323
> 1 -0.28931647 0.4258817 -1.5502601 0.12107910 0.3026977 -4.647349
> 7 0.25213156 0.5142112 1.3510102 0.17669216 0.3533843 -4.906105
> 5 -0.23271487 0.5899142 -1.2469687 0.21240897 0.3540150 -5.027093
> 2 0.19075798 0.4184454 1.0221488 0.30671047 0.4381578 -5.255440
> 9 0.15219110 0.5025485 0.8154938 0.41478969 0.5184871 -5.425425
> 10 0.08103956 0.5218702 0.4342387 0.66411512 0.6936484 -5.638698
> 6 -0.07351301 0.4021577 -0.3939088 0.69364840 0.6936484 -5.653648
>
> The resulting fits using the two different approaches are different.
>
> Many thanks
>
> Dan
>
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
More information about the Bioconductor
mailing list