[BioC] Limma: doing multiple paired t-tests in one go...
Gordon Smyth
smyth at wehi.EDU.AU
Sat Dec 1 02:35:38 CET 2007
Dear Philip,
You are seeing problems where really there are none.
The advice from the limma User's Guide guided you correctly. Your
limma session below is perfectly correct. Jim's advice was perfectly
correct and helpful.
The only problem is that you're insisting on treating the simple
paired t-test as a reference method which you believe you have to
reproduce. The aim of linear modelling in limma is to give you the
best possible test, not to reproduce the test statistic you would
have got if you only had half as much data.
If all you want is to do is to reproduce ordinary paired t-tests,
then you have to analyse Diet_1 and Diet_2 separately with separate
linear models. In that case, there's no point in any further
discussion. If you are willing to embrace a more general variant of
the paired t-test which might be more powerful, then read on.
The tests you get from linear modelling will differ from simple
paired t-tests because the residual standard deviation is pooled
across all your arrays. Your first 4 arrays allow you to test
Treatment_Diet_1 vs Control_Diet_1 with 1 residual df. Your second
set of 4 arrays allow you to test Treatment_Diet_2 vs Control_Diet_2,
also with 1 residual df. When you put all 8 arrays into your linear
model, the two residual df are pooled so that you get t-tests for
Treatment_Diet_1 vs Control_Diet_1 and for Treatment_Diet_2 vs
Control_Diet_2 on 2 residual df. This is usually a more powerful approach.
The subtlety to your experiment is that you want to conduct two
disconnected tests. The usual model formula in R are not tuned to
this situation. The parametrisations you've been using will give you
the correct results (because limma will figure it out, and will drop
superfluous coefficients as necessary), but they do tend to obscure
the true situation. Here is a simpler parametrisation which makes the
situation more explicit:
> design <- model.matrix(~0+Sibship)
> design <- cbind(design,TreatmentvsControl1=c(0,1,0,1,0,0,0,0))
> design <- cbind(design,TreatmentvsControl2=c(0,0,0,0,0,1,0,1))
> design
Sibship101 Sibship103 Sibship113 Sibship114 TreatmentvsControl1
TreatmentvsControl2
1 0 0 1 0 0
0
2 0 0 1 0 1
0
3 0 0 0 1 0
0
4 0 0 0 1 1
0
5 1 0 0 0 0
0
6 1 0 0 0 0
1
7 0 1 0 0 0
0
8 0 1 0 0 0
1
This makes it explicit that you there are only 6 independent
coefficients in your model, not 7 as your previous design matrices
have had. With this design matrix you don't even need to use
contrasts. The last two coefficients already represent the two
comparisons you want to make.
Best wishes
Gordon
>Date: Thu, 29 Nov 2007 16:42:41 +0100
>From: "Groot, Philip de" <philip.degroot at wur.nl>
>Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go...
>To: "James W. MacDonald" <jmacdon at med.umich.edu>
>Cc: bioconductor at stat.math.ethz.ch
>
>Hello Jim,
>
>Thank you for your answer! Unfortunately, it does NOT solve my
>problem. As a matter of fact: I already tried this myself... Let me
>show the problem as follows. First, I reproduce what you emailed me:
>
> > sib
>SibShips
> 113 113 114 114 101 101 103 103
>Levels: 101 103 113 114
> > rep
>[1] "Control_Diet_1" "Treatment_Diet_1"
>"Control_Diet_1" "Treatment_Diet_1"
>[5] "Control_Diet_2" "Treatment_Diet_2"
>"Control_Diet_2" "Treatment_Diet_2"
> > design <- model.matrix(~0+rep+sib)
>Warning message:
>In model.matrix.default(~0 + rep + sib) :
> variable 'rep' converted to a factor
> > design
> repControl_Diet_1 repControl_Diet_2 repTreatment_Diet_1 repTreatment_Diet_2
>1 1 0 0 0
>2 0 0 1 0
>3 1 0 0 0
>4 0 0 1 0
>5 0 1 0 0
>6 0 0 0 1
>7 0 1 0 0
>8 0 0 0 1
> sib103 sib113 sib114
>1 0 1 0
>2 0 1 0
>3 0 0 1
>4 0 0 1
>5 0 0 0
>6 0 0 0
>7 1 0 0
>8 1 0 0
>attr(,"assign")
>[1] 1 1 1 1 2 2 2
>attr(,"contrasts")
>attr(,"contrasts")$rep
>[1] "contr.treatment"
>attr(,"contrasts")$sib
>[1] "contr.treatment"
>
>
>Secondly, I continue with the required Limma calculations and show
>you the result for a single probe:
>
> > fit <- lmFit(x.norm, design)
>Coefficients not estimable: sib114
>
>(Note the error message. I suppose that it can be skipped because I
>am not interested in this factor anyway).
>
> > contrast.matrix <-
> makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1",
> "repControl_Diet_2-repTreatment_Diet_2"), levels=design)
> > contrast.matrix
> Contrasts
>Levels repControl_Diet_1-repTreatment_Diet_1
> repControl_Diet_1 1
> repControl_Diet_2 0
> repTreatment_Diet_1 -1
> repTreatment_Diet_2 0
> sib103 0
> sib113 0
> sib114 0
> Contrasts
>Levels repControl_Diet_2-repTreatment_Diet_2
> repControl_Diet_1 0
> repControl_Diet_2 1
> repTreatment_Diet_1 0
> repTreatment_Diet_2 -1
> sib103 0
> sib113 0
> sib114 0
> >
>(makes sense if you ask me)
>
> > fit <- contrasts.fit(fit, contrast.matrix)
> > tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
> > pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
> > tstat.ord[1,]
>repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
> -0.570853 -1.724101
> > pvalue.ord[1,]
>repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
> 0.6256902 0.2268312
>
>Now let's verify this with a reference method!:
> > y_after <- exprs(x.norm)[1,c(2,4)]
> > y_after
>A96_hA_40_113_1_final.CEL A96_hA_41_114_1_final.CEL
> 7.182141 7.480890
> > y_before <- exprs(x.norm)[1,c(1,3)]
> > y_before
>A96_hA_09_113_1_base.CEL A96_hA_10_114_1_base.CEL
> 7.230783 7.363846
> > t.verify <- t.test(y_before, y_after, paired=T)
> > print(t.verify)
> Paired t-test
>data: y_before and y_after
>t = -0.4128, df = 1, p-value = 0.7507
>alternative hypothesis: true difference in means is not equal to 0
>95 percent confidence interval:
> -1.086823 1.018420
>sample estimates:
>mean of the differences
> -0.03420119
>
>This does not agree! However, when I exclude Diet 2 in the
>ExpressionSet and redo the Limma calculation, it DOES match!!:
> > sib <- pData(x.norm)$sibship[c(1:4)]
> > sib
>SibShips
> 113 113 114 114
>Levels: 101 103 113 114
> > rep <- pData(x.norm)$replicates[c(1:4)]
> > rep
>[1] "Control_Diet_1" "Treatment_Diet_1"
>"Control_Diet_1" "Treatment_Diet_1"
> > design <- model.matrix(~0+rep+sib)
>Warning message:
>In model.matrix.default(~0 + rep + sib) :
> variable 'rep' converted to a factor
> > design
> repControl_Diet_1 repTreatment_Diet_1 sib103 sib113 sib114
>1 1 0 0 1 0
>2 0 1 0 1 0
>3 1 0 0 0 1
>4 0 1 0 0 1
>attr(,"assign")
>[1] 1 1 2 2 2
>attr(,"contrasts")
>attr(,"contrasts")$rep
>[1] "contr.treatment"
>attr(,"contrasts")$sib
>[1] "contr.treatment"
> > fit <- lmFit(x.norm[,c(1:4)], design)
>Coefficients not estimable: sib103 sib114
> > contrast.matrix <-
> makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1"),
> levels=design)
> > contrast.matrix
> Contrasts
>Levels repControl_Diet_1-repTreatment_Diet_1
> repControl_Diet_1 1
> repTreatment_Diet_1 -1
> sib103 0
> sib113 0
> sib114 0
> > fit <- contrasts.fit(fit, contrast.matrix)
> > tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
> > pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual,
> lower.tail=FALSE)
> > tstat.ord[1,]
>[1] -0.412843
> > pvalue.ord[1,]
>[1] 0.7507451
>
>Which is in agreement with the t.test-calculation I demonstrated
>previously! And this is exactly my problem: As soon as more diets
>are included, I cannot (correctly) fit a model anymore that gives me
>t-values which are in agreement with my reference method. Why? And
>more importantly: how to create a linear model (and/or contrasts
>matrix) that fixes this problem? Any help is highly appreciated!
>
>Kind regards,
>
>Philip
>
>
>
>________________________________
>
>From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
>Sent: Thu 29-11-2007 15:07
>To: Groot, Philip de
>Cc: bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go...
>
>
>
>Hi Philip,
>
>Does this help?
>
> > sib
>[1] 113 113 114 114 101 101 103 103
>Levels: 101 103 113 114
> > rep
>[1] Control_Diet_1 Treatment_Diet_1
>[3] Control_Diet_1 Treatment_Diet_1
>[5] Control_Diet_2 Treatment_Diet_2
>[7] Control_Diet_2 Treatment_Diet_2
>4 Levels: Control_Diet_1 ... Treatment_Diet_2
> > design <- model.matrix(~0+rep+sib)
> > design
> repControl_Diet_1 repControl_Diet_2
>1 1 0
>2 0 0
>3 1 0
>4 0 0
>5 0 1
>6 0 0
>7 0 1
>8 0 0
> repTreatment_Diet_1 repTreatment_Diet_2 sib103
>1 0 0 0
>2 1 0 0
>3 0 0 0
>4 1 0 0
>5 0 0 0
>6 0 1 0
>7 0 0 1
>8 0 1 1
> sib113 sib114
>1 1 0
>2 1 0
>3 0 1
>4 0 1
>5 0 0
>6 0 0
>7 0 0
>8 0 0
>attr(,"assign")
>[1] 1 1 1 1 2 2 2
>attr(,"contrasts")
>attr(,"contrasts")$rep
>[1] "contr.treatment"
>
>attr(,"contrasts")$sib
>[1] "contr.treatment"
>
>Best,
>
>Jim
>
>
>
>
>Groot, Philip de wrote:
> > Hello All,
> >
> > I encountered a problem that I cannot easily solve, most probably
> because my knowledge of linear models is too restricted. The
> problem is that I want to do a paired t-test using limma, but that
> I want to fit multiple comparisons (using different patients!)
> simultanuously. The reason for this is that all .CEL-files in my
> experiment are fitted and this fit is used for the eBayes() command
> to maximize the advantage of using the eBayes approach.
> >
> > I found in the bioconductor mailing list a somewhat related topic:
> >
> >
> https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.htm
> l <https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.html>
> >
> > However, my problem is different. Instead of having multiple
> treatments over the same patients, I have multiple treatments over
> multiple patients (but still can do a paired t-test because before
> and after a single treatment is done on the same person).
> >
> > For simplicity, let's assume that I have 2 diets and 2 patients
> for each diet. My pData(x.norm) looks like this:
> >
> > sample replicates sibship
> > A96_hA_09_113_1_base.CEL 1 Control_Diet_1 113
> > A96_hA_40_113_1_final.CEL 2 Treatment_Diet_1 113
> > A96_hA_10_114_1_base.CEL 3 Control_Diet_1 114
> > A96_hA_41_114_1_final.CEL 4 Treatment_Diet_1 114
> > A96_hA_01_101_2_base.CEL 5 Control_Diet_2 101
> > A96_hA_32_101_2_final.CEL 6 Treatment_Diet_2 101
> > A96_hA_02_103_2_base.CEL 7 Control_Diet_2 103
> > A96_hA_33_103_2_final.CEL 8 Treatment_Diet_2 103
> >
> > My design matrix (for a paired t-test) is calculated as follows
> (from the Limma user guide):
> > Replicates <- factor(pData(x.norm)$replicates)
> > SibShip <- factor(pData(x.norm)$sibship)
> > design <- model.matrix(~SibShip+Replicates)
> >
> > And the design matrix looks like this:
> > (Intercept) SibShip103 SibShip113 SibShip114 ReplicatesControl_Diet_2
> > 1 1 0 1 0 0
> > 2 1 0 1 0 0
> > 3 1 0 0 1 0
> > 4 1 0 0 1 0
> > 5 1 0 0 0 1
> > 6 1 0 0 0 0
> > 7 1 1 0 0 1
> > 8 1 1 0 0 0
> > ReplicatesTreatment_Diet_1 ReplicatesTreatment_Diet_2
> > 1 0 0
> > 2 1 0
> > 3 0 0
> > 4 1 0
> > 5 0 0
> > 6 0 1
> > 7 0 0
> > 8 0 1
> > attr(,"assign")
> > [1] 0 1 1 1 2 2 2
> > attr(,"contrasts")
> > attr(,"contrasts")$SibShip
> > [1] "contr.treatment"
> > attr(,"contrasts")$Replicates
> > [1] "contr.treatment"
> >
> >
> > As you can image, the comparisons I am interested in are
> Control_Diet_1-Treatment_Diet_1 and
> Control_Diet_2-Treatment_Diet_2. I might also be interested in
> Control_Diet_1-Control_Diet_2 and
> Treatment_Diet_1-Treatment_Diet_2, and so forth. My problem is that
> the current design matrix is rather complicated and that multiple
> interaction effects are somehow included, i.e. I cannot get
> individual effects by simply subtracting two factors in the design
> matrix (as I understand it). My question is: how can I create a
> contrast matrix that gives me the comparisons I am interested in? I
> am really looking forward to an answer!
> >
> > Kind Regards,
> >
> > Dr. Philip de Groot
> > Wageningen University
> >
> >
>
>--
>James W. MacDonald, M.S.
>Biostatistician
>Affymetrix and cDNA Microarray Core
>University of Michigan Cancer Center
>1500 E. Medical Center Drive
>7410 CCGC
>Ann Arbor MI 48109
>734-647-5623
More information about the Bioconductor
mailing list