[BioC] Limma: doing multiple paired t-tests in one go...
Groot, Philip de
philip.degroot at wur.nl
Thu Nov 29 16:42:41 CET 2007
Hello Jim,
Thank you for your answer! Unfortunately, it does NOT solve my problem. As a matter of fact: I already tried this myself... Let me show the problem as follows. First, I reproduce what you emailed me:
> sib
SibShips
113 113 114 114 101 101 103 103
Levels: 101 103 113 114
> rep
[1] "Control_Diet_1" "Treatment_Diet_1" "Control_Diet_1" "Treatment_Diet_1"
[5] "Control_Diet_2" "Treatment_Diet_2" "Control_Diet_2" "Treatment_Diet_2"
> design <- model.matrix(~0+rep+sib)
Warning message:
In model.matrix.default(~0 + rep + sib) :
variable 'rep' converted to a factor
> design
repControl_Diet_1 repControl_Diet_2 repTreatment_Diet_1 repTreatment_Diet_2
1 1 0 0 0
2 0 0 1 0
3 1 0 0 0
4 0 0 1 0
5 0 1 0 0
6 0 0 0 1
7 0 1 0 0
8 0 0 0 1
sib103 sib113 sib114
1 0 1 0
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 0
6 0 0 0
7 1 0 0
8 1 0 0
attr(,"assign")
[1] 1 1 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$rep
[1] "contr.treatment"
attr(,"contrasts")$sib
[1] "contr.treatment"
Secondly, I continue with the required Limma calculations and show you the result for a single probe:
> fit <- lmFit(x.norm, design)
Coefficients not estimable: sib114
(Note the error message. I suppose that it can be skipped because I am not interested in this factor anyway).
> contrast.matrix <- makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1", "repControl_Diet_2-repTreatment_Diet_2"), levels=design)
> contrast.matrix
Contrasts
Levels repControl_Diet_1-repTreatment_Diet_1
repControl_Diet_1 1
repControl_Diet_2 0
repTreatment_Diet_1 -1
repTreatment_Diet_2 0
sib103 0
sib113 0
sib114 0
Contrasts
Levels repControl_Diet_2-repTreatment_Diet_2
repControl_Diet_1 0
repControl_Diet_2 1
repTreatment_Diet_1 0
repTreatment_Diet_2 -1
sib103 0
sib113 0
sib114 0
>
(makes sense if you ask me)
> fit <- contrasts.fit(fit, contrast.matrix)
> tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
> pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
> tstat.ord[1,]
repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
-0.570853 -1.724101
> pvalue.ord[1,]
repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
0.6256902 0.2268312
Now let's verify this with a reference method!:
> y_after <- exprs(x.norm)[1,c(2,4)]
> y_after
A96_hA_40_113_1_final.CEL A96_hA_41_114_1_final.CEL
7.182141 7.480890
> y_before <- exprs(x.norm)[1,c(1,3)]
> y_before
A96_hA_09_113_1_base.CEL A96_hA_10_114_1_base.CEL
7.230783 7.363846
> t.verify <- t.test(y_before, y_after, paired=T)
> print(t.verify)
Paired t-test
data: y_before and y_after
t = -0.4128, df = 1, p-value = 0.7507
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.086823 1.018420
sample estimates:
mean of the differences
-0.03420119
This does not agree! However, when I exclude Diet 2 in the ExpressionSet and redo the Limma calculation, it DOES match!!:
> sib <- pData(x.norm)$sibship[c(1:4)]
> sib
SibShips
113 113 114 114
Levels: 101 103 113 114
> rep <- pData(x.norm)$replicates[c(1:4)]
> rep
[1] "Control_Diet_1" "Treatment_Diet_1" "Control_Diet_1" "Treatment_Diet_1"
> design <- model.matrix(~0+rep+sib)
Warning message:
In model.matrix.default(~0 + rep + sib) :
variable 'rep' converted to a factor
> design
repControl_Diet_1 repTreatment_Diet_1 sib103 sib113 sib114
1 1 0 0 1 0
2 0 1 0 1 0
3 1 0 0 0 1
4 0 1 0 0 1
attr(,"assign")
[1] 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$rep
[1] "contr.treatment"
attr(,"contrasts")$sib
[1] "contr.treatment"
> fit <- lmFit(x.norm[,c(1:4)], design)
Coefficients not estimable: sib103 sib114
> contrast.matrix <- makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1"), levels=design)
> contrast.matrix
Contrasts
Levels repControl_Diet_1-repTreatment_Diet_1
repControl_Diet_1 1
repTreatment_Diet_1 -1
sib103 0
sib113 0
sib114 0
> fit <- contrasts.fit(fit, contrast.matrix)
> tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
> pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
> tstat.ord[1,]
[1] -0.412843
> pvalue.ord[1,]
[1] 0.7507451
Which is in agreement with the t.test-calculation I demonstrated previously! And this is exactly my problem: As soon as more diets are included, I cannot (correctly) fit a model anymore that gives me t-values which are in agreement with my reference method. Why? And more importantly: how to create a linear model (and/or contrasts matrix) that fixes this problem? Any help is highly appreciated!
Kind regards,
Philip
________________________________
From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
Sent: Thu 29-11-2007 15:07
To: Groot, Philip de
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go...
Hi Philip,
Does this help?
> sib
[1] 113 113 114 114 101 101 103 103
Levels: 101 103 113 114
> rep
[1] Control_Diet_1 Treatment_Diet_1
[3] Control_Diet_1 Treatment_Diet_1
[5] Control_Diet_2 Treatment_Diet_2
[7] Control_Diet_2 Treatment_Diet_2
4 Levels: Control_Diet_1 ... Treatment_Diet_2
> design <- model.matrix(~0+rep+sib)
> design
repControl_Diet_1 repControl_Diet_2
1 1 0
2 0 0
3 1 0
4 0 0
5 0 1
6 0 0
7 0 1
8 0 0
repTreatment_Diet_1 repTreatment_Diet_2 sib103
1 0 0 0
2 1 0 0
3 0 0 0
4 1 0 0
5 0 0 0
6 0 1 0
7 0 0 1
8 0 1 1
sib113 sib114
1 1 0
2 1 0
3 0 1
4 0 1
5 0 0
6 0 0
7 0 0
8 0 0
attr(,"assign")
[1] 1 1 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$rep
[1] "contr.treatment"
attr(,"contrasts")$sib
[1] "contr.treatment"
Best,
Jim
Groot, Philip de wrote:
> Hello All,
>
> I encountered a problem that I cannot easily solve, most probably because my knowledge of linear models is too restricted. The problem is that I want to do a paired t-test using limma, but that I want to fit multiple comparisons (using different patients!) simultanuously. The reason for this is that all .CEL-files in my experiment are fitted and this fit is used for the eBayes() command to maximize the advantage of using the eBayes approach.
>
> I found in the bioconductor mailing list a somewhat related topic:
>
> https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.html <https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.html>
>
> However, my problem is different. Instead of having multiple treatments over the same patients, I have multiple treatments over multiple patients (but still can do a paired t-test because before and after a single treatment is done on the same person).
>
> For simplicity, let's assume that I have 2 diets and 2 patients for each diet. My pData(x.norm) looks like this:
>
> sample replicates sibship
> A96_hA_09_113_1_base.CEL 1 Control_Diet_1 113
> A96_hA_40_113_1_final.CEL 2 Treatment_Diet_1 113
> A96_hA_10_114_1_base.CEL 3 Control_Diet_1 114
> A96_hA_41_114_1_final.CEL 4 Treatment_Diet_1 114
> A96_hA_01_101_2_base.CEL 5 Control_Diet_2 101
> A96_hA_32_101_2_final.CEL 6 Treatment_Diet_2 101
> A96_hA_02_103_2_base.CEL 7 Control_Diet_2 103
> A96_hA_33_103_2_final.CEL 8 Treatment_Diet_2 103
>
> My design matrix (for a paired t-test) is calculated as follows (from the Limma user guide):
> Replicates <- factor(pData(x.norm)$replicates)
> SibShip <- factor(pData(x.norm)$sibship)
> design <- model.matrix(~SibShip+Replicates)
>
> And the design matrix looks like this:
> (Intercept) SibShip103 SibShip113 SibShip114 ReplicatesControl_Diet_2
> 1 1 0 1 0 0
> 2 1 0 1 0 0
> 3 1 0 0 1 0
> 4 1 0 0 1 0
> 5 1 0 0 0 1
> 6 1 0 0 0 0
> 7 1 1 0 0 1
> 8 1 1 0 0 0
> ReplicatesTreatment_Diet_1 ReplicatesTreatment_Diet_2
> 1 0 0
> 2 1 0
> 3 0 0
> 4 1 0
> 5 0 0
> 6 0 1
> 7 0 0
> 8 0 1
> attr(,"assign")
> [1] 0 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$SibShip
> [1] "contr.treatment"
> attr(,"contrasts")$Replicates
> [1] "contr.treatment"
>
>
> As you can image, the comparisons I am interested in are Control_Diet_1-Treatment_Diet_1 and Control_Diet_2-Treatment_Diet_2. I might also be interested in Control_Diet_1-Control_Diet_2 and Treatment_Diet_1-Treatment_Diet_2, and so forth. My problem is that the current design matrix is rather complicated and that multiple interaction effects are somehow included, i.e. I cannot get individual effects by simply subtracting two factors in the design matrix (as I understand it). My question is: how can I create a contrast matrix that gives me the comparisons I am interested in? I am really looking forward to an answer!
>
> Kind Regards,
>
> Dr. Philip de Groot
> Wageningen University
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
More information about the Bioconductor
mailing list