[BioC] Limma: doing multiple paired t-tests in one go...

Thu Nov 29 16:42:41 CET 2007

Hello Jim,

Thank you for your answer! Unfortunately, it does NOT solve my problem. As a matter of fact: I already tried this myself... Let me show the problem as follows. First, I reproduce what you emailed me:

> sib
SibShips
     113      113      114      114      101      101      103      103
Levels: 101 103 113 114
> rep
[1] "Control_Diet_1"   "Treatment_Diet_1" "Control_Diet_1"   "Treatment_Diet_1"
[5] "Control_Diet_2"   "Treatment_Diet_2" "Control_Diet_2"   "Treatment_Diet_2"
> design <- model.matrix(~0+rep+sib)
Warning message:
In model.matrix.default(~0 + rep + sib) :
  variable 'rep' converted to a factor
> design
  repControl_Diet_1 repControl_Diet_2 repTreatment_Diet_1 repTreatment_Diet_2
1                 1                 0                   0                   0
2                 0                 0                   1                   0
3                 1                 0                   0                   0
4                 0                 0                   1                   0
5                 0                 1                   0                   0
6                 0                 0                   0                   1
7                 0                 1                   0                   0
8                 0                 0                   0                   1
  sib103 sib113 sib114
1      0      1      0
2      0      1      0
3      0      0      1
4      0      0      1
5      0      0      0
6      0      0      0
7      1      0      0
8      1      0      0
attr(,"assign")
[1] 1 1 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$rep
[1] "contr.treatment"
attr(,"contrasts")$sib
[1] "contr.treatment"

Secondly, I continue with the required Limma calculations and show you the result for a single probe:

>     fit <- lmFit(x.norm, design)
Coefficients not estimable: sib114

(Note the message about the non-estimable coefficient. I suppose it can be ignored, because I am not interested in this factor anyway.)
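
A minimal sketch of where that message likely comes from, assuming only the two factors printed above (base R, no limma or expression data needed). The sibships are nested within the diets, so one sibship column is a linear combination of the others and the design has fewer independent columns than it has columns in total:

```r
# Rebuild the two factors as printed above (values copied from the post).
sib <- factor(c(113, 113, 114, 114, 101, 101, 103, 103))
rep <- factor(c("Control_Diet_1", "Treatment_Diet_1",
                "Control_Diet_1", "Treatment_Diet_1",
                "Control_Diet_2", "Treatment_Diet_2",
                "Control_Diet_2", "Treatment_Diet_2"))
design <- model.matrix(~0 + rep + sib)

# sib113 + sib114 equals repControl_Diet_1 + repTreatment_Diet_1,
# so the seven columns are not linearly independent.
n_cols      <- ncol(design)
design_rank <- qr(design)$rank
resid_df    <- nrow(design) - design_rank  # residual df left for the fit
```

The number of columns minus the rank is the number of coefficients that cannot be estimated, and `resid_df` is the residual degrees of freedom that a least-squares fit of this design would have.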

>     contrast.matrix <- makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1", "repControl_Diet_2-repTreatment_Diet_2"), levels=design)
> contrast.matrix
                     Contrasts
Levels                repControl_Diet_1-repTreatment_Diet_1
  repControl_Diet_1                                       1
  repControl_Diet_2                                       0
  repTreatment_Diet_1                                    -1
  repTreatment_Diet_2                                     0
  sib103                                                  0
  sib113                                                  0
  sib114                                                  0
                     Contrasts
Levels                repControl_Diet_2-repTreatment_Diet_2
  repControl_Diet_1                                       0
  repControl_Diet_2                                       1
  repTreatment_Diet_1                                     0
  repTreatment_Diet_2                                    -1
  sib103                                                  0
  sib113                                                  0
  sib114                                                  0
>
(makes sense if you ask me)

> fit <- contrasts.fit(fit, contrast.matrix)
> tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
> pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
> tstat.ord[1,]
repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
                            -0.570853                             -1.724101
> pvalue.ord[1,]
repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2
                            0.6256902                             0.2268312

Now let's verify this with a reference method:
> y_after <- exprs(x.norm)[1,c(2,4)]
> y_after
A96_hA_40_113_1_final.CEL A96_hA_41_114_1_final.CEL
                 7.182141                  7.480890
> y_before <- exprs(x.norm)[1,c(1,3)]
> y_before
A96_hA_09_113_1_base.CEL A96_hA_10_114_1_base.CEL
                7.230783                 7.363846
> t.verify <- t.test(y_before, y_after, paired=T)
> print(t.verify)
        Paired t-test
data:  y_before and y_after
t = -0.4128, df = 1, p-value = 0.7507
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.086823  1.018420
sample estimates:
mean of the differences
            -0.03420119
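
For reference, a paired t-test is a one-sample t-test on the within-pair differences. A short sketch using the four probe values printed above (rounded as shown, so the last digits may differ slightly from the full-precision run):

```r
# Probe values copied from the output above.
y_before <- c(7.230783, 7.363846)  # baseline arrays, sibships 113 and 114
y_after  <- c(7.182141, 7.480890)  # final arrays, same sibships

d <- y_before - y_after            # within-pair differences
n <- length(d)

# One-sample t statistic on the differences, with n - 1 degrees of freedom.
t_manual <- mean(d) / (sd(d) / sqrt(n))
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

# The built-in paired test, for comparison.
t_builtin <- unname(t.test(y_before, y_after, paired = TRUE)$statistic)
```

Only the two diet 1 pairs enter the standard error here, and the test has a single degree of freedom.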

This does not agree! However, when I exclude Diet 2 from the ExpressionSet and redo the Limma calculation, it DOES match:
> sib <- pData(x.norm)$sibship[c(1:4)]
> sib
SibShips
     113      113      114      114
Levels: 101 103 113 114
> rep <- pData(x.norm)$replicates[c(1:4)]
> rep
[1] "Control_Diet_1"   "Treatment_Diet_1" "Control_Diet_1"   "Treatment_Diet_1"
> design <- model.matrix(~0+rep+sib)
Warning message:
In model.matrix.default(~0 + rep + sib) :
  variable 'rep' converted to a factor
> design
  repControl_Diet_1 repTreatment_Diet_1 sib103 sib113 sib114
1                 1                   0      0      1      0
2                 0                   1      0      1      0
3                 1                   0      0      0      1
4                 0                   1      0      0      1
attr(,"assign")
[1] 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$rep
[1] "contr.treatment"
attr(,"contrasts")$sib
[1] "contr.treatment"
>     fit <- lmFit(x.norm[,c(1:4)], design)
Coefficients not estimable: sib103 sib114
>     contrast.matrix <- makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1"), levels=design)
> contrast.matrix
                     Contrasts
Levels                repControl_Diet_1-repTreatment_Diet_1
  repControl_Diet_1                                       1
  repTreatment_Diet_1                                    -1
  sib103                                                  0
  sib113                                                  0
  sib114                                                  0
>     fit <- contrasts.fit(fit, contrast.matrix)
>  tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
>  pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
> tstat.ord[1,]
[1] -0.412843
> pvalue.ord[1,]
[1] 0.7507451

This agrees with the t.test calculation I demonstrated previously! And this is exactly my problem: as soon as more diets are included, I can no longer fit a model whose t-values agree with my reference method. Why? And more importantly: how do I create a linear model (and/or contrast matrix) that fixes this problem? Any help is highly appreciated!
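
One plausible reading of the discrepancy, sketched below with simulated data (not the real probe values) and base R `lm()` standing in for `lmFit()`: the joint fit estimates the same Control minus Treatment difference, but its residual standard deviation is pooled over all four sibships with 2 residual degrees of freedom, whereas the diet 1 fit and the paired t-test use only two pairs and 1 degree of freedom. The names `grp`, `dat`, and `contrast_t` are illustrative assumptions, not part of the original analysis.

```r
set.seed(1)
sib <- factor(c(113, 113, 114, 114, 101, 101, 103, 103))
grp <- factor(paste(rep(c("Control", "Treatment"), times = 4),
                    rep(c("Diet_1", "Diet_2"), each = 4), sep = "_"))
y   <- rnorm(8, mean = 7, sd = 0.2)  # simulated values, NOT the real data
dat <- data.frame(y, grp, sib)

# Ordinary t statistic for Control minus Treatment within one diet.
# With n_pairs balanced pairs the contrast has variance sigma^2 * 2 / n_pairs.
contrast_t <- function(fit, ctl, trt, n_pairs = 2) {
  est   <- unname(coef(fit)[ctl] - coef(fit)[trt])
  sigma <- summary(fit)$sigma
  c(est = est, t = est / (sigma * sqrt(2 / n_pairs)),
    df = fit$df.residual, sigma = sigma)
}

fit_all <- lm(y ~ 0 + grp + sib, data = dat)                      # both diets
fit_d1  <- lm(y ~ 0 + grp + sib, data = droplevels(dat[1:4, ]))   # diet 1 only

res_all <- contrast_t(fit_all, "grpControl_Diet_1", "grpTreatment_Diet_1")
res_d1  <- contrast_t(fit_d1,  "grpControl_Diet_1", "grpTreatment_Diet_1")

# Reference: paired t-test on the diet 1 arrays only.
t_ref <- unname(t.test(y[c(1, 3)], y[c(2, 4)], paired = TRUE)$statistic)

# The p-value printed above for the joint fit is reproduced with df = 2.
p_joint <- 2 * pt(0.570853, df = 2, lower.tail = FALSE)
```

In this sketch `res_d1["t"]` equals `t_ref`, while `res_all` has the same `est` but a different `sigma` and `df`. Under this reading the joint model is not wrong. It answers the same question with a variance estimate shared across diets, so exact agreement with separate paired t-tests should only be expected when each diet is fitted on its own.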

Kind regards,

Philip

________________________________

From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
Sent: Thu 29-11-2007 15:07
To: Groot, Philip de
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go...

Hi Philip,

Does this help?

 > sib
[1] 113 113 114 114 101 101 103 103
Levels: 101 103 113 114
 > rep
[1] Control_Diet_1   Treatment_Diet_1
[3] Control_Diet_1   Treatment_Diet_1
[5] Control_Diet_2   Treatment_Diet_2
[7] Control_Diet_2   Treatment_Diet_2
4 Levels: Control_Diet_1 ... Treatment_Diet_2
 > design <- model.matrix(~0+rep+sib)
 > design
   repControl_Diet_1 repControl_Diet_2
1                 1                 0
2                 0                 0
3                 1                 0
4                 0                 0
5                 0                 1
6                 0                 0
7                 0                 1
8                 0                 0
   repTreatment_Diet_1 repTreatment_Diet_2 sib103
1                   0                   0      0
2                   1                   0      0
3                   0                   0      0
4                   1                   0      0
5                   0                   0      0
6                   0                   1      0
7                   0                   0      1
8                   0                   1      1
   sib113 sib114
1      1      0
2      1      0
3      0      1
4      0      1
5      0      0
6      0      0
7      0      0
8      0      0
attr(,"assign")
[1] 1 1 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$rep
[1] "contr.treatment"

attr(,"contrasts")$sib
[1] "contr.treatment"

Best,

Jim

Groot, Philip de wrote:
> Hello All,
> 
> I encountered a problem that I cannot easily solve, most probably because my knowledge of linear models is too limited. The problem is that I want to do a paired t-test using limma, but I want to fit multiple comparisons (using different patients!) simultaneously. The reason is that all .CEL files in my experiment are then fitted together, and this fit is used by the eBayes() command to maximize the advantage of the eBayes approach.
> 
> I found in the bioconductor mailing list a somewhat related topic:
> 
> https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.html
> 
> However, my problem is different. Instead of having multiple treatments over the same patients, I have multiple treatments over multiple patients (but still can do a paired t-test because before and after a single treatment is done on the same person).
> 
> For simplicity, let's assume that I have 2 diets and 2 patients for each diet. My pData(x.norm) looks like this:
> 
>                           sample       replicates sibship
> A96_hA_09_113_1_base.CEL       1   Control_Diet_1     113
> A96_hA_40_113_1_final.CEL      2 Treatment_Diet_1     113
> A96_hA_10_114_1_base.CEL       3   Control_Diet_1     114
> A96_hA_41_114_1_final.CEL      4 Treatment_Diet_1     114
> A96_hA_01_101_2_base.CEL       5   Control_Diet_2     101
> A96_hA_32_101_2_final.CEL      6 Treatment_Diet_2     101
> A96_hA_02_103_2_base.CEL       7   Control_Diet_2     103
> A96_hA_33_103_2_final.CEL      8 Treatment_Diet_2     103
>
>  My design matrix (for a paired t-test) is calculated as follows (from the Limma user guide):
>     Replicates <- factor(pData(x.norm)$replicates)
>     SibShip <- factor(pData(x.norm)$sibship)
>     design <- model.matrix(~SibShip+Replicates)
>
> And the design matrix looks like this:
>   (Intercept) SibShip103 SibShip113 SibShip114 ReplicatesControl_Diet_2
> 1           1          0          1          0                        0
> 2           1          0          1          0                        0
> 3           1          0          0          1                        0
> 4           1          0          0          1                        0
> 5           1          0          0          0                        1
> 6           1          0          0          0                        0
> 7           1          1          0          0                        1
> 8           1          1          0          0                        0
>   ReplicatesTreatment_Diet_1 ReplicatesTreatment_Diet_2
> 1                          0                          0
> 2                          1                          0
> 3                          0                          0
> 4                          1                          0
> 5                          0                          0
> 6                          0                          1
> 7                          0                          0
> 8                          0                          1
> attr(,"assign")
> [1] 0 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$SibShip
> [1] "contr.treatment"
> attr(,"contrasts")$Replicates
> [1] "contr.treatment"
> 
> 
> As you can imagine, the comparisons I am interested in are Control_Diet_1-Treatment_Diet_1 and Control_Diet_2-Treatment_Diet_2. I might also be interested in Control_Diet_1-Control_Diet_2 and Treatment_Diet_1-Treatment_Diet_2, and so forth. My problem is that the current design matrix is rather complicated and that multiple interaction effects are somehow included, i.e. I cannot get individual effects by simply subtracting two factors in the design matrix (as I understand it). My question is: how can I create a contrast matrix that gives me the comparisons I am interested in? I am really looking forward to an answer!
> 
> Kind Regards,
> 
> Dr. Philip de Groot
> Wageningen University
> 

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623