[BioC] Problem with A mean in Limma

Wed Feb 22 14:45:07 CET 2006

Dear All,

I have posted a mail to the list 2 weeks ago with no answer so far. My first message was probably not very clear so I try again.

I am using Limma 2.4.7 on R 2.2.1 to analyse two-color arrays and I have noticed something strange in the way the mean 'A' value is calculated when one use a contrast matrix.
In my experiment I want to compare the effect of a stimulation on Tcells. This stimulation was applied on Tcells during 6h or during 24h. The 6h stimulated Tcells and 24h stimulated Tcells were compared to the same non stimulated cells (= reference). I have got the following Targets Frame :

FileName Cy3 Cy5
CtrlCy5vsTh2_24hCy3.dav 24h Ref
CtrlCy5vsTh2_6hCy3.dav 6h Ref
Th2_24hCy5vsCtrlCy3.dav Ref 24h
Th2_6hCy5vsCtrlCy3.dav Ref 6h

I would like to know which genes are modulated after 6h of stimulation, which are modulated after 24h and genes modulated between 6h and 24h. After normalisation of the data I use the following script to create a matrix and a contrast matrix:

> design <- cbind("6h"=c(0,-1,0,1),"24h"=c(-1,0,1,0))
> targets <- readTargets()
> rownames(design) <- removeExt(targets$FileName)
> design
                    6h 24h
CtrlCy5vsTh2_24hCy3  0  -1
CtrlCy5vsTh2_6hCy3  -1   0
Th2_24hCy5vsCtrlCy3  0   1
Th2_6hCy5vsCtrlCy3   1   0
> fit <- lmFit(MA, design)
> cont.matrix <- cbind("24h-6h"=c(-1, 1), "6h"=c(1,0), "24h"=c(0,1))
> rownames(cont.matrix) <- cbind("6h","24h")
> cont.matrix
    24h-6h 6h 24h
6h      -1  1   0
24h      1  0   1
> fitcont <- contrasts.fit(fit, cont.matrix)
> fitcontbayes<-eBayes(fitcont)

If one look at the results of the 3 coef of the contrast matrix (: 24h-6h, 6h and 24h) everything seem OK :

> topTable(fitcontbayes, coef=1, n=10, adjust="fdr")
      Block Column Row     ID      Name Status         M        A         t    P.Value        B
21897    39      9   1  46861      PRAC   Gene  2.876222 8.631290  8.433385 0.03855747 3.506137
20473    36      1  14  28828     PAPD1   Gene -2.383738 6.927029 -7.473090 0.03855747 2.783536
19040    34      8   2  78497    CLDN23   Gene -3.139618 5.859508 -8.299092 0.03855747 2.386795
6221     11      5  20  84193 LOC115749   Gene  3.010418 5.792223  7.957571 0.03855747 2.193344
19497    34      9  21 125848    CRISP3   Gene -2.376080 6.925017 -6.742931 0.07172051 2.143568
21099    37      3  16 147050    HCG2P7   Gene -2.724596 6.706554 -7.202047 0.07172051 1.713956
6196     11      4  19  64892    ETR101   Gene  1.954074 8.169356  6.048570 0.13538457 1.453035
6199     11      7  19 149357       #NA   Gene  1.898843 8.060294  5.954056 0.14110973 1.352464
21132    37     12  17 155265 LOC440590   Gene -2.527218 6.638719 -6.680307 0.10272218 1.336678
20556    36     12  17 160410       #NA   Gene  2.082918 7.656239  5.881841 0.14362662 1.274528
> topTable(fitcontbayes, coef=2, n=10, adjust="fdr")
      Block Column Row     ID         Name Status         M         A         t      P.Value        B
2603      5     11  13 128320          CA2   Gene  2.312298  8.323775 12.690864 0.0004900596 7.446470
23250    41     18   9  83486       CYP1B1   Gene  1.365898  9.351358  7.529896 0.0168728265 4.118037
14667    26      3  12  89981          TPO   Gene -1.370724  9.825983 -7.441705 0.0168728265 4.018976
17543    31     23  11  95124          TNC   Gene  1.362982 11.478777  7.450669 0.0176675806 3.633118
27313    48      1  11  59209         CTSC   Gene  1.337229 10.259389  6.727203 0.0276792498 3.169911
2542      5     22  10  91105        FABP5   Gene  1.180159 10.683417  6.708199 0.0276792498 3.146168
19497    34      9  21 125848       CRISP3   Gene  2.254023  6.925017  7.834144 0.0168728265 3.042473
4862      9     14  11 158129       CAPN14   Gene  2.252979  5.170231  8.422209 0.0168728265 3.002382
18922    33     10  21  49700 DKFZp547F072   Gene  2.241397  6.368605  8.378910 0.0168728265 2.975528
20210    36      2   3 129217     MGC10854   Gene  2.155348  6.019866  8.057238 0.0176289711 2.768940
> topTable(fitcontbayes, coef=3, n=10, adjust="fdr")
      Block Column Row     ID        Name Status         M         A          t      P.Value        B
23257    41      1  10 119072       CDH26   Gene  2.045561  8.618405  10.977306 0.0007520237 8.012605
2550      5      6  11 112799    SERPINB3   Gene  1.905378  9.951108  10.146438 0.0007520237 7.297240
21897    39      9   1  46861        PRAC   Gene  3.036560  8.631290  10.904529 0.0007520237 6.084763
6221     11      5  20  84193   LOC115749   Gene  3.157852  5.792223  11.804853 0.0007520237 5.857994
20473    36      1  14  28828       PAPD1   Gene -2.693871  6.927029 -10.343418 0.0009737256 5.746556
27313    48      1  11  59209        CTSC   Gene  1.677554 10.259389   8.439275 0.0036026318 5.575635
17543    31     23  11  95124         TNC   Gene  2.432964 11.478777   9.404284 0.0022965417 5.110162
18097    32      1  11 104580        CTSC   Gene  1.414608  9.663292   8.014200 0.0056176380 5.087315
22151    39     23  11 104788      COL6A3   Gene  1.507607  9.470752   7.775592 0.0069709347 4.801786
8275     15     19   9  53347 TXN delta 3   Gene  1.340699 11.341639   7.522172 0.0090096504 4.489250

But when I sort the table according to the 'A' value one can see that the genes have exactly the same 'A' values for the 1st the 2d and the 3d coef :

> topTable(fitcontbayes, coef=1, n=10, adjust="fdr", sort.by="A")
      Block Column Row     ID      Name Status  M        A  t P.Value  B
4716      9     12   5  17228     CPSF4   Gene NA 15.16203 NA      NA NA
2944      6     16   3  96063      RBM4   Gene NA 15.12503 NA      NA NA
772       2      4   9   5198    KCNAB3   Gene NA 15.06048 NA      NA NA
18041    32     17   8 101204    GABBR1   Gene NA 15.03191 NA      NA NA
16817    30     17   5  15058    PRKAA1   Gene NA 14.98546 NA      NA NA
7642     14     10   7  90324     GSTA3   Gene NA 14.85254 NA      NA NA
5105      9     17  21 149752 LOC199800   Gene NA 14.83062 NA      NA NA
20702    36     14  23 104223       #NA   Gene NA 14.81930 NA      NA NA
4270      8     22  10  94545  SERPINA6   Gene NA 14.79895 NA      NA NA
6429     12     21   4  96944    MAD1L1   Gene NA 14.79082 NA      NA NA
> topTable(fitcontbayes, coef=2, n=10, adjust="fdr", sort.by="A")
      Block Column Row     ID      Name Status  M        A  t P.Value  B
4716      9     12   5  17228     CPSF4   Gene NA 15.16203 NA      NA NA
2944      6     16   3  96063      RBM4   Gene NA 15.12503 NA      NA NA
772       2      4   9   5198    KCNAB3   Gene NA 15.06048 NA      NA NA
18041    32     17   8 101204    GABBR1   Gene NA 15.03191 NA      NA NA
16817    30     17   5  15058    PRKAA1   Gene NA 14.98546 NA      NA NA
7642     14     10   7  90324     GSTA3   Gene NA 14.85254 NA      NA NA
5105      9     17  21 149752 LOC199800   Gene NA 14.83062 NA      NA NA
20702    36     14  23 104223       #NA   Gene NA 14.81930 NA      NA NA
4270      8     22  10  94545  SERPINA6   Gene NA 14.79895 NA      NA NA
6429     12     21   4  96944    MAD1L1   Gene NA 14.79082 NA      NA NA
> topTable(fitcontbayes, coef=3, n=10, adjust="fdr", sort.by="A")
      Block Column Row     ID      Name Status  M        A  t P.Value  B
4716      9     12   5  17228     CPSF4   Gene NA 15.16203 NA      NA NA
2944      6     16   3  96063      RBM4   Gene NA 15.12503 NA      NA NA
772       2      4   9   5198    KCNAB3   Gene NA 15.06048 NA      NA NA
18041    32     17   8 101204    GABBR1   Gene NA 15.03191 NA      NA NA
16817    30     17   5  15058    PRKAA1   Gene NA 14.98546 NA      NA NA
7642     14     10   7  90324     GSTA3   Gene NA 14.85254 NA      NA NA
5105      9     17  21 149752 LOC199800   Gene NA 14.83062 NA      NA NA
20702    36     14  23 104223       #NA   Gene NA 14.81930 NA      NA NA
4270      8     22  10  94545  SERPINA6   Gene NA 14.79895 NA      NA NA
6429     12     21   4  96944    MAD1L1   Gene NA 14.79082 NA      NA NA

In all the 3 tables the mean 'A' value is the mean of the 'A' values from the 4 slides where I expected for the second and third table (coef=2 and coef=3) a mean between only 2 slides (slides 2 and 4 for 6h (coef=2) and slides 1 and 3 for 24h (coef=3)). I first though it was just a problem in the report of 'A' values in topTable but it seems to have an effect also on the P.value and B value.

Is that a mistake in 'lmFit' or 'contrasts.fit' functions or am I doing something wrong in the design of my matrix and contrast matrix?

Many thanks for any help or comments.

David

######################################
David HOT, PhD.
Laboratoire d'Etudes Transcriptomiques et Génomiques Appliquées - Plate-forme Biopuces
UMR8161 - IFR142

Institut Pasteur de Lille
1, rue du professeur Calmette
59 019 LILLE, Cedex.
FRANCE
Tél : +33 (0)3 20 87 72 09
Fax : +33 (0)3 20 87 73 11