[BioC] Unexpectedly high FDR using contrasts in Limma?

Thu Apr 5 20:45:01 CEST 2007

Hi everyone,

I'm analysing (limma 2.9.10) a set of twelve 2-colour cDNA arrays (4  
experiments of 3 slides each) using a common reference design. I find  
that I get very high adjusted P values (BH) using contrasts to compare  
some of these experiments.

No adjusted P value is lower than 0.9, which would indicate there's  
not enough evidence that any gene behaves different in either  
experiment, and I find that surprising. Yes, all four experiments are  
quite similar, but there are some genes that do stand out enough (I'd  
think!) and I have confirmation by RT that this is the case.

So I am wondering whether I don't fully understand where these  
adjusted p values come from, or whether I'm not asking the question I  
think I am asking, or maybe an error on my code?

Here's a summary of what I did:

- 12 files read with read.maimages

> targets                    #H226 is the common reference
    SlideNumber     FileName   Cy3   Cy5
1           25 140025-3.gpr  H226  MBD1
2           26 140026-3.gpr  MBD1  H226
3           23 140023-3.gpr  H226  MBD1
4           19 140019-3.gpr  MBD2  H226
5           17 140017-3.gpr  H226  MBD2
6           15 140015-3.gpr  MBD2  H226
7           20 140020-3.gpr  H226 MeCP2
8           18 140018-3.gpr MeCP2  H226
9           16 140016-3.gpr MeCP2  H226
10          22 140022-3.gpr  H226   DBL
11          24 140024-3.gpr   DBL  H226
12          14 140014-3.gpr   DBL  H226

- background corrected using a custom method: RGList object 'RG.s'

> design                # H226 as common reference
    DBL MBD1 MBD2 MeCP2
1    0    1    0     0
2    0   -1    0     0
3    0    1    0     0
4    0    0   -1     0
5    0    0    1     0
6    0    0   -1     0
7    0    0    0     1
8    0    0    0    -1
9    0    0    0    -1
10   1    0    0     0
11  -1    0    0     0
12  -1    0    0     0

> MA.sw<-normalizeWithinArrays(RG.s, layout, bc.method="none",  
> method="printtiploess")

> fit.sw<-lmFit(MA.sw, design, method="ls")
> eb.sw<-eBayes(fit.sw, proportion=0.01)

- I get the adjusted P values from 'topTable':

> tt.sw.MBD1<-topTable(eb.sw, coef=2, n=number, genelist=gal,  
> adjust.method="BH", sort.by="P")
> tt.sw.MBD2<-topTable(eb.sw, coef=3, n=number, genelist=gal,  
> adjust.method="BH", sort.by="P")
> tt.sw.MeCP2<-topTable(eb.sw, coef=4, n=number, genelist=gal,  
> adjust.method="BH", sort.by="P")
> tt.sw.DBL<-topTable(eb.sw, coef=1, n=number, genelist=gal,  
> adjust.method="BH", sort.by="P")

The values I get here look alright and make sense.

 From the 4 experiments, the 4th one (DBL) is a control experiment. I  
would like to compare the other three to it, to see what genes are  
differentially expressed between each of the first three experiments  
and the control. I wanted to use contrasts for this. This is how I did  
it:

> levels<-c("DBL", "MBD1", "MBD2", "MeCP2")
> contrasts<-makeContrasts(DBL,MBD1,MBD2,MeCP2,DBL-MBD1,DBL-MBD2,DBL-MeCP2,levels=levels)

> contrasts
        Contrasts
Levels  DBL MBD1 MBD2 MeCP2 DBL - MBD1 DBL - MBD2 DBL - MeCP2
   DBL     1    0    0     0          1          1           1
   MBD1    0    1    0     0         -1          0           0
   MBD2    0    0    1     0          0         -1           0
   MeCP2   0    0    0     1          0          0          -1

> fitc<-contrasts.fit(fit.sw,contrasts)
> ebfitc<-eBayes(fitc)

then I use 'topTable' again to get the adjusted p values etc:

> tt.cMBD1<-topTable(ebfitc, coef=5, n=number, genelist=gal,  
> adjust.method="BH", sort.by="P")
> tt.cMBD2<-topTable(ebfitc, coef=6, n=number, genelist=gal,  
> adjust.method="BH", sort.by="P")
> tt.cMeCP2<-topTable(ebfitc, coef=7, n=number, genelist=gal,  
> adjust.method="BH", sort.by="P")

the top of the list makes sense, I picked up the genes I was  
expecting, however the adjusted p values are terrible, so I wonder if  
this is real or I am making a mistake somewhere.

As an example, here's a gene I know to be differentially expressed: CDKN1C
This gene has negligible expression in the common reference, it has  
negligible expression after the control experiment (DBL) is performed,  
but it is clearly expressed after the MBD1 experiment is performed  
(verified by RT). This is what I get:

A.mean = 10.48 (it goes up to 11.6 in the individual slides of the  
MBD1 experiment, it stays low in all others)

M (MBD1) = 2.98
P (MBD1) = 6.07e-05
adj p val (MBD1) = 0.00044

M (DBL) = -0.44
P (DBL) = 0.38
adj p val (DBL) = 0.56

M (DBL-MBD1) = -3.43
P (DBL-MBD1) = 0.00036
adj p val (DBL-MBD1) = 0.95

I am surprised this gene (and several others like this one) give me  
such poor adjusted p values...

why?

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK