[BioC] Limma two group layout; two approaches but different results

Wed Nov 9 23:15:39 CET 2005

> Date: Wed, 09 Nov 2005 10:53:45 +0100
> From: Bj?rn Usadel <usadel at mpimp-golm.mpg.de>
> Subject: [BioC] Limma two group layout; two approaches but different
> 	results
> To: bioconductor at stat.math.ethz.ch
>
> Hi all,
>
> Refering to section 8.4 of limmas User guide (2 groups) I found a little
> inconsistency (which might be a feature though),
> when using only 2 versus 2 Affymetrix arrays and the two methods
> proposed in the userguide. (differences versus contrasts)
> Even though, the M values are perfectly the same, there a  very
> different p-values and different t-values.
> There are still slight differences for 2 versus 3 arrays, and these
> disappear with more slides. Is this a known behaviour?
> Otherwise, maybe a warning should be issued.
> And yes -  taking 2 arrays per condition/genotype is standing on very
> shaky ground, but that is not up for me to decide.
>
>
> Method1
>  > toptable(fita,coef="MUvsWT",adjust="BH")
>             M         t    P.Value        B
> 4901  3.007937  17.67274 0.04321081 2.022252
> 4634  3.810009  16.17843 0.04321081 1.897954
> 5365 -3.003438 -16.08612 0.04321081 1.889342
> 4282  3.263951  13.71665 0.05394461 1.620254
> 4900  2.820148  13.63034 0.05394461 1.608386
> 6224  2.557232  13.33234 0.05394461 1.566076
> 4609 -2.197100 -12.31159 0.06801382 1.403791
> 1388 -3.369013 -11.80720 0.07283498 1.312306
> 5217  2.703997  11.35791 0.07804811 1.223567
> 4902  2.184269  10.95941 0.08339903 1.138550
>
> Method2 (using contrasts)
> toptable(fit2,adjust="BH")
>              M         t   P.Value         B
> 4901  3.0079369  49.70266 0.6066029 -1.427223
> 3141 -0.7710439 -35.26161 0.6066029 -1.448119
> 1359 -0.9824312 -28.13189 0.6066029 -1.471786
> 791   0.4815546  25.80750 0.6066029 -1.483894
> 5365 -3.0034379 -23.94627 0.6066029 -1.496136
> 4609 -2.1971003 -22.47249 0.6066029 -1.507965
> 885   0.3017316  18.37909 0.6066029 -1.556058
> 6224  2.5572321  18.22744 0.6066029 -1.558443
> 4899  0.6554947  17.48778 0.6066029 -1.570918
> 6037  0.6788083  17.38863 0.6066029 -1.572704
>
>
> Here the code
> (As data I used the data from section 11.3 where I simply  took the
> first two files for each genotype)
> (http://visitor.ics.uci.edu/genex/cybert/tutorial/index.html)
>
> ###########################################################
>
> library(affy)
> library(limma)
>
> #readin arrays
> fns <- list.celfiles(path="C:/foo/bar/",full.names=TRUE)
> abatch <- ReadAffy(filenames=fns)
> #normalize using rma
> eset<-rma(abatch)
>
> #method1
> design<-cbind(WT=c(1,1,1,1),MUvsWT=c(1,1,0,0))
>
> fita <- lmFit(eset, design)
> fita<-eBayes(fita)
> toptable(fita,coef="MUvsWT",adjust="BH")
>
>
> #method2
> design<-cbind(WT=c(0,0,1,1),MU=c(1,1,0,0))
>
> fit<-lmFit(eset,design)
> cont.matrix <- makeContrasts( MUvsWT=MU-WT,levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
> toptable(fit2,adjust="BH")
> ###########################################################
>
>
> Kind regards,
> and thanks for your help,
>
> Bj?rn Usadel
> MPI of Molecular Plant Physiology

Dear Bjoern,

I haven't seen this phenomenon before, but I can tell you what is causing it.  The problem is
caused by a number of probe-sets which have saturated intensities.  RMA gives these probe-sets
exactly identical expression values in both replicates.  Hence the residual variance for these
probes is exactly zero.  Now zero is theoretically impossible for the sample variance, so limma
has to remove these probe-sets when it estimates the empirical Bayes hyperparameters.  All OK so
far.  But unfortunately the second design matrix gives a slightly different result in R.  For the
second design matrix, the smallest sample variance is 1e-15 because of rounding error.  Hence no
residual variance is zero, no probe-sets are removed, and limma gets quite different
hyperparameters estimates.  You will find that these results are also hardware and operating
system dependent as these will affect rounding error.

In your example, the first results are more reliable.  You can make the two agree by removing the
probe-sets with zero variances:

> i <- (fita$sigma==0)
> sum(i)
[1] 40
> fita2 <- eBayes(fita[!i,])
> topTable(fita,coef="MUvsWT")
                                ID     M    A     t P.Value    B
5365                 serA_b2913_st -3.07 12.1 -15.8  0.0364 2.76
1388                 gltB_b3212_st -3.06 10.4 -14.4  0.0364 2.55
4282 IG_821_1300838_1300922_fwd_st  3.33 12.4  14.0  0.0364 2.48
4900                 oppA_b1243_st  3.00 13.2  12.9  0.0364 2.29
5217                  rmf_b0953_st  2.81 13.9  12.8  0.0364 2.26
6224                 ydgR_b1634_st  2.47 10.3  11.7  0.0364 2.02
1389                 gltD_b3213_st -2.96 11.1 -11.7  0.0364 2.01
4902                 oppC_b1245_st  2.16 11.0  11.7  0.0364 2.01
4634                 lysU_b4129_st  3.40  9.8  11.6  0.0364 2.00
4901                 oppB_b1244_st  2.73 10.7  11.6  0.0364 1.98
> fit22 <- eBayes(fit2[!i,])
> topTable(fit22)
                                ID     M    A     t P.Value    B
5335                 serA_b2913_st -3.07 12.1 -15.8  0.0362 2.77
1379                 gltB_b3212_st -3.06 10.4 -14.4  0.0362 2.56
4264 IG_821_1300838_1300922_fwd_st  3.33 12.4  14.0  0.0362 2.49
4878                 oppA_b1243_st  3.00 13.2  12.9  0.0362 2.30
5195                  rmf_b0953_st  2.81 13.9  12.8  0.0362 2.27
6184                 ydgR_b1634_st  2.47 10.3  11.7  0.0362 2.02
1380                 gltD_b3213_st -2.96 11.1 -11.7  0.0362 2.02
4880                 oppC_b1245_st  2.16 11.0  11.7  0.0362 2.02
4612                 lysU_b4129_st  3.40  9.8  11.6  0.0362 2.01
4879                 oppB_b1244_st  2.73 10.7  11.6  0.0362 1.99

Thanks for raising this problem.  I will give some thought to modifying the hyperparameter
estimation in limma so that it is no longer sensitive to this issue.

> PS There is also a little typo on page 39 of the User Guide instead of
> design <- cbind(WT=c(1,1,0,0,0,MU=c(0,0,1,1,1))
> it should be
> design <- cbind(WT=c(1,1,0,0,0),MU=c(0,0,1,1,1))

You're right, thanks.

Best wishes
Gordon