[BioC] Limma two group layout; two approaches but different results
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Nov 9 23:15:39 CET 2005
> Date: Wed, 09 Nov 2005 10:53:45 +0100
> From: Bj?rn Usadel <usadel at mpimp-golm.mpg.de>
> Subject: [BioC] Limma two group layout; two approaches but different
> results
> To: bioconductor at stat.math.ethz.ch
>
> Hi all,
>
> Refering to section 8.4 of limmas User guide (2 groups) I found a little
> inconsistency (which might be a feature though),
> when using only 2 versus 2 Affymetrix arrays and the two methods
> proposed in the userguide. (differences versus contrasts)
> Even though, the M values are perfectly the same, there a very
> different p-values and different t-values.
> There are still slight differences for 2 versus 3 arrays, and these
> disappear with more slides. Is this a known behaviour?
> Otherwise, maybe a warning should be issued.
> And yes - taking 2 arrays per condition/genotype is standing on very
> shaky ground, but that is not up for me to decide.
>
>
> Method1
> > toptable(fita,coef="MUvsWT",adjust="BH")
> M t P.Value B
> 4901 3.007937 17.67274 0.04321081 2.022252
> 4634 3.810009 16.17843 0.04321081 1.897954
> 5365 -3.003438 -16.08612 0.04321081 1.889342
> 4282 3.263951 13.71665 0.05394461 1.620254
> 4900 2.820148 13.63034 0.05394461 1.608386
> 6224 2.557232 13.33234 0.05394461 1.566076
> 4609 -2.197100 -12.31159 0.06801382 1.403791
> 1388 -3.369013 -11.80720 0.07283498 1.312306
> 5217 2.703997 11.35791 0.07804811 1.223567
> 4902 2.184269 10.95941 0.08339903 1.138550
>
> Method2 (using contrasts)
> toptable(fit2,adjust="BH")
> M t P.Value B
> 4901 3.0079369 49.70266 0.6066029 -1.427223
> 3141 -0.7710439 -35.26161 0.6066029 -1.448119
> 1359 -0.9824312 -28.13189 0.6066029 -1.471786
> 791 0.4815546 25.80750 0.6066029 -1.483894
> 5365 -3.0034379 -23.94627 0.6066029 -1.496136
> 4609 -2.1971003 -22.47249 0.6066029 -1.507965
> 885 0.3017316 18.37909 0.6066029 -1.556058
> 6224 2.5572321 18.22744 0.6066029 -1.558443
> 4899 0.6554947 17.48778 0.6066029 -1.570918
> 6037 0.6788083 17.38863 0.6066029 -1.572704
>
>
> Here the code
> (As data I used the data from section 11.3 where I simply took the
> first two files for each genotype)
> (http://visitor.ics.uci.edu/genex/cybert/tutorial/index.html)
>
> ###########################################################
>
> library(affy)
> library(limma)
>
> #readin arrays
> fns <- list.celfiles(path="C:/foo/bar/",full.names=TRUE)
> abatch <- ReadAffy(filenames=fns)
> #normalize using rma
> eset<-rma(abatch)
>
> #method1
> design<-cbind(WT=c(1,1,1,1),MUvsWT=c(1,1,0,0))
>
> fita <- lmFit(eset, design)
> fita<-eBayes(fita)
> toptable(fita,coef="MUvsWT",adjust="BH")
>
>
> #method2
> design<-cbind(WT=c(0,0,1,1),MU=c(1,1,0,0))
>
> fit<-lmFit(eset,design)
> cont.matrix <- makeContrasts( MUvsWT=MU-WT,levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
> toptable(fit2,adjust="BH")
> ###########################################################
>
>
> Kind regards,
> and thanks for your help,
>
> Bj?rn Usadel
> MPI of Molecular Plant Physiology
Dear Bjoern,
I haven't seen this phenomenon before, but I can tell you what is causing it. The problem is
caused by a number of probe-sets which have saturated intensities. RMA gives these probe-sets
exactly identical expression values in both replicates. Hence the residual variance for these
probes is exactly zero. Now zero is theoretically impossible for the sample variance, so limma
has to remove these probe-sets when it estimates the empirical Bayes hyperparameters. All OK so
far. But unfortunately the second design matrix gives a slightly different result in R. For the
second design matrix, the smallest sample variance is 1e-15 because of rounding error. Hence no
residual variance is zero, no probe-sets are removed, and limma gets quite different
hyperparameters estimates. You will find that these results are also hardware and operating
system dependent as these will affect rounding error.
In your example, the first results are more reliable. You can make the two agree by removing the
probe-sets with zero variances:
> i <- (fita$sigma==0)
> sum(i)
[1] 40
> fita2 <- eBayes(fita[!i,])
> topTable(fita,coef="MUvsWT")
ID M A t P.Value B
5365 serA_b2913_st -3.07 12.1 -15.8 0.0364 2.76
1388 gltB_b3212_st -3.06 10.4 -14.4 0.0364 2.55
4282 IG_821_1300838_1300922_fwd_st 3.33 12.4 14.0 0.0364 2.48
4900 oppA_b1243_st 3.00 13.2 12.9 0.0364 2.29
5217 rmf_b0953_st 2.81 13.9 12.8 0.0364 2.26
6224 ydgR_b1634_st 2.47 10.3 11.7 0.0364 2.02
1389 gltD_b3213_st -2.96 11.1 -11.7 0.0364 2.01
4902 oppC_b1245_st 2.16 11.0 11.7 0.0364 2.01
4634 lysU_b4129_st 3.40 9.8 11.6 0.0364 2.00
4901 oppB_b1244_st 2.73 10.7 11.6 0.0364 1.98
> fit22 <- eBayes(fit2[!i,])
> topTable(fit22)
ID M A t P.Value B
5335 serA_b2913_st -3.07 12.1 -15.8 0.0362 2.77
1379 gltB_b3212_st -3.06 10.4 -14.4 0.0362 2.56
4264 IG_821_1300838_1300922_fwd_st 3.33 12.4 14.0 0.0362 2.49
4878 oppA_b1243_st 3.00 13.2 12.9 0.0362 2.30
5195 rmf_b0953_st 2.81 13.9 12.8 0.0362 2.27
6184 ydgR_b1634_st 2.47 10.3 11.7 0.0362 2.02
1380 gltD_b3213_st -2.96 11.1 -11.7 0.0362 2.02
4880 oppC_b1245_st 2.16 11.0 11.7 0.0362 2.02
4612 lysU_b4129_st 3.40 9.8 11.6 0.0362 2.01
4879 oppB_b1244_st 2.73 10.7 11.6 0.0362 1.99
Thanks for raising this problem. I will give some thought to modifying the hyperparameter
estimation in limma so that it is no longer sensitive to this issue.
> PS There is also a little typo on page 39 of the User Guide instead of
> design <- cbind(WT=c(1,1,0,0,0,MU=c(0,0,1,1,1))
> it should be
> design <- cbind(WT=c(1,1,0,0,0),MU=c(0,0,1,1,1))
You're right, thanks.
Best wishes
Gordon
More information about the Bioconductor
mailing list