[BioC] LIMMA

Gordon Smyth smyth at wehi.EDU.AU
Sat Jul 14 04:44:17 CEST 2007


Dear Lev,

One of the purposes of a mailing list is that many people may reply, 
so please do not address your question specifically to me.

As explained in Section 10.1 of the limma User's Guide, the 
B-statistic requires a prior estimate of the proportion of DE genes. 
By default this is set to 1%.

Therefore, the B-statistic will tend to underestimate significance if 
the true proportion of DE genes is actually more than 1% and 
overestimate if the true proportion is less than 1%. In your case, 
the proportion of DE genes appears to be massively more than 1%, 
hence you'd expect the B-statistic to underestimate significance. 
That is, you'd expect the B-statistics to be too small.

Since the moderated t does not require a prior estimate, you'd expect 
the p-values to suggest more DE than the B-statistics whenever the 
proportion of DE genes in your data is large.

Best wishes
Gordon

At 12:27 AM 14/07/2007, Lev Soinov wrote:
>Dear Gordon,
>
>We are analysing a dataset of 14920 genes obtained with the AB1700 platform.
>It has three treatments L1, L2, L1+L2 and control. The data is in 
>the form of expression data matrix with the first column as pobe ID 
>and 14 other columns correspond to 4 above conditions. Using the 
>code below, we obtain a huge number of genes with adjusted p values 
><0.05, about 5000 for the comparison between L1 and control for 
>example. At the same time B values corresponding to these probes are 
>very small, i.e. we are getting B<-4 in the bottom of the list of 
>probes with adj.p<0.05.
>Could you please comment on possible causes for this? Is it normal?
>With kind regards,
>Lev.
>
> >s<-scan('Data.txt',what='character')
>Read 223800 items
> > sm<-matrix(s,byrow=TRUE,ncol=15)
> > dim(sm)
>[1] 14920    15
> > rownames(sm)<-sm[,1]
> > sm<-sm[,2:ncol(sm)]
> > snn<-apply(sm,2,as.numeric)
> > rownames(snn)<-rownames(sm)
> > signals<-snn
> > dim(signals)
>[1] 14920    14
> > temp<-normalizeBetweenArrays(log2(signals), method='quantile')
> > design <- model.matrix(~0 +factor(c(1,1,1,1,2,2,2,3,3,3,3,4,4,4)))
> > colnames(design) <- c("Control","L1","L2","L1L2")
> > contrast.matrix <- 
> makeContrasts(L1-Control,L2-Control,L1L2-Control,levels=design)
> > fit <- lmFit(temp, design)
> > fit2 <- contrasts.fit(fit, contrast.matrix)
> > fit2 <- eBayes(fit2)
> > topTable(fit2, coef=1, adjust='BH')
>           ID    logFC        t      P.Value    adj.P.Val        B
>8790  182417 5.813459 38.16912 1.072876e-15 1.479057e-11 25.47446
>6945  165482 8.573261 35.59768 2.856130e-15 1.479057e-11 24.69238
>7132  167208 6.247484 35.49523 2.973975e-15 1.479057e-11 24.65950
>10941 202780 4.881978 33.98673 5.467499e-15 2.039377e-11 24.15906
>1102  109858 5.076380 33.01348 8.214210e-15 2.451120e-11 23.81910
>3785  135458 3.686867 32.09869 1.217373e-14 3.027202e-11 23.48654
>12355 215284 5.035617 30.68240 2.288454e-14 4.877676e-11 22.94512
>6515  161539 5.885744 29.64292 3.704033e-14 6.908021e-11 22.52582
>9789  191745 8.189347 28.65188 5.953817e-14 9.870106e-11 22.10752
>8568  180293 4.749725 27.88955 8.671664e-14 1.293812e-10 21.77270
>
>
>The bottom of the adj.p<0.05 list:
>         ID      logFC               t                P.Value 
>   adj.P.Val        B
>207673  -0.293302      -2.70223       0.01703      0.042251  -4.2848
>213498  -0.675519      -2.70219      0.017033     0.042251  -4.2849
>186148  -0.419934      -2.70201      0.017039     0.042258  -4.2853
>233859  -0.422533      -2.70185      0.017044     0.042263  -4.2856
>188263  -0.330067      -2.70179      0.017046     0.042263  -4.2857



More information about the Bioconductor mailing list