[BioC] qvalues, sam, limma

Thu Jun 10 02:00:17 CEST 2004

At 07:16 AM 10/06/2004, Naomi Altman wrote:
>I have an Affy experiment with a very high level of differential 
>expression.  It is a one-way ANOVA with 6 treatments, 2 replicates per 
>treatment.
>
>We ran both SAM (excel version) and limma, and had very good agreement 
>between them in terms of ranking the genes by the test statistic.  For any 
>set of the top K genes, over 90% of the genes were identified by both routines.

As you've found here, the p-values from limma are intended to rank genes, 
and a general guide of significance, rather than to be believed as absolute 
p-values. (I've taken every opportunity to point this out on this list, in 
talks, and in published papers. See for example Section 7 of 
http://www.statsci.org/smyth/pubs/mareview.pdf.)

Generally speaking, the p-values returned by limma tend to be too small. 
This is because the conjugate normal model being fitted is not exactly 
correct, most notably because the log-intensity measures are almost 
invariably more heavy tailed than normal.

To convert ranked p-values into meaningful measures of false discovery 
rates is far from trivial. SAM uses a re-sampling approach, which of course 
depends on a particular set of assumptions and approximations. The approach 
which I take in my own analyses is to estimate the false discovery rate 
using known-null contrasts and specially designed control spots. This 
approach has not yet been incorporated publicly available software, but I 
hope it will in the future.

I don't fully believe any estimator of false discovery rate that I have yet 
seen, as they don't consistently return believable results on real data sets.

Best
Gordon

>SAM automatically produces a q-values and estimates FDR and pi_0 (the 
>percentage of non-differentially expressing genes).  I used the 
>Bioconductor package "qvalue" to convert the limma p-values to 
>q-values.  Both routines are supposed to be based on the same paper.  But 
>the SAM q-value for the most highly differentially expressed gene is 
>.0039, whereas the q-value from "qvalue" is 3.9e-12.  The SAM q-value for 
>the 1000th most highly differentially expressed gene is also .0039, but 
>the value from "qvalue" is 5.6e-10.
>
>As well, "qvalue" (at FDR=0.01) is returning genes whose p-values are 
>pretty big - e.g. p=0.12.  Partly this is because the estimated pi_0 is 
>just 7%.  By contrast, SAM estimates pi_0 to be 17% and returns a much 
>smaller list of genes at the same FDR.  These genes have unadjusted 
>p-values which are quite small.
>
>I guess if I believe SAM, I should be getting about 83% of my genes 
>declared statistically significant - which, interestingly enough is about 
>what I do get at FDR=.01 from "qvalue".
>
>As always, I welcome the insights of the members of this list.
>
>--Naomi