[BioC] qvalues, sam, limma
Gordon Smyth
smyth at wehi.edu.au
Thu Jun 10 02:00:17 CEST 2004
At 07:16 AM 10/06/2004, Naomi Altman wrote:
>I have an Affy experiment with a very high level of differential
>expression. It is a one-way ANOVA with 6 treatments, 2 replicates per
>treatment.
>
>We ran both SAM (excel version) and limma, and had very good agreement
>between them in terms of ranking the genes by the test statistic. For any
>set of the top K genes, over 90% of the genes were identified by both routines.
As you've found here, the p-values from limma are intended to rank genes,
and a general guide of significance, rather than to be believed as absolute
p-values. (I've taken every opportunity to point this out on this list, in
talks, and in published papers. See for example Section 7 of
http://www.statsci.org/smyth/pubs/mareview.pdf.)
Generally speaking, the p-values returned by limma tend to be too small.
This is because the conjugate normal model being fitted is not exactly
correct, most notably because the log-intensity measures are almost
invariably more heavy tailed than normal.
To convert ranked p-values into meaningful measures of false discovery
rates is far from trivial. SAM uses a re-sampling approach, which of course
depends on a particular set of assumptions and approximations. The approach
which I take in my own analyses is to estimate the false discovery rate
using known-null contrasts and specially designed control spots. This
approach has not yet been incorporated publicly available software, but I
hope it will in the future.
I don't fully believe any estimator of false discovery rate that I have yet
seen, as they don't consistently return believable results on real data sets.
Best
Gordon
>SAM automatically produces a q-values and estimates FDR and pi_0 (the
>percentage of non-differentially expressing genes). I used the
>Bioconductor package "qvalue" to convert the limma p-values to
>q-values. Both routines are supposed to be based on the same paper. But
>the SAM q-value for the most highly differentially expressed gene is
>.0039, whereas the q-value from "qvalue" is 3.9e-12. The SAM q-value for
>the 1000th most highly differentially expressed gene is also .0039, but
>the value from "qvalue" is 5.6e-10.
>
>As well, "qvalue" (at FDR=0.01) is returning genes whose p-values are
>pretty big - e.g. p=0.12. Partly this is because the estimated pi_0 is
>just 7%. By contrast, SAM estimates pi_0 to be 17% and returns a much
>smaller list of genes at the same FDR. These genes have unadjusted
>p-values which are quite small.
>
>I guess if I believe SAM, I should be getting about 83% of my genes
>declared statistically significant - which, interestingly enough is about
>what I do get at FDR=.01 from "qvalue".
>
>As always, I welcome the insights of the members of this list.
>
>--Naomi
More information about the Bioconductor
mailing list