[BioC] qvalues, sam, limma
Naomi Altman
naomi at stat.psu.edu
Wed Jun 9 23:16:00 CEST 2004
I have an Affy experiment with a very high level of differential
expression. It is a one-way ANOVA with 6 treatments, 2 replicates per
treatment.
We ran both SAM (excel version) and limma, and had very good agreement
between them in terms of ranking the genes by the test statistic. For any
set of the top K genes, over 90% of the genes were identified by both
routines.
SAM automatically produces q-values and estimates FDR and pi_0 (the
percentage of non-differentially expressing genes). I used the
Bioconductor package "qvalue" to convert the limma p-values to
q-values. Both routines are supposed to be based on the same paper. But
the SAM q-value for the most highly differentially expressed gene is 0.0039,
whereas the q-value from "qvalue" is 3.9e-12. The SAM q-value for the
1000th most highly differentially expressed gene is also 0.0039, but the
value from "qvalue" is 5.6e-10.
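For anyone who wants to see how p-values turn into q-values, here is a
rough Python sketch of a Storey-type calculation. It is only an
illustration under stated assumptions: the function name
storey_qvalues and the single fixed tuning value lam=0.5 are my own
choices for the example, and neither SAM nor the "qvalue" package is
claimed to work exactly this way.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Sketch of Storey-type q-values using one fixed lambda (an assumption)."""
    p = [float(x) for x in pvalues]
    m = len(p)
    # pi_0 estimate: share of p-values above lam, rescaled by the width (1 - lam)
    pi0 = min(1.0, sum(1 for x in p if x > lam) / (m * (1.0 - lam)))
    # indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p[i])
    q = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so q-values are monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p[i] / rank)
        q[i] = running_min
    return q, pi0


if __name__ == "__main__":
    # toy p-values, invented purely for illustration
    q, pi0 = storey_qvalues([0.01, 0.02, 0.03, 0.04, 0.9])
    print(pi0, q)
```

Note that in this sketch the smallest q-value depends on pi_0, the
number of genes, and the smallest p-values, so two routines that
estimate these quantities differently can rank genes alike and still
report very different q-values.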
In addition, "qvalue" (at FDR=0.01) is returning genes whose p-values are
fairly large, e.g. p=0.12. Partly this is because the estimated pi_0 is
just 7%. By contrast, SAM estimates pi_0 to be 17% and returns a much
smaller list of genes at the same FDR. The genes on the SAM list have
unadjusted p-values that are quite small.
I guess if I believe SAM's pi_0 estimate of 17%, I should be getting about
83% of my genes declared statistically significant, which, interestingly
enough, is about what I do get at FDR=0.01 from "qvalue".
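As a back-of-the-envelope check on the numbers above, the q-value at a
p-value cutoff p is roughly pi_0 * p divided by the fraction of genes
with p-values at or below p. The small Python sketch below plugs in the
figures quoted in this message. The helper name approx_qvalue is made up
for the example, and it assumes that about 83% of genes have p <= 0.12,
which is only my reading of the results, not an output of either
program.

```python
def approx_qvalue(pi0, p, frac_called):
    """Rough q-value: pi_0 * p / (fraction of genes with p-value <= p)."""
    return pi0 * p / frac_called


p_cutoff = 0.12     # example unadjusted p-value returned by "qvalue"
frac_called = 0.83  # assumed fraction of genes with p-value <= p_cutoff

# pi_0 = 7% is the "qvalue" estimate; pi_0 = 17% is the SAM estimate
q_with_qvalue_pi0 = approx_qvalue(0.07, p_cutoff, frac_called)
q_with_sam_pi0 = approx_qvalue(0.17, p_cutoff, frac_called)

print(round(q_with_qvalue_pi0, 4))  # 0.0101
print(round(q_with_sam_pi0, 4))     # 0.0246
```

Under these assumptions, a gene with p=0.12 lands close to the FDR=0.01
threshold when pi_0 is 7%, but well above it when pi_0 is 17%, so the
difference in pi_0 alone could account for part of the difference in
list size.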
As always, I welcome the insights of the members of this list.
--Naomi
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111