[BioC] qvalues, sam, limma

Wed Jun 9 23:16:00 CEST 2004

I have an Affy experiment with a very high level of differential 
expression.  It is a one-way ANOVA with 6 treatments, 2 replicates per 
treatment.

We ran both SAM (Excel version) and limma, and had very good agreement 
between them in terms of ranking the genes by the test statistic.  For any 
set of the top K genes, over 90% of the genes were identified by both 
routines.
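
For anyone who wants to repeat the comparison, the top-K agreement check can be sketched as below.  This is only a minimal sketch: the function name, the gene IDs and the toy statistics are made up for illustration, and it assumes each routine gives one statistic per gene with larger values meaning stronger evidence of differential expression.

```python
def top_k_overlap(stats_a, stats_b, k):
    """Fraction of the top-k genes that two rankings have in common.

    stats_a, stats_b: dicts mapping gene ID -> test statistic
    (assumption: a larger statistic means a more significant gene).
    """
    if not 0 < k <= min(len(stats_a), len(stats_b)):
        raise ValueError("k must be between 1 and the number of genes")
    # Sort gene IDs by decreasing statistic and keep the first k of each list.
    top_a = set(sorted(stats_a, key=stats_a.get, reverse=True)[:k])
    top_b = set(sorted(stats_b, key=stats_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k


# Hypothetical toy statistics, not taken from the experiment above.
sam_stats = {"g1": 9.0, "g2": 7.5, "g3": 6.0, "g4": 1.0, "g5": 0.5}
limma_stats = {"g1": 8.0, "g2": 5.0, "g3": 7.0, "g4": 0.4, "g5": 2.0}

for k in (3, 4):
    print(k, top_k_overlap(sam_stats, limma_stats, k))
```

Running this over a range of K values gives the kind of agreement curve described above.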

SAM automatically produces q-values and estimates the FDR and pi_0 (the 
proportion of genes that are not differentially expressed).  I used the 
Bioconductor package "qvalue" to convert the limma p-values to 
q-values.  Both routines are supposed to be based on the same paper.  But 
the SAM q-value for the most highly differentially expressed gene is .0039, 
whereas the q-value from "qvalue" is 3.9e-12.  The SAM q-value for the 
1000th most highly differentially expressed gene is also .0039, but the 
value from "qvalue" is 5.6e-10.

In addition, "qvalue" (at FDR=0.01) is returning genes whose p-values are 
fairly large, e.g. p=0.12.  Partly this is because the estimated pi_0 is 
just 7%.  By contrast, SAM estimates pi_0 to be 17% and returns a much 
smaller list of genes at the same FDR.  These genes have unadjusted 
p-values which are quite small.
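
To make the role of pi_0 concrete, here is a minimal sketch of the usual step-up conversion from p-values to q-values, with pi_0 supplied as a fixed number.  It is an illustration under stated assumptions, not the actual code of the "qvalue" package or of SAM (which works from permutations); the function name and the toy p-values are hypothetical.  The q-value of the i-th smallest p-value is taken as pi_0 * m * p_(i) / i, made monotone by taking the minimum over all larger ranks.  As a rough check on the numbers above: if about 83% of the genes have p <= 0.12 and pi_0 is 0.07, then 0.07 * 0.12 / 0.83 is about 0.01, whereas with pi_0 = 0.17 the same gene would come out near 0.025.

```python
def qvalues_from_pvalues(pvalues, pi0):
    """Step-up q-values for a list of p-values, given an estimate of pi0.

    Sketch only: pi0 is passed in rather than estimated from the data.
    """
    m = len(pvalues)
    # Indices of the genes, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so the minimum over larger ranks
    # is available at every step (this enforces monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        estimate = pi0 * m * pvalues[idx] / rank
        running_min = min(running_min, estimate)
        q[idx] = running_min
    return q


# Hypothetical toy p-values, not taken from the experiment above.
toy_p = [0.001, 0.5, 0.01]
print(qvalues_from_pvalues(toy_p, pi0=1.0))
print(qvalues_from_pvalues(toy_p, pi0=0.07))
```

Since every q-value scales with pi_0 (until the cap at 1 is reached), a smaller pi_0 estimate lets more genes through at the same FDR cutoff.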

If I believe SAM's pi_0 estimate, I should be getting about 83% of my genes 
declared statistically significant, which, interestingly enough, is about 
what I do get at FDR=.01 from "qvalue".

As always, I welcome the insights of the members of this list.

--Naomi

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111