[BioC] qvalues, sam, limma

Thu Jun 10 02:01:07 CEST 2004

Hi,
I can make an observation about the differences in pi_0 estimation
between 
the packages siggenes (SAM analysis) and qvalue.

> packageDescription("qvalue", field="Version")
[1] "1.0.3"
> packageDescription("siggenes", field="Version")
[1] "1.0.6"
> 

The package siggenes uses the function "p0.est" which has the following

cubic spline code:
...
    spline.out <- smooth.spline(vec.lambda, vec.p0, w = 1 - vec.lambda,

        df = 3)
    p0 <- min(predict(spline.out, 1)$y, 1).
...
The package qvalue uses the function "qvalue" which has the following 
cubic spline code:
...
 spi0 <- smooth.spline(lambda, pi0, df = 3)
 pi0 <- predict(spi0, x = max(lambda))$y
...

The estimate pi_0(lambda) over a range of lambda observations becomes 
unbiased as lambda -> 1, with a bias variance tradeoff.

The real difference is that observations are not weighted by 1-lambda 
in the package qvalue, and the estimate of pi_0 is at pi_0(lambda=0.95)

by default in the arguements (lambda = seq(0, 0.95, 0.05)). 
In the package siggenes, pi_0 is estimated at pi_0(lambda=1). 

By weighting the observations with 1-lambda assumes values with small 
lambda are trusted to be more accurate.

Details of the weighting of 1-lambda is in the preprint on page 13 of:

Storey J. D. and Tibshirani R. J. Statistical significance for
genome-wide 
experiments. Preprint, January 2003.

I have run simulations that suggest that including the 1-lambda weights

can improve the variance in the estimation of pi_0(lambda) over the
entire 
range of pi_0.

marcus

>>> Naomi Altman <naomi at stat.psu.edu> 10/06/2004 9:16:00 AM >>>
I have an Affy experiment with a very high level of differential 
expression.  It is a one-way ANOVA with 6 treatments, 2 replicates per

treatment.

We ran both SAM (excel version) and limma, and had very good agreement

between them in terms of ranking the genes by the test statistic.  For
any 
set of the top K genes, over 90% of the genes were identified by both 
routines.

SAM automatically produces a q-values and estimates FDR and pi_0 (the 
percentage of non-differentially expressing genes).  I used the 
Bioconductor package "qvalue" to convert the limma p-values to 
q-values.  Both routines are supposed to be based on the same paper. 
But 
the SAM q-value for the most highly differentially expressed gene is
.0039, 
whereas the q-value from "qvalue" is 3.9e-12.  The SAM q-value for the

1000th most highly differentially expressed gene is also .0039, but the

value from "qvalue" is 5.6e-10.

As well, "qvalue" (at FDR=0.01) is returning genes whose p-values are 
pretty big - e.g. p=0.12.  Partly this is because the estimated pi_0 is

just 7%.  By contrast, SAM estimates pi_0 to be 17% and returns a much

smaller list of genes at the same FDR.  These genes have unadjusted 
p-values which are quite small.

I guess if I believe SAM, I should be getting about 83% of my genes 
declared statistically significant - which, interestingly enough is
about 
what I do get at FDR=.01 from "qvalue".

As always, I welcome the insights of the members of this list.

--Naomi

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348
(Statistics)
University Park, PA 16802-2111

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

______________________________________________________

The contents of this e-mail are privileged and/or confidenti...{{dropped}}