[BioC] qvalues, sam, limma
Marcus Davy
MDavy at hortresearch.co.nz
Thu Jun 10 02:01:07 CEST 2004
Hi,
I can make an observation about the differences in pi_0 estimation
between
the packages siggenes (SAM analysis) and qvalue.
> packageDescription("qvalue", field="Version")
[1] "1.0.3"
> packageDescription("siggenes", field="Version")
[1] "1.0.6"
>
The package siggenes uses the function "p0.est" which has the following
cubic spline code:
...
spline.out <- smooth.spline(vec.lambda, vec.p0, w = 1 - vec.lambda,
df = 3)
p0 <- min(predict(spline.out, 1)$y, 1).
...
The package qvalue uses the function "qvalue" which has the following
cubic spline code:
...
spi0 <- smooth.spline(lambda, pi0, df = 3)
pi0 <- predict(spi0, x = max(lambda))$y
...
The estimate pi_0(lambda) over a range of lambda observations becomes
unbiased as lambda -> 1, with a bias variance tradeoff.
The real difference is that observations are not weighted by 1-lambda
in the package qvalue, and the estimate of pi_0 is at pi_0(lambda=0.95)
by default in the arguements (lambda = seq(0, 0.95, 0.05)).
In the package siggenes, pi_0 is estimated at pi_0(lambda=1).
By weighting the observations with 1-lambda assumes values with small
lambda are trusted to be more accurate.
Details of the weighting of 1-lambda is in the preprint on page 13 of:
Storey J. D. and Tibshirani R. J. Statistical significance for
genome-wide
experiments. Preprint, January 2003.
I have run simulations that suggest that including the 1-lambda weights
can improve the variance in the estimation of pi_0(lambda) over the
entire
range of pi_0.
marcus
>>> Naomi Altman <naomi at stat.psu.edu> 10/06/2004 9:16:00 AM >>>
I have an Affy experiment with a very high level of differential
expression. It is a one-way ANOVA with 6 treatments, 2 replicates per
treatment.
We ran both SAM (excel version) and limma, and had very good agreement
between them in terms of ranking the genes by the test statistic. For
any
set of the top K genes, over 90% of the genes were identified by both
routines.
SAM automatically produces a q-values and estimates FDR and pi_0 (the
percentage of non-differentially expressing genes). I used the
Bioconductor package "qvalue" to convert the limma p-values to
q-values. Both routines are supposed to be based on the same paper.
But
the SAM q-value for the most highly differentially expressed gene is
.0039,
whereas the q-value from "qvalue" is 3.9e-12. The SAM q-value for the
1000th most highly differentially expressed gene is also .0039, but the
value from "qvalue" is 5.6e-10.
As well, "qvalue" (at FDR=0.01) is returning genes whose p-values are
pretty big - e.g. p=0.12. Partly this is because the estimated pi_0 is
just 7%. By contrast, SAM estimates pi_0 to be 17% and returns a much
smaller list of genes at the same FDR. These genes have unadjusted
p-values which are quite small.
I guess if I believe SAM, I should be getting about 83% of my genes
declared statistically significant - which, interestingly enough is
about
what I do get at FDR=.01 from "qvalue".
As always, I welcome the insights of the members of this list.
--Naomi
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348
(Statistics)
University Park, PA 16802-2111
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
______________________________________________________
The contents of this e-mail are privileged and/or confidenti...{{dropped}}
More information about the Bioconductor
mailing list