[BioC] Q-values way smaller than P-values! Is that kosher?

Mon Jan 13 00:39:16 CET 2014

I generally use BH myself for any number of tests (my packages limma and 
edgeR use BH), because I don't feel that estimation of pi0 can be made 
bullet proof even for large numbers of tests.

Best wishes
Gordon

On Sun, 12 Jan 2014, Joshua Banta wrote:

> Thank you, Dr. Smyth. I am glad to know this. Very helpful.
>
> At what number of P-values would you suggest it is "safe" to use 
> Storey's Q-value method? For instance, what if I had 300 P-values? What 
> if I had 100? What if I had 1000?
>
> On Jan 12, 2014, at 2:34 AM, Gordon K Smyth <smyth at wehi.EDU.AU<mailto:smyth at wehi.EDU.AU>> wrote:
>
> Hi Josh,
>
> Of course it's not statistically defensible, and you can't use these 
> qvalues.  As Mike Love points out, qvalue just isn't intended for small 
> sets of p-values like this.
>
> The main problem is that qvalue's estimate of pi0 (the proportion of 
> true nulls) is not reliable for small numbers of tests.  For your data, 
> qvalue estimates pi0 to be
>
> > qvalue(p.values)$pi0
> [1] 0.05057643
>
> which is equivalent to saying that most likely all the null hypotheses should be rejected.
>
> Some other methods of estimating pi0 are more robust.  For example the limma function propTrueNull gives pi0=0.75 for the same data:
>
> > library(limma)
> > propTrueNull(p.values)
> [1] 0.7506187
>
> The qvalue method of estimating pi0 is sensitive to the size of the largest p.values.  If you changed just the 0.79 p-value to 0.99, all the qvalues would look very different:
>
> > data.frame(p.values,q.values=qvalue(p.values)$qvalues)
>     p.values  q.values
> 1 0.26684513 0.5509513
> 2 0.35367528 0.5509513
> 3 0.03058514 0.1330221
> 4 0.99900000 0.7500975
> 5 0.60805832 0.5870052
> 6 0.43346672 0.5509513
> 7 0.03936944 0.1330221
> 8 0.48918122 0.5509513
> 9 0.74062659 0.6256105
>
> For very small sets of p-values, it seems to me to be safer to use the Benjamini and Hochberg algorithm (available in the p.adjust function). BH is more conservative than qvalue, but it doesn't rely on an estimator for pi0 and is trustworthy for any number of tests.
>
> Best wishes
> Gordon
>
>
> On Sat, 11 Jan 2014, Michael Love wrote:
>
> hi Joshua,
>
> While you wait for a definitive answer from the package maintainer, note
> that the methods in the package are designed for datasets with number of
> tests, m, in the thousands.
>
> for example, the reference mentions:
>
> "Because we are considering many features (i.e., *m* is very large), it can
> be shown that..."
>
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC170937/
>
>
> Mike
>
>
>
>
> On Fri, Jan 10, 2014 at 1:08 PM, Joshua Banta <jbanta at uttyler.edu> wrote:
>
> Dear listserv,
>
> Consider the following vector of P-values:
>
> p.values <- c(
> 0.266845129,
> 0.353675275,
> 0.030585143,
> 0.791285527,
> 0.60805832,
> 0.433466716,
> 0.039369439,
> 0.48918122,
> 0.740626586
> )
>
> When I run the qvalue function of the qvalue package, I get a rather
> surprising outcome: the q-values are much smaller than the corresponding
> p-values!
>
> See the following code below:
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("qvalue")
> library(qvalue)
>
> q.values <- qvalue(p.values)$qvalue
>
> comparison <- cbind(p.values, q.values)
>
> colnames(comparison) <- c("p.values", "q.values")
>
> comparison
>
>       p.values    q.values
> [1,] 0.26684513 0.037111560
> [2,] 0.35367528 0.037111560
> [3,] 0.03058514 0.008960246
> [4,] 0.79128553 0.040020397
> [5,] 0.60805832 0.039540110
> [6,] 0.43346672 0.037111560
> [7,] 0.03936944 0.008960246
> [8,] 0.48918122 0.037111560
> [9,] 0.74062659 0.040020397
>
> So what is happening here? Can I trust the q-values? In each case they are
> substantially larger than the P-values. Especially troubling to me are rows
> 4, 5, and 9.
>
> Thanks in advance for any information/advice!
> -----------------------------------
> Josh Banta, Ph.D
> Assistant Professor
> Department of Biology
> The University of Texas at Tyler
> Tyler, TX 75799
> Tel: (903) 565-5655
> http://plantevolutionaryecology.org
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:21}}