[BioC] Q-values way smaller than P-values! Is that kosher?
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Jan 12 09:34:20 CET 2014
Hi Josh,
Of course it's not statistically defensible, and you can't use these
qvalues. As Mike Love points out, qvalue just isn't intended for small
sets of p-values like this.
The main problem is that qvalue's estimate of pi0 (the proportion of true
nulls) is not reliable for small numbers of tests. For your data, qvalue
estimates pi0 to be
> qvalue(p.values)$pi0
[1] 0.05057643
which is equivalent to saying that most likely all the null hypotheses
should be rejected.
Some other methods of estimating pi0 are more robust. For example the
limma function propTrueNull gives pi0=0.75 for the same data:
> library(limma)
> propTrueNull(p.values)
[1] 0.7506187
The qvalue method of estimating pi0 is sensitive to the size of the
largest p.values. If you changed just the 0.79 p-value to 0.99, all the
qvalues would look very different:
> data.frame(p.values,q.values=qvalue(p.values)$qvalues)
p.values q.values
1 0.26684513 0.5509513
2 0.35367528 0.5509513
3 0.03058514 0.1330221
4 0.99900000 0.7500975
5 0.60805832 0.5870052
6 0.43346672 0.5509513
7 0.03936944 0.1330221
8 0.48918122 0.5509513
9 0.74062659 0.6256105
For very small sets of p-values, it seems to me to be safer to use the
Benjamini and Hochberg algorithm (available in the p.adjust function).
BH is more conservative than qvalue, but it doesn't rely on an estimator
for pi0 and is trustworthy for any number of tests.
Best wishes
Gordon
On Sat, 11 Jan 2014, Michael Love wrote:
> hi Joshua,
>
> While you wait for a definitive answer from the package maintainer, note
> that the methods in the package are designed for datasets with number of
> tests, m, in the thousands.
>
> for example, the reference mentions:
>
> "Because we are considering many features (i.e., *m* is very large), it can
> be shown that..."
>
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC170937/
>
>
> Mike
>
>
>
>
> On Fri, Jan 10, 2014 at 1:08 PM, Joshua Banta <jbanta at uttyler.edu> wrote:
>
>> Dear listserv,
>>
>> Consider the following vector of P-values:
>>
>> p.values <- c(
>> 0.266845129,
>> 0.353675275,
>> 0.030585143,
>> 0.791285527,
>> 0.60805832,
>> 0.433466716,
>> 0.039369439,
>> 0.48918122,
>> 0.740626586
>> )
>>
>> When I run the qvalue function of the qvalue package, I get a rather
>> surprising outcome: the q-values are much smaller than the corresponding
>> p-values!
>>
>> See the following code below:
>>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("qvalue")
>> library(qvalue)
>>
>> q.values <- qvalue(p.values)$qvalue
>>
>> comparison <- cbind(p.values, q.values)
>>
>> colnames(comparison) <- c("p.values", "q.values")
>>
>> comparison
>>
>> p.values q.values
>> [1,] 0.26684513 0.037111560
>> [2,] 0.35367528 0.037111560
>> [3,] 0.03058514 0.008960246
>> [4,] 0.79128553 0.040020397
>> [5,] 0.60805832 0.039540110
>> [6,] 0.43346672 0.037111560
>> [7,] 0.03936944 0.008960246
>> [8,] 0.48918122 0.037111560
>> [9,] 0.74062659 0.040020397
>>
>> So what is happening here? Can I trust the q-values? In each case they are
>> substantially larger than the P-values. Especially troubling to me are rows
>> 4, 5, and 9.
>>
>> Thanks in advance for any information/advice!
>> -----------------------------------
>> Josh Banta, Ph.D
>> Assistant Professor
>> Department of Biology
>> The University of Texas at Tyler
>> Tyler, TX 75799
>> Tel: (903) 565-5655
>> http://plantevolutionaryecology.org
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list