[BioC] conceptual question about FDR, FDR adjusted p-value and q-value
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Dec 21 02:11:38 CET 2012
Dear Jack,
The thing to understand is that terms like FDR and q-value were defined in
specific ways by their original inventors but are used in more generic
ways by later researchers who adapt, modify or use the ideas.
The term "false discovery rate (FDR)" was created by Benjamini and
Hochberg in their 1995 paper. They gave a particular definition of what
they meant by FDR. Their procedure accepted or rejected hypotheses, but
did not produce adjusted p-values.
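To make the original BH rule concrete, here is a minimal Python sketch, assuming the usual statement of the step-up procedure. The function name `bh_reject` is hypothetical and not from any package. It sorts the m p-values, finds the largest rank k with p_(k) <= k q / m, and rejects the k smallest. It returns only accept/reject decisions, with no adjusted p-values, matching the description above.

```python
def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg (1995) step-up procedure at FDR level q.

    Returns a list of booleans: True where the hypothesis is rejected.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) with p_(k) <= k * q / m. Ranks below k are
    # rejected even if they individually miss their own threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


pvals = [0.01, 0.045, 0.025, 0.005, 0.20]
print(bh_reject(pvals, q=0.05))  # -> [True, False, True, True, False]
```

Here the three smallest p-values (0.005, 0.01, 0.025) fall below their thresholds 0.01, 0.02 and 0.03, while 0.045 misses 0.04, so three hypotheses are rejected.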
Benjamini and Yekutieli presented another, more conservative, algorithm to
control the FDR in a 2001 paper. Same definition of FDR, but a different
algorithm.
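One way to see "same definition, different algorithm" is to compare the per-rank rejection thresholds. In the BY procedure every BH threshold k q / m is divided by c(m) = 1 + 1/2 + ... + 1/m, which is what makes it valid under arbitrary dependence among the tests but also more conservative. A small Python sketch (the function name `step_up_thresholds` is hypothetical):

```python
def step_up_thresholds(m, q=0.05):
    """Per-rank rejection thresholds for the BH and BY step-up procedures."""
    # Harmonic sum c(m) = 1 + 1/2 + ... + 1/m used by Benjamini-Yekutieli.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # BH threshold for rank k is k * q / m.
    bh = [k * q / m for k in range(1, m + 1)]
    # BY shrinks every BH threshold by the same factor c(m).
    by = [t / c_m for t in bh]
    return bh, by


bh, by = step_up_thresholds(m=1000, q=0.05)
print(bh[0] / by[0])  # c(1000), roughly 7.49
```

With 1000 tests the BY thresholds are about 7.5 times smaller than the BH ones, so BY typically rejects fewer hypotheses.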
In 2002, I re-interpreted the Benjamini and Hochberg (BH) and Benjamini
and Yekutieli (BY) procedures in terms of adjusted p-values. I
implemented the resulting algorithms in the function p.adjust() in the
stats package, and used them in the limma package, and this led to the
concept of an FDR adjusted p-value. The terminology used by the
p.adjust() function and the limma package has led people to refer to "BH
adjusted p-values".
The adjusted p-value definition that you give is essentially the same as
the BH adjusted p-value, except that you omitted the last step in the
procedure, which takes cumulative minima starting from the largest
p-value. Your definition as it stands is not an increasing function of
the original p-values.
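The missing step is easiest to see side by side. The Python sketch below mirrors the steps R's p.adjust() uses for methods "BH" and "BY" (sort decreasingly, scale by m / rank, take a cumulative minimum, cap at 1), ignoring its handling of NA values and of an `n` argument larger than the number of p-values. The names `p_adjust` and `naive_adjust` are hypothetical; `naive_adjust` is the "p-value * number of genes / rank" formula from the question.

```python
def p_adjust(pvalues, method="BH"):
    """FDR adjusted p-values ("BH" or "BY"), following p.adjust()'s steps."""
    m = len(pvalues)
    # BY multiplies the BH adjustment by c(m) = 1 + 1/2 + ... + 1/m.
    c_m = sum(1.0 / i for i in range(1, m + 1)) if method == "BY" else 1.0
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps the results at 1
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in increasing order
        value = c_m * pvalues[idx] * m / rank
        # The last step: cumulative minimum, so results are monotone in p.
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


def naive_adjust(pvalues):
    """p * m / rank without the cumulative-minimum step (not monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    for rank, idx in enumerate(order, start=1):
        adjusted[idx] = min(1.0, pvalues[idx] * m / rank)
    return adjusted


p = [0.010, 0.012, 0.040, 0.041]
print(naive_adjust(p))
print(p_adjust(p, "BH"))
```

For these four p-values the naive values are about [0.04, 0.024, 0.053, 0.041]: the gene with the smallest p-value gets a larger adjusted value than the second gene. The BH values are [0.024, 0.024, 0.041, 0.041], which never decrease as the raw p-value increases.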
In 2002, John Storey created a new definition of "false discovery rate".
Storey's definition is based on Benjamini and Hochberg's original idea,
but is mathematically a bit more flexible. John Storey also created the
terminology "q-value" for a quantity that estimates his definition of FDR. He
implemented q-value estimation procedures in an R package called qvalue.
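A simplified Python sketch of Storey-style q-values, assuming a single fixed tuning value lambda = 0.5 to estimate pi0, the proportion of true null hypotheses, as #{p > lambda} / (m (1 - lambda)). This is a simplification: the qvalue package by default smooths the pi0 estimate over a grid of lambda values, so its numbers will differ. The function name `storey_qvalues` is hypothetical. The sketch shows how the two quantities relate: with pi0 = 1 it reduces to the BH adjusted p-values, and with pi0 < 1 every value shrinks by the factor pi0.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey q-values using one fixed lambda (assumes 0 <= lam < 1).

    Returns (pi0, qvalues). If no p-value exceeds lam, pi0 is 0 and all
    q-values are 0; the qvalue package handles such cases differently.
    """
    m = len(pvalues)
    # Estimate the proportion of true nulls from p-values above lambda,
    # capped at 1.
    pi0 = min(1.0, sum(1 for p in pvalues if p > lam) / (m * (1.0 - lam)))
    # Same step-down cumulative minimum as BH, scaled by pi0.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank in increasing order
        running_min = min(running_min, pi0 * pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return pi0, qvalues


p = [0.001, 0.002, 0.01, 0.03, 0.04, 0.2, 0.6, 0.9]
pi0, q = storey_qvalues(p)
print(pi0)  # -> 0.5
```

Here two of the eight p-values exceed 0.5, giving pi0 = 2 / (8 * 0.5) = 0.5, so each q-value is half the corresponding BH adjusted p-value.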
So, strictly speaking, the q-value and the FDR adjusted p-value are
similar but not quite the same. However, the terms q-value and FDR
adjusted p-value are often used generically by the Bioconductor community
to refer to any quantity that controls or estimates any definition of the
FDR. In this general sense the terms are synonyms.
The lesson to draw from this is that different methods and different
packages are trying to do slightly different things and give slightly
different results, and you should always cite the specific software and
method that you have used.
Best wishes
Gordon
> Date: Wed, 19 Dec 2012 10:22:23 -0500
> From: Jack Luo <jluo.rhelp at gmail.com>
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] conceptual question about FDR, FDR adjusted p-value
> and q-value
>
> Hi,
>
> I am a bit confused about the concepts of the 3 things: FDR, FDR adjusted
> p-value and q-value, which I initially thought I was clear about.
>
> Are FDR adjusted p-values the same as q-values? (my understanding is that FDR
> adjusted p-value = original p-value * number of genes/rank of the gene, is
> that right?)
> When people say xxx genes are differentially expressed with an FDR cutoff
> of 0.05, does that mean xxx genes have an FDR adjusted p-value smaller than
> 0.05?
>
> Thanks,
>
> -Jack
>