[BioC] limma - FDR adjusted "p-values"

Tue Feb 1 21:12:52 CET 2005

I think it would be useful to have both the p-values and the 
"q-values".  The "q-values" should not be called "adjusted p-values" 
because they are not probabilities.  They are the estimated FDR at the 
largest p-value for which the gene would be statistically 
significant.  Perhaps they should be called "fdr-values".

My vote is for Gordon to invent a name and then use it.  As LIMMA becomes 
more popular, the terminology will migrate to popular usage.

Cheers,
Naomi

At 07:30 AM 2/1/2005, Gordon K Smyth wrote:
> > Date: Mon, 31 Jan 2005 09:56:09 -0500
> > From: Naomi Altman <naomi at stat.psu.edu>
> > Subject: [BioC] limma - FDR adjusted "p-values"
> > To: bioconductor at stat.math.ethz.ch
> >
> > Just a suggestion:
> >
> > The FDR adjusted "p-values" are called "q-values" in much of the
> > literature.  I suggest that limma follow suit,
>
>It's certainly true that a lot of users have trouble with FDR and with
>adjusted p-values in general.  Perhaps you're right that limma should use
>the term "q-values".  This would associate p-values with
>control/estimation of FWER and q-values with control/estimation of FDR.
>
>The reason I haven't done this so far is that the term "q-value" coined
>by John Storey seems to me to measure something slightly different from
>Benjamini and Hochberg adjusted p-values.  I think that John Storey's
>q-value uses a slightly different definition of false discovery rate,
>namely pFDR, the positive false discovery rate.  Also I think it usually
>estimates pFDR rather than formally controlling it.  Although there is a
>value "Q" which appears in Benjamini and Hochberg's formulations, and it
>is closely related to q-values, it is not exactly the same.  So I have
>been reluctant to use the term "q-value" for things which were not quite
>the same, as this would cloud the fine meaning of the term.  Perhaps I am
>splitting hairs here and should just accept the broad definition of
>q-value for FDR or pFDR and p-value for FWER.  Any other opinions?
>
>I have also thought that perhaps topTable() should label the
>p-value/q-value column in the output to indicate which adjustment method
>was used to generate the table.
>
> > and also add a line to the
> > documentation (it might already be there and I missed it)
> >
> > "If the number of significant results at level alpha is less than
> > alpha*(number of genes), then the q-value will be 1.0."
> >
> > It seems like I have to explain this to just about every investigator who
> > runs into this.
>
>I get a lot of questions about this as well.  Actually, the statement
>you've made isn't always true, although it usually is.  Even if the
>smallest p-value out of n genes is only as small as 1/n, the "fdr"
>adjusted p-value is not always 1.  It can be as small as 1/n depending on
>the other n-1 p-values.
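
To make the point above concrete, here is a minimal sketch in plain Python (not limma's own R code). The helper name bh_adjust and the toy p-values are assumptions made up for illustration; the function follows the standard Benjamini-Hochberg step-up adjustment, where the adjusted value for the i-th smallest p-value is the minimum over j >= i of n * p_(j) / j, capped at 1.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Hypothetical helper for illustration only; not taken from limma.
    """
    n = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


n = 10

# Case 1: every one of the n p-values equals 1/n.
all_equal = [1.0 / n] * n
adj_equal = bh_adjust(all_equal)

# Case 2: the smallest p-value is 1/n, the other n-1 are all 1.
one_small = [1.0 / n] + [1.0] * (n - 1)
adj_one_small = bh_adjust(one_small)

print("all equal to 1/n :", adj_equal[0])
print("one small, rest 1:", adj_one_small[0])
```

In the first case the adjusted value of the smallest p-value stays at 1/n, while in the second it is pushed up to 1, even though the smallest raw p-value is 1/n in both.
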
>
>Perhaps the way to go would be for topTable() to output the raw p-values
>as well as the adjusted p-values/q-values.  I haven't done this so as to
>keep the table as small as possible, but it would prevent users from
>being presented with just a list of p-values all equal to 1.  What do you
>think?
>
>Gordon
>
> > Naomi S. Altman                                814-865-3791 (voice)
> > Associate Professor
> > Bioinformatics Consulting Center
> > Dept. of Statistics                              814-863-7114 (fax)
> > Penn State University                         814-865-1348 (Statistics)
> > University Park, PA 16802-2111
