[BioC] limma - FDR adjusted "p-values"

Sean Davis sdavis2 at mail.nih.gov
Tue Feb 1 14:05:02 CET 2005


On Feb 1, 2005, at 7:30 AM, Gordon K Smyth wrote:

>> Date: Mon, 31 Jan 2005 09:56:09 -0500
>> From: Naomi Altman <naomi at stat.psu.edu>
>> Subject: [BioC] limma - FDR adjusted "p-values"
>> To: bioconductor at stat.math.ethz.ch
>>
>> Just a suggestion:
>>
>> The FDR adjusted "p-values" are called "q-values" in much of the
>> literature.  I suggest that limma follow suit,
>
> It's certainly true that a lot of users have trouble with FDR and with
> adjusted p-values in general.  Perhaps you're right that limma should use
> the term "q-values".  This would associate p-values with control/estimation
> of FWER and q-values with control/estimation of FDR.
>
> The reason I haven't done this so far is that the term "q-value", coined by
> John Storey, seems to me to measure something slightly different to
> Benjamini and Hochberg adjusted p-values.  I think that John Storey's
> q-value uses a slightly different definition of false discovery rate,
> namely pFDR, the positive false discovery rate.  Also I think it usually
> estimates pFDR rather than formally controlling it.  Although there is a
> value "Q" which appears in Benjamini and Hochberg's formulations, and it is
> closely related to q-values, it is not exactly the same.  So I have been
> reluctant to use the term "q-value" for things which were not quite the
> same, as this would cloud the fine meaning of the term.  Perhaps I am
> splitting hairs here and should just accept the broad definition of q-value
> for FDR or pFDR and p-value for FWER.  Any other opinions?
>
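For readers following the distinction drawn above, the two error rates can be written side by side.  This is only a sketch in the usual notation, which is an assumption here since the thread defines no symbols: V is the number of false discoveries and R the total number of rejections.

```latex
\begin{align*}
Q &= \begin{cases} V/R & \text{if } R > 0 \\ 0 & \text{if } R = 0 \end{cases} \\
\mathrm{FDR} &= E[Q] = E\!\left[\frac{V}{R} \,\middle|\, R > 0\right] \Pr(R > 0) \\
\mathrm{pFDR} &= E\!\left[\frac{V}{R} \,\middle|\, R > 0\right]
\end{align*}
```

Under these definitions FDR = pFDR * Pr(R > 0), so the two quantities differ by the probability of making at least one rejection.
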
> I have also thought that perhaps topTable() should label the
> p-value/q-value column in the output to indicate which adjustment method
> was used to generate the table.
>

I think the latter (label the p-value/q-value column) would suffice and 
be the most general solution.  Unfortunately, FDR is foreign to many 
researchers, so it demands an explanation by someone in-the-know, no 
matter what.  I'm not sure that calling a p-value a different name will 
satisfy the need for researchers to know the quantity that summarizes 
their data.  In short, I see the labeling issue as separate from the 
FDR understanding issue.  Is that fair?

Sean


>> and also add a line to the
>> documentation (it might already be there and I missed it)
>>
>> "If the number of significant results at level alpha is less than
>> alpha*(number of genes), then the q-value will be 1.0."
>>
>> It seems like I have to explain this to just about every investigator who
>> runs into this.
>
> I get a lot of questions about this as well.  Actually, the statement
> you've made isn't always true, although it usually is.  Even if the
> smallest p-value out of n genes is only as small as 1/n, the "fdr"
> adjusted p-value is not always 1.  It can be as small as 1/n depending on
> the other n-1 p-values.
>
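The point above can be checked numerically.  What follows is a minimal sketch, not limma's code: bh_adjust is a hypothetical helper that implements the standard Benjamini-Hochberg step-up adjustment (the method R's p.adjust offers as "BH" or "fdr"), and the two p-value vectors are made-up illustrations with n = 10.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Hypothetical helper for illustration; not taken from limma.
    """
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over its own rank and all higher ranks.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


n = 10
# Case 1: the smallest p-value is 1/n and every other p-value is 1.
spread = [1 / n] + [1.0] * (n - 1)
# Case 2: all n p-values are equal to 1/n.
tied = [1 / n] * n

adj_spread = bh_adjust(spread)
adj_tied = bh_adjust(tied)

# Print raw and adjusted values side by side for the smallest p-value.
print("case 1: raw", spread[0], "adjusted", adj_spread[0])
print("case 2: raw", tied[0], "adjusted", adj_tied[0])
```

In the first case the smallest p-value of 1/n adjusts to 1; in the second, where the other n-1 p-values are also 1/n, it adjusts to 1/n.  Printing the raw value next to the adjusted one is in the spirit of the topTable() suggestion that follows.
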
> Perhaps the way to go would be for topTable() to output the raw p-values
> as well as the adjusted p-values/q-values.  I haven't done this so as to
> keep the table as small as possible, but it would prevent users from being
> presented with just a list of p-values all equal to 1.  What do you think?
>
> Gordon
>
>> Naomi S. Altman                                814-865-3791 (voice)
>> Associate Professor
>> Bioinformatics Consulting Center
>> Dept. of Statistics                              814-863-7114 (fax)
>> Penn State University                         814-865-1348 (Statistics)
>> University Park, PA 16802-2111
>


