[BioC] edgeR and FDR

Gordon K Smyth smyth at wehi.EDU.AU
Mon Jun 28 01:12:30 CEST 2010


Hi Naomi,

edgeR already does exactly what you suggest, although we chose p=0.05 
(leading to K=5) for this purpose rather than 0.001.  You're right that a 
more conservative value would probably be better.  However all the NextGen 
data sets we've analysed so far have huge amounts of DE, so it hasn't been 
an issue.
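As a rough illustration of where such a K comes from (a sketch under stated assumptions, not edgeR's actual code): with two libraries of equal size, a gene's count in the first library is Binomial(K, 0.5) under the null, given its total K, so the smallest attainable one-sided p-value is 0.5^K. The function names min_pvalue and min_total are invented for this example, and the equal-library-size binomial model is an assumption standing in for Fisher's exact test.

```r
# Smallest attainable p-value for a gene with K reads in total, ASSUMING two
# libraries of equal size, so the count in library 1 is Binomial(K, 0.5)
# under the null. The most extreme outcome puts all K reads in one library.
min_pvalue <- function(K, two.sided = FALSE) {
  p <- 0.5^K
  if (two.sided) p <- pmin(1, 2 * p)
  p
}

# Smallest total K whose minimum attainable p-value is at or below the cutoff.
min_total <- function(cutoff, two.sided = FALSE, K.max = 100) {
  K <- seq_len(K.max)
  ok <- min_pvalue(K, two.sided) <= cutoff
  if (!any(ok)) stop("no K up to K.max reaches the cutoff")
  K[which(ok)[1]]
}

min_total(0.05)    # 5
min_total(0.001)   # 10
```

Under these assumptions the one-sided calculation gives K=5 for a cutoff of 0.05 and K=10 for 0.001; doubling the p-value for a two-sided test raises each K by one.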

Regards
Gordon

On Sat, 26 Jun 2010, Naomi Altman wrote:

>
> Basically, if a global FDR is used with discrete data, then one should filter 
> low expressing genes pretty stringently.  For example, one could compute K 
> (the marginal total for the gene) for which the smallest possible p-value is 
> .001 (e.g. use the ordinary Fisher's exact test as an approximation) and use 
> only features with K or more reads in the study.  This improves power for the 
> (much smaller number of) remaining features, but obviously you will then need 
> to sort manually through the low expressing genes to determine if you have 
> missed something striking (such as all of the K-1 reads are in a single 
> sample).
>
> --Naomi
>
>
>
> At 10:39 AM 6/26/2010, you wrote:
>> Hi Naomi,
>> 
>> I agree that the discreteness of the counts introduces conservatism, and 
>> that there is a power differential between low and high expressed genes. 
>> However the expected overall FDR is still controlled at a rate less than or 
>> equal to the nominal rate, and that is all we promise.
>> 
>> To reduce the trend in DE vs expression level, I like to combine FDR with a 
>> fold-change cutoff or, perhaps better, use a TREAT-like test.
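The simpler of the two options, an FDR cutoff combined with a fold-change cutoff, might be sketched as follows. The data frame, its column names (logFC, FDR) and both thresholds are hypothetical and chosen only for illustration. Note that filtering on fold change after the fact is not the same thing as a TREAT-style test, which tests formally against a fold-change threshold.

```r
# Hypothetical results table; the column names and values are assumptions.
res <- data.frame(
  gene  = c("g1", "g2", "g3", "g4"),
  logFC = c(3.1, 0.4, -2.2, 1.5),      # log2 fold changes
  FDR   = c(0.001, 0.01, 0.03, 0.20),  # adjusted p-values
  stringsAsFactors = FALSE
)

fdr.cutoff <- 0.05
lfc.cutoff <- 1   # |log2 fold change| >= 1, i.e. at least a two-fold change

# Keep genes that pass both the FDR and the fold-change criteria.
selected <- res[res$FDR < fdr.cutoff & abs(res$logFC) >= lfc.cutoff, ]
selected$gene   # "g1" "g3"
```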
>> 
>> Regards
>> Gordon
>> 
>> On Sat, 26 Jun 2010, Naomi Altman wrote:
>> 
>>> Dear Gordon,
>>> Thank you for your very detailed and clear answer to my question about the 
>>> dispersion model.
>>> 
>>> Regarding FDR:
>>> For discrete-valued test statistics, the distribution of the p-values 
>>> under the null hypothesis is a discrete uniform which depends on the 
>>> marginal total.  As a result, the distribution of p-values from the null 
>>> hypotheses is a mixture of discrete uniforms, which can be marginally 
>>> very non-uniform.  Even after filtering out low expressing genes, it is 
>>> common to see a peak of p-values near 1.0 due to this effect.  It is less 
>>> evident that there are multiple other peaks, one at each of the discrete 
>>> values of the p-value for each marginal total.  The result of this is 
>>> that FDR computations are far too conservative for lowly expressing 
>>> genes, and far too liberal for highly expressing genes, which basically 
>>> magnifies the power differential 
>>> that already exists due to the relationship between the mean and variance.
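The discreteness described above can be made concrete with a small sketch. It assumes two libraries of equal size, so that a gene's count in library 1 is Binomial(K, 0.5) under the null given its total K, and it uses the usual two-sided exact p-value (total probability of outcomes no more likely than the one observed). The function name null_pvalue_dist is invented for this example.

```r
# Exact null distribution of the two-sided p-value for a gene with K reads in
# total, ASSUMING the count in library 1 is Binomial(K, 0.5) under the null.
null_pvalue_dist <- function(K) {
  x <- 0:K
  prob <- dbinom(x, K, 0.5)
  # p-value of outcome x: total probability of outcomes no more likely than x
  # (small relative tolerance guards against floating-point ties)
  pval <- vapply(prob, function(p) sum(prob[prob <= p * (1 + 1e-7)]), numeric(1))
  pval <- round(pval, 10)
  # collapse outcomes that share the same p-value
  values <- sort(unique(pval))
  mass <- vapply(values, function(v) sum(prob[pval == v]), numeric(1))
  data.frame(pvalue = values, mass = mass)
}

null_pvalue_dist(3)    # p-values 0.25 and 1, with probabilities 0.25 and 0.75
null_pvalue_dist(10)   # more attainable values, with less mass at 1
```

With K=3 only two p-values are attainable, and three quarters of the null probability sits at p=1, which is the peak near 1.0. Across genes with different totals, the observed null p-values are a mixture of such distributions.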
>>> 
>>> --Naomi
>>> 
>>> At 05:01 AM 6/26/2010, Gordon K Smyth wrote:
>>>> Dear Zhe,
>>>> To get FDR, you must use the topTags() function.  Is your de.com object a 
>>>> deDGEList object?  If it is, then
>>>>
>>>>   top <- topTags(de.com, n=Inf)
>>>>   write.table(top$table, file="yourfile.txt")
>>>>
>>>> will do what you want.  (I can't tell you what level of FDR to use as 
>>>> your cutoff though, that's up to you.)
>>>> Naomi, I don't know of any problem with FDR from edgeR.  It should work 
>>>> just fine.
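For readers who want to see what an FDR column amounts to, here is a base-R sketch that does not need edgeR. It assumes the Benjamini-Hochberg adjustment, which to my knowledge is the default used by topTags() (adjust.method="BH"). The p-values, gene names and output file are made up for illustration.

```r
# Hypothetical raw p-values for five genes (illustrative numbers only).
tab <- data.frame(
  gene   = paste0("gene", 1:5),
  PValue = c(0.0002, 0.01, 0.03, 0.04, 0.5),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjusted p-values, one per gene.
tab$FDR <- p.adjust(tab$PValue, method = "BH")

# Sort by raw p-value and save as a tab-delimited file (temporary file here).
tab <- tab[order(tab$PValue), ]
out <- tempfile(fileext = ".txt")
write.table(tab, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

tab$FDR   # 0.001 0.025 0.050 0.050 0.500
```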
>>>> Best wishes
>>>> Gordon
>>>> -----------------------------------------------
>>>> Associate Professor Gordon K Smyth,
>>>> NHMRC Senior Research Fellow,
>>>> Bioinformatics Division, Walter and Eliza Hall Institute of Medical 
>>>> Research, 1G Royal Parade, Parkville, Vic 3052, Australia.
>>>> smyth at wehi.edu.au
>>>> http://www.wehi.edu.au
>>>> http://www.statsci.org/smyth
>>>> 
>>>> ------------ original message ---------------
>>>> [BioC] edgeR question
>>>> Naomi Altman naomi at stat.psu.edu
>>>> Fri Jun 25 22:43:51 CEST 2010
>>>> Hi Zhe,
>>>> 1. First normalize and then do the DE
>>>> analysis.  (I found this confusing in the vignette, too.)
>>>> 2. I do not suggest using FDR at this time.  The
>>>> standard FDR computations need to be adjusted for
>>>> count data.  I do not think this has been worked out yet.
>>>> --Naomi
>>>> 
>>>> At 12:21 PM 6/25/2010,  wrote:
>>>> 
>>>>> Hello,
>>>>> I am learning edgeR and would like to use it
>>>>> to deal with my Tag-seq and RNA-seq data. I have several questions:
>>>>> 1. Does the DE analysis using common
>>>>> dispersion or moderated tagwise dispersions use
>>>>> the TMM method for normalization?  I am not
>>>>> sure about the relationship between Section 6
>>>>> (Normalization) and the following sections in
>>>>> the user manual. I suppose I should normalize
>>>>> the data first, and then perform DE analysis.
>>>>> 2. Do you suggest using P-value < 0.01? What
>>>>> about FDR < 0.05? After saving de.tagwise (>
>>>>> write.table(de.com[[1]], file =
>>>>> "/Users/Zhe/edgeR/page7", sep = "\t")), I found
>>>>> there is no FDR column. How do I
>>>>> calculate the FDR for each gene and save it in the output file?
>>>>> Thanks a lot.
>>>>> Best wishes,
>>>>> Zhe
>>>> ______________________________________________________________________
>>>> The information in this email is confidential and intend...{{dropped:4}}
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 
>>> Naomi S. Altman                                814-865-3791 (voice)
>>> Associate Professor
>>> Dept. of Statistics                              814-863-7114 (fax)
>>> Penn State University                         814-865-1348 (Statistics)
>>> University Park, PA 16802-2111
>>> 
>> 
>
>
>
