[BioC] edgeR and FDR
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Jun 26 16:39:01 CEST 2010
Hi Naomi,
I agree that the discreteness of the counts introduces conservatism, and
that there is a power differential between low and high expressed genes.
However, the expected overall FDR is still controlled at a rate less than
or equal to the nominal rate, and that is all we promise.
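
As a rough illustration of the conservatism, here is a minimal sketch in
base R. It assumes two equally sized libraries and uses a plain two-sided
exact binomial test as a stand-in, not edgeR's negative binomial exact test;
the function name null_size is made up for this example. It computes the
actual probability that p <= 0.05 under the null for a gene with a given
total count.

```r
# Exact null distribution of a two-sided binomial p-value for one gene,
# conditional on its total count across two equally sized libraries.
# (Illustrative stand-in; edgeR itself uses a negative binomial model.)
null_size <- function(total, alpha = 0.05) {
  x <- 0:total                       # possible counts in library 1
  prob <- dbinom(x, total, 0.5)      # null probability of each outcome
  # Two-sided p-value: total probability of outcomes no more likely than x.
  # The small relative tolerance guards against floating-point ties.
  pval <- sapply(x, function(k) sum(prob[prob <= prob[k + 1] * (1 + 1e-7)]))
  sum(prob[pval <= alpha])           # actual P(p <= alpha) under the null
}

totals <- c(5, 10, 20, 50, 200, 1000)
sizes <- sapply(totals, null_size)
print(data.frame(total = totals, size = round(sizes, 4)))
```

The achieved size never exceeds the nominal 0.05, and it is well below it
for small totals. With a total of 5 the smallest attainable two-sided
p-value is 2/32 = 0.0625, so such a gene can never reach p <= 0.05 at all.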
To reduce the trend in DE vs expression level, I like to combine FDR with
a fold-change cutoff or, perhaps better, use a TREAT-like test.
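
A minimal sketch of the simple version, combining an FDR threshold with a
fold-change cutoff, in base R. The results table, its column names, and the
two thresholds are made up for illustration and may not match the output of
your edgeR version.

```r
# Hypothetical results table; column names and values are illustrative only.
res <- data.frame(
  gene    = paste0("g", 1:6),
  logFC   = c(3.1, -0.4, 1.8, 0.2, -2.5, 0.9),
  p.value = c(0.0001, 0.0004, 0.002, 0.03, 0.0008, 0.6)
)

# Benjamini-Hochberg adjusted p-values
res$FDR <- p.adjust(res$p.value, method = "BH")

# Keep genes that pass both the FDR threshold and a minimum absolute
# log2 fold change (both thresholds are arbitrary choices here).
fdr_cut <- 0.05
lfc_cut <- 1
selected <- res[res$FDR < fdr_cut & abs(res$logFC) >= lfc_cut, ]
print(selected)
```

Note that this is only the post hoc cutoff. A TREAT-like test is different:
it tests directly whether the fold change exceeds the threshold, rather than
filtering after a test against zero.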
Regards
Gordon
On Sat, 26 Jun 2010, Naomi Altman wrote:
> Dear Gordon,
> Thank you for your very detailed and clear answer to my question about the
> dispersion model.
>
> Regarding FDR:
> For discrete-valued test statistics, the distribution of the p-values under
> the null hypothesis is a discrete uniform which depends on the marginal
> total. As a result,
> the distribution of p-values from the null hypotheses is a mixture of
> discrete uniforms, which can be marginally very non-uniform. Even after
> filtering out low expressing genes, it is common to see a peak of p-values
> near 1.0 due to this effect. It is less evident that there are multiple
> other peaks, one at each of the discrete values of the p-value for each
> marginal total. The result of this is that FDR computations are far too
> conservative for lowly expressing genes, and far too liberal for highly
> expressing genes which basically magnifies the power differential that
> already exists due to the relationship between the mean and variance.
>
> --Naomi
>
> At 05:01 AM 6/26/2010, Gordon K Smyth wrote:
>> Dear Zhe,
>>
>> To get FDR, you must use the topTags() function. Is your de.com object a
>> deDGEList object? If it is, then
>>
>> top <- topTags(de.com, n=Inf)
>> write.table(top$table, file="yourfile.txt")
>>
>> will do what you want. (I can't tell you what level of FDR to use as your
>> cutoff though, that's up to you.)
>>
>> Naomi, I don't know of any problem with FDR from edgeR. It should work
>> just fine.
>>
>> Best wishes
>> Gordon
>>
>> -----------------------------------------------
>> Associate Professor Gordon K Smyth,
>> NHMRC Senior Research Fellow,
>> Bioinformatics Division, Walter and Eliza Hall Institute of Medical
>> Research, 1G Royal Parade, Parkville, Vic 3052, Australia.
>> smyth at wehi.edu.au
>> http://www.wehi.edu.au
>> http://www.statsci.org/smyth
>>
>>
>>
>> ------------ original message ---------------
>> [BioC] edgeR question
>> Naomi Altman naomi at stat.psu.edu
>> Fri Jun 25 22:43:51 CEST 2010
>>
>> Hi Zhe,
>> 1. First normalize and then do the DE
>> analysis. (I found this confusing in the vignette, too.)
>>
>> 2. I do not suggest using FDR at this time. The
>> standard FDR computations need to be adjusted for
>> count data. I do not think this has been worked out yet.
>>
>> --Naomi
>>
>>
>> At 12:21 PM 6/25/2010, wrote:
>>
>>> Hello,
>>>
>>> I am learning edgeR and would like to use it
>>> to analyze my Tag-seq and RNA-seq data. I have several questions:
>>>
>>> 1. Does the DE analysis using common
>>> dispersion or moderated tagwise dispersions use
>>> the TMM method for normalization? I am not
>>> sure of the relationship between Section 6
>>> (Normalization) and the following sections in
>>> the user manual. I suppose I should normalize
>>> the data first, and then perform DE analysis.
>>>
>>> 2. Do you suggest using P-value < 0.01? What
>>> about FDR < 0.05? After saving de.tagwise (>
>>> write.table(de.com[[1]], file =
>>> "/Users/Zhe/edgeR/page7", sep = "\t")), I found
>>> there is no FDR column. How do I
>>> calculate the FDR for each gene and save it in the output file?
>>>
>>> Thanks a lot.
>>> Best wishes,
>>>
>>> Zhe
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intend...{{dropped:4}}
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
>
>