[BioC] DESeq variance question

Mon Dec 5 00:28:25 CET 2011

Dear Simon and Steffen,

> Date: Sat, 03 Dec 2011 20:36:10 +0100
> From: Simon Anders <anders at embl.de>
> To: Steffen Priebe <Steffen.Priebe at hki-jena.de>,
> 	bioconductor at r-project.org
> Subject: Re: [BioC] DESeq variance question
>
> Dear Steffen
>
> On 2011-12-02 13:53, Steffen Priebe wrote:

>> I was using DESeq (and edgeR) for differential expression analysis. 
>> In my current dataset I compare 3 biological replicates of control vs. 
>> 3 biol. replicates from a mutant. The resulting 4 top genes according 
>> to adjusted p-value by DESeq and edgeR have a very high variance. (The 
>> reason for this is that these are genes located on chrY and only 
>> one replicate of the mutant was male.)
>>
>> My question is now, how can genes with such a high variance of the 
>> counts result in such small p-values? Is there any way to avoid this, 
>> because I think these are false positives?
>>
>> Attached you can find the combined result table of DESeq and edgeR for 
>> the top 100 genes. The problem occurs for the first 4 genes. The raw 
>> counts are stated in columns P-U (P-R: Mutant, T-U Control).

...

> EdgeR, with its empirical Bayesian approach (implemented in its function 
> 'estimateTagwiseDispersion') should typically give p values in the 
> middle between DESeq's result using the 'maximum' and its 
> 'fitted-only' sharing modes. However, at least in your case, edgeR 
> seemed to have stayed too close to the fitted values (or: to the 'common 
> dispersion', in edgeR's terminology)

Common dispersion is not edgeR terminology for DESeq's "fitted values", 
and (in the current Bioconductor release) edgeR moderates towards a local 
prior rather than towards the common dispersion.  By default, edgeR does 
not fit any model to the dispersion, hence does not have fitted values. 
Instead it uses a prior based on locally weighted likelihood.

> as you wrote it also gave you p values for your high-variance genes that 
> you considered implausibly low.
>
>   Simon

We don't actually know whether tagwise dispersion was used in the edgeR 
analysis, nor have we seen the gene list (at least I haven't).  Without 
knowing either the analysis or the results, it would seem premature to 
draw conclusions about the behaviour of estimateTagwiseDisp.

Best wishes
Gordon
