[Bioc-sig-seq] edgeR tagwise estimates not converging to common estimate with large prior.n value

Gordon K Smyth smyth at wehi.EDU.AU
Sat Sep 17 02:47:21 CEST 2011


Dear Sean,

The dispersion estimation functions in edgeR have a lower limit for the 
dispersions that they will estimate.  For estimateCommonDisp(), the lower 
limit is just above 0.0001.  For estimateTagwiseDisp() the lower limit is 
just above 0.001.  For your data, the ideal dispersion estimate appears to 
be zero, so the functions are simply returning to you the pre-set lower 
limits.

I agree that it was a bit sloppy of us (the edgeR authors) to leave the 
lower limits inconsistent between the two functions.  The reason 
estimateTagwiseDisp() has the higher limit is that it does a grid search, 
and we wanted to limit the number of grid points for computational 
efficiency.
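
To illustrate why a bounded grid search simply reports its lower limit for 
data like this, here is a minimal sketch in base R.  It is not edgeR's 
actual implementation: grid_dispersion(), the two grids and the toy counts 
are all hypothetical, and a plain negative binomial likelihood stands in for 
the likelihood that edgeR really uses.

```r
# Hypothetical helper, not part of edgeR: pick the dispersion on a fixed
# grid that maximises a plain negative binomial log-likelihood.
grid_dispersion <- function(y, grid) {
  mu <- mean(y)
  # With size = 1/phi, dnbinom() has variance mu + phi * mu^2.
  loglik <- sapply(grid, function(phi) {
    sum(dnbinom(y, size = 1 / phi, mu = mu, log = TRUE))
  })
  grid[which.max(loglik)]
}

# Toy counts whose variance is well below their mean, so the
# unconstrained optimum for the dispersion is zero.
y <- c(198, 201, 203, 199)

# Assumed grids: one stopping at 0.001, one extending down to 0.0001.
grid_coarse <- 10^seq(-3, 0, length.out = 31)
grid_fine   <- 10^seq(-4, 0, length.out = 41)

phi_coarse <- grid_dispersion(y, grid_coarse)
phi_fine   <- grid_dispersion(y, grid_fine)

# Each search returns the smallest value its grid allows.
print(c(coarse = phi_coarse, fine = phi_fine))
```

In this sketch the estimate is set entirely by where the grid stops, which 
is the same kind of behaviour you are seeing: two different floors give two 
different "estimates" for data whose ideal dispersion is zero.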

The new glm functions in edgeR (estimateGLMCommonDisp() etc.) have somewhat 
less restrictive lower limits than the classic functions that you are 
using.

The bottom line is that with technical data such as the yeast data, we do 
not view the differences between dispersion estimates of 1e-3 or 1e-4 as 
scientifically meaningful.  We would simply observe that the dispersion 
appears to be at the lower boundary, showing that the data has essentially 
no biological variability.  We would set the dispersions to be zero.
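
To put the two boundary values in perspective, here is a small base R 
calculation.  It assumes the usual negative binomial parameterisation, 
variance = mu + dispersion * mu^2, with sqrt(dispersion) read as the 
coefficient of variation over and above Poisson noise.  nb_variance() and 
the example mean of 100 are illustrative choices, not anything taken from 
the yeast data.

```r
# Hypothetical helper: negative binomial variance for mean mu and
# dispersion phi (phi = 0 gives the Poisson variance).
nb_variance <- function(mu, phi) mu + phi * mu^2

# Zero dispersion and the two lower limits discussed above.
phi <- c(poisson = 0, common.limit = 1e-4, tagwise.limit = 1e-3)

mu <- 100                       # assumed example mean count
variance <- nb_variance(mu, phi)
cv <- sqrt(phi)                 # extra-Poisson coefficient of variation

print(rbind(variance = variance, cv = cv))
```

At this mean the variances are 100, 101 and 110, and both limits correspond 
to an extra coefficient of variation of roughly 3% or less.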

Best wishes
Gordon

> Date: Thu, 15 Sep 2011 18:03:28 -0700
> From: Sean Ruddy <sruddy17 at gmail.com>
> To: bioc-sig-sequencing at r-project.org
> Subject: [Bioc-sig-seq] edgeR tagwise estimates not converging to
> 	common estimate with large prior.n value
>
> Hi,
>
> Thanks in advance for any help. I have the latest R software (2.13.1) and
> edgeR software (2.8.4). I'm running into a problem where I estimate a common
> dispersion parameter of 0.0001 and when I subsequently estimate tagwise
> dispersions using the default prior.n = 10, the summary statistics are
>
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.001   0.001   0.001   0.001   0.001   0.022
>
> i.e., nearly all estimates are 10 times larger than the common dispersion
> estimate. Since the method is supposed to shrink toward the common value this
> seems a little surprising. When I increase prior.n to a large number I expect
> the tagwise estimates to all converge to the common dispersion, but as you
> might guess from the table above they converge to 0.001 = 10*common.
>
> The data comes from the Bioconductor package "yeastRNASeq" and it appears
> from the description of the data that the two samples in each group are
> actually from sequencing the same extraction of mRNA, i.e. neither biological
> nor really even technical replicates. So the common dispersion should be
> zero, as the counts should follow the Poisson distribution.
>
> I cannot explain the behavior of the estimates but I'm afraid it might be
> something in the code so I'll include that below.
>
> library(yeastRNASeq)
> data( geneLevelData )
> d <- DGEList( geneLevelData , group = c( rep( "Mutant" , 2 ) , rep( "Wild" , 2 ) ) )
> d <- calcNormFactors( d )
> d <- d[rowSums(d$counts) >= 5, ]
> d <- estimateCommonDisp( d )
>
> d$common.dispersion
> [1] 0.000101
>
> d <- estimateTagwiseDisp( d , prior.n = 10 )
>
> summary( d$tagwise.dispersion )
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.001   0.001   0.001   0.001   0.001   0.022
>
> d <- estimateTagwiseDisp( d , prior.n = 1000 )
>
> summary( d$tagwise.dispersion )
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.001   0.001   0.001   0.001   0.001   0.001
>
>
> It could just be an oddity of the data set itself but I don't have enough
> experience using edgeR across different RNA-Seq experiments to know how
> these methods should behave.
>
>
> Thanks,
> Sean
