[BioC] edgeR: common vs tagwise dispersion

Ann Hess hess at stat.colostate.edu
Tue Apr 20 16:18:24 CEST 2010


I am using edgeR to look for differentially abundant “segments”
between two groups (data generated using high-throughput sequencing).
I have 3 (pooled) biological reps per group and a total of 18760
segments (83 rows with zero counts are removed by edgeR).

As a first approach, I used the common dispersion method and found the
estimated common dispersion to be 0.135.  After looking at the top 10
segments, I find that there tends to be a single large count (in a
different sample for each segment) that is driving the logFC.
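
One way to make the "single large value" pattern concrete is to compute, for each segment, the share of its total count that comes from its largest library. The base-R sketch below does this for the ten rows printed by df$counts[CDtopIDs,] in the session further down. The helper name max_share is hypothetical (it is not an edgeR function), and raw counts are used, so differences in library size are ignored.

```r
# Counts for the top 10 common-dispersion hits, copied from the
# df$counts[CDtopIDs,] output in the session further down.
cd_counts <- matrix(
  c(31,   36,   28,    45,    57, 22440,
    77,   45,   61, 22745,    47,    55,
    49,   68,   85,    76,   210, 21738,
    35, 3729,   34,    37,    22,    32,
    28,   69, 3636,    25,    25,    89,
     4,    2,    3,     7,    25,   668,
     2,    3,  188,     1,     0,     2,
    51,    8,   23,   301,   296,  1619,
    12,   10,  652,    14,    13,    11,
     4,    5,    3,   540,     6,    10),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("C1", "C2", "C3", "T1", "T2", "T3"))
)

# Hypothetical helper: fraction of each row's total count that is
# contributed by its single largest library (raw counts, no normalization).
max_share <- function(counts) apply(counts, 1, max) / rowSums(counts)

share    <- max_share(cd_counts)
dominant <- colnames(cd_counts)[apply(cd_counts, 1, which.max)]
data.frame(dominant, share = round(share, 3))
```

For these ten rows every segment has more than 70% of its reads in one library, and the dominant library varies between C2, C3, T1 and T3.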

I tried using moderated tagwise dispersion (using prior.n=50 and 25)
and found that the results are largely the same as with the common
dispersion approach (not shown).  When I look at the tagwise dispersion
values for the top 10 hits, I find that the estimated tagwise dispersion
values are greater than the estimated common dispersion (not shown).

To look into things further, I ran the same analysis but now with
prior.n=0 (no moderation/squeezing).  The top 10 hits are now
completely different and the estimated tagwise dispersion values for
the top 10 are very small.  (Looking at the top 10 seems to suggest
that I could use a Poisson distribution.)
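
The Poisson remark can be illustrated with the negative binomial variance function, var = mu + phi * mu^2, so that var/mu = 1 + phi * mu and phi = 0 gives Poisson. The sketch below is base R only. The mean of 340 is an illustrative value chosen to resemble the counts of the top tagwise hit, and the function name nb_var_ratio is hypothetical. The last lines check an arithmetic coincidence: the smallest and largest tagwise dispersions in the quantile output further down equal delta / (1 - delta) at delta = 0.001 and 0.999. Reading those as the limits of the estimator's search range is an assumption, not something the output confirms.

```r
# Negative binomial variance-to-mean ratio: var / mu = 1 + phi * mu.
# A ratio of 1 corresponds to a Poisson distribution.
nb_var_ratio <- function(mu, phi) 1 + phi * mu

mu <- 340  # illustrative mean, an assumption for this sketch

ratio_common  <- nb_var_ratio(mu, phi = 0.1346658)    # estimated common dispersion
ratio_tagwise <- nb_var_ratio(mu, phi = 0.001001001)  # smallest tagwise dispersion
c(common = ratio_common, tagwise = ratio_tagwise)

# The extreme tagwise values coincide with phi = delta / (1 - delta)
# at delta = 0.001 and delta = 0.999 (interpretation is an assumption).
delta <- c(0.001, 0.999)
phi_bounds <- delta / (1 - delta)
phi_bounds
```

At this mean, the common dispersion implies a variance roughly 47 times the Poisson variance, while the smallest tagwise value implies only about 1.3 times. If the smallest values really are a lower bound of the search, several of the top unmoderated hits would be sitting on that bound rather than at a freely estimated value.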

Questions:
1.	Should I be concerned that the results are so different depending
on whether common dispersion (almost equivalent to moderated tagwise
dispersion) or no-moderation tagwise dispersion is used?  Based on
FDR<0.05, there is only about 10% overlap between the two approaches.
2.	I’m not sure how to interpret the tagwise dispersion values for the
top hits: the common dispersion method picks up segments with large
tagwise dispersion, while the no-moderation method picks up segments
with small tagwise dispersion.
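
An overlap figure such as "about 10%" depends on the denominator, which the post does not specify. The base-R sketch below shows two common choices, using the row names of the two top-10 tables from the session further down as stand-in ID sets. The helper overlap_summary is hypothetical; the full FDR<0.05 lists are not shown in the post and are not reproduced here.

```r
# Hypothetical helper: overlap between two sets of segment IDs, reported
# as a count, as a fraction of the smaller set, and as a Jaccard index.
overlap_summary <- function(a, b) {
  shared <- length(intersect(a, b))
  c(shared          = shared,
    frac_of_smaller = shared / min(length(a), length(b)),
    jaccard         = shared / length(union(a, b)))
}

# Row names of the two top-10 tables shown in the session further down.
cd_ids  <- c(6145, 15580, 1565, 15718, 1154, 15630, 341, 16351, 6468, 4891)
tgw_ids <- c(2659, 11865, 13066, 12381, 17206, 1172, 8678, 251, 8466, 8472)

overlap_summary(cd_ids, tgw_ids)
```

The two top-10 lists share no segments, which matches the statement that the top hits are completely different.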

I am using edgeR_1.4.7 with R version 2.10.1.

> #COMMON DISPERSION APPROACH
> library(edgeR)
> df <- DGEList(counts=Reads, group=c(0,0,0,1,1,1), genes=Annotation$Description)
> df$samples
      group lib.size
C1        0  4488940
C2        0  2437107
C3        0  2600316
T1        1  3935852
T2        1  3806079
T3        1  3913694
> df <- estimateCommonDisp(df)
> df$common.dispersion
[1] 0.1346658
> df.com<-exactTest(df)
Comparison of groups:  1 – 0
> CDtop10<-topTags(df.com)$table
> CDtop10[,-1]
     logConc     logFC       PValue          FDR
6145  -12.77011  7.490945 2.002637e-37 3.740325e-33
15580 -12.32428  6.621865 2.854360e-32 2.665544e-28
1565  -12.21365  6.311500 2.936737e-30 1.828315e-26
15718 -13.94448 -6.050904 6.517136e-28 3.043014e-24
1154  -13.69624 -5.326718 1.143794e-23 4.272527e-20
15630 -17.02012  5.975869 1.743145e-21 5.426120e-18
341   -18.60859 -6.565039 4.125655e-19 1.100784e-15
16351 -14.64956  4.565285 1.375746e-18 3.211850e-15
6468  -15.86990 -4.712918 3.248713e-18 6.741802e-15
4891  -16.96347  5.181179 7.436516e-18 1.388918e-14
> CDtopIDs<-as.numeric(row.names(CDtop10))
> df$counts[CDtopIDs,]
       C1   C2   C3    T1    T2    T3
 [1,]  31   36   28    45    57 22440
 [2,]  77   45   61 22745    47    55
 [3,]  49   68   85    76   210 21738
 [4,]  35 3729   34    37    22    32
 [5,]  28   69 3636    25    25    89
 [6,]   4    2    3     7    25   668
 [7,]   2    3  188     1     0     2
 [8,]  51    8   23   301   296  1619
 [9,]  12   10  652    14    13    11
[10,]   4    5    3   540     6    10


> #TAGWISE DISPERSION APPROACH
> fprior <- estimateSmoothing(df)
> fprior
[1] 6329.643

#I also tried prior.n=25 and prior.n=50, but results not shown.
> df<-estimateTagwiseDisp(df, prior.n = 0)
> quantile(df$tagwise.dispersion)
          0%          25%          50%          75%         100%
1.001001e-03 2.151338e-02 7.667087e-02 1.715599e-01 9.990000e+02
> df.tgw<-exactTest(df,common.disp=FALSE)
> TGWtop10<-topTags(df.tgw)$table
> TGWtop10[,-1]
        logConc      logFC       PValue          FDR
2659  -13.14601 -0.7454279 2.722274e-25 5.084391e-21
11865 -14.21354 -1.4925866 5.769090e-22 5.387465e-18
13066 -16.14612 -1.3689788 2.176835e-15 1.355225e-11
12381 -15.12798 -0.9772265 5.676831e-15 2.650654e-11
17206 -17.37537 -2.0234616 1.808245e-14 5.722016e-11
1172  -13.15737 -0.5535657 1.838202e-14 5.722016e-11
8678  -15.78098 -1.4312623 5.231726e-13 1.395899e-09
251   -15.03996 -0.8806583 1.137238e-12 2.655024e-09
8466  -14.50024 -0.7195308 1.861731e-12 3.863507e-09
8472  -15.35444 -0.9235374 5.519857e-12 1.030944e-08
> TGWtopIDs<-as.numeric(row.names(TGWtop10))
> df$counts[TGWtopIDs,]
     C1   C2   C3    T1    T2    T3
 [1,] 685 346 337   340   340   313
 [2,] 382 199 256   121   103   142
 [3,]  97  59  55    29    36    35
 [4,] 171  91 111    87    73    72
 [5,]  54  24  35    12     8    14
 [6,] 629 295 345   354   352   347
 [7,] 138  58  84    41    47    38
 [8,] 200  89  96    93    75    87
 [9,] 231 138 157   149   115   128
[10,] 145  87  81    74    75    53

> df$tagwise.dispersion[TGWtopIDs]
 [1] 0.001001001 0.011153172 0.001001001 0.001001001 0.001001001
 [6] 0.001001001 0.011153172 0.001001001 0.001001001 0.001001001
> df$tagwise.dispersion[CDtopIDs]
 [1] 3.1369561 3.0528706 2.6123364 2.4261316 2.4261316 1.8815105
 [7] 3.2246046 0.6054731 2.0582920 2.1059294


