[BioC] edgeR: common vs tagwise dispersion
Ann Hess
hess at stat.colostate.edu
Tue Apr 20 16:18:24 CEST 2010
I am using edgeR to look for differentially abundant “segments”
between two groups (data generated using high throughput sequencing).
I have 3 (pooled) biological reps per group and a total of 18760
segments (83 rows with zero count are removed by edger).
As a first approach, I used the common dispersion method and found the
estimated common dispersion to be 0.135. After looking at the top 10
segments, I find that there tends to be a single large value
(different for each segment) that is bringing up the logFC.
I tried using moderated tagwise dispersion (using prior.n=50 and 25)
and found that the results are largely the same as common dispersion
approach (not shown). When I look at the tagwise dispersion values
for the top 10 hits, I find that the estimated tagwise dispersion
values are greater that the estimated common dispersion (not shown).
To look into things further, I ran the same analysis but now with
prior.n=0 (no moderation/squeezing). The top 10 hits are now
completely different and the estimated tagwise dispersion values for
the top 10 are very small. (Looking at the top 10 seems to suggest
that I could use a Poisson distribution.)
Questions:
1. Should I be concerned that the results are so different depending
on whether common dispersion (almost equivalent to moderated tagwise
dispersion) or no-moderation tagwise dispersion is used? Based on
FDR<0.05, there is only about 10% overlap between the two approaches.
2. I’m not sure how to interpret the tagwise dispersion values for the
top hits: common dispersion method picks up segments with large
tagwise dispersion, no moderation method picks up segments with small
tagwise dispersion.
I am using edgeR_1.4.7 with R version 2.10.1.
> #COMMON DISPERSION APPROACH
> library(edgeR)
> df <- DGEList(counts=Reads, group=c(0,0,0,1,1,1), genes=Annotation$Description)
> df$samples
group lib.size
C1 0 4488940
C2 0 2437107
C3 0 2600316
T1 1 3935852
T2 1 3806079
T3 1 3913694
> df <- estimateCommonDisp(df)
> df$common.dispersion
[1] 0.1346658
> df.com<-exactTest(df)
Comparison of groups: 1 – 0
> CDtop10<-topTags(df.com)$table
> CDtop10[,-1]
logConc logFC PValue FDR
6145 -12.77011 7.490945 2.002637e-37 3.740325e-33
15580 -12.32428 6.621865 2.854360e-32 2.665544e-28
1565 -12.21365 6.311500 2.936737e-30 1.828315e-26
15718 -13.94448 -6.050904 6.517136e-28 3.043014e-24
1154 -13.69624 -5.326718 1.143794e-23 4.272527e-20
15630 -17.02012 5.975869 1.743145e-21 5.426120e-18
341 -18.60859 -6.565039 4.125655e-19 1.100784e-15
16351 -14.64956 4.565285 1.375746e-18 3.211850e-15
6468 -15.86990 -4.712918 3.248713e-18 6.741802e-15
4891 -16.96347 5.181179 7.436516e-18 1.388918e-14
> CDtopIDs<-as.numeric(row.names(CDtop10))
> df$counts[CDtopIDs,]
C1 C2 C3 T1 T2 T3
[1,] 31 36 28 45 57 22440
[2,] 77 45 61 22745 47 55
[3,] 49 68 85 76 210 21738
[4,] 35 3729 34 37 22 32
[5,] 28 69 3636 25 25 89
[6,] 4 2 3 7 25 668
[7,] 2 3 188 1 0 2
[8,] 51 8 23 301 296 1619
[9,] 12 10 652 14 13 11
[10,] 4 5 3 540 6 10
> #TAGWISE DISPERSION APPROACH
> fprior <- estimateSmoothing(df)
> fprior
[1] 6329.643
#I also tried prior.n=25 and prior.n=50, but results not shown.
> df<-estimateTagwiseDisp(df, prior.n = 0)
> quantile(df$tagwise.dispersion)
0% 25% 50% 75% 100%
1.001001e-03 2.151338e-02 7.667087e-02 1.715599e-01 9.990000e+02
> df.tgw<-exactTest(df,common.disp=FALSE)
> TGWtop10<-topTags(df.tgw)$table
> TGWtop10[,-1]
logConc logFC PValue FDR
2659 -13.14601 -0.7454279 2.722274e-25 5.084391e-21
11865 -14.21354 -1.4925866 5.769090e-22 5.387465e-18
13066 -16.14612 -1.3689788 2.176835e-15 1.355225e-11
12381 -15.12798 -0.9772265 5.676831e-15 2.650654e-11
17206 -17.37537 -2.0234616 1.808245e-14 5.722016e-11
1172 -13.15737 -0.5535657 1.838202e-14 5.722016e-11
8678 -15.78098 -1.4312623 5.231726e-13 1.395899e-09
251 -15.03996 -0.8806583 1.137238e-12 2.655024e-09
8466 -14.50024 -0.7195308 1.861731e-12 3.863507e-09
8472 -15.35444 -0.9235374 5.519857e-12 1.030944e-08
> TGWtopIDs<-as.numeric(row.names(TGWtop10))
> df$counts[TGWtopIDs,]
C1 C2 C3 T1 T2 T3
[1,] 685 346 337 340 340 313
[2,] 382 199 256 121 103 142
[3,] 97 59 55 29 36 35
[4,] 171 91 111 87 73 72
[5,] 54 24 35 12 8 14
[6,] 629 295 345 354 352 347
[7,] 138 58 84 41 47 38
[8,] 200 89 96 93 75 87
[9,] 231 138 157 149 115 128
[10,] 145 87 81 74 75 53
> df$tagwise.dispersion[TGWtopIDs]
[1] 0.001001001 0.011153172 0.001001001 0.001001001 0.001001001
[6] 0.001001001 0.011153172 0.001001001 0.001001001 0.001001001
> df$tagwise.dispersion[CDtopIDs]
[1] 3.1369561 3.0528706 2.6123364 2.4261316 2.4261316 1.8815105
[7] 3.2246046 0.6054731 2.0582920 2.1059294
More information about the Bioconductor
mailing list