[BioC] edgeR - R script - results compared to DESeq
Smith, Hilary A
hilary.smith at gatech.edu
Wed Nov 30 14:30:36 CET 2011
In case it helps the discussion ... I also tried running GLMs in both DESeq and edgeR. I likewise found that edgeR yielded more differentially expressed tags or genes. I know Dr. Gordon Smyth mentioned calcNormFactors and tagwise dispersion; I did use both of these options.
If it helps, an abstract description of the model comparison used in both programs is below (and below that if helpful, full code for edgeR -- I am using the newest release). I assume the differences are coming from the ways DESeq and edgeR estimate dispersion, but I'm eager to learn more about the rationale (especially given I just started using R/Bioconductor a few weeks ago) and note the results below in case they are of use in identifying the differences. I am glad to hear this is not just a factor of my dataset and is a common feature to have edgeR find more genes.
My models for main effects A and B (with 3 biological reps. each) and their interaction are:
Full: (A + B + A:B)
Reduced1: (A + B)
Reduced 2: (A)
The comparison of the Full vs. Reduced1 yields the genes impacted by the interaction term A:B. To obtain genes impacted by the main effect A, I perform a comparison of Full vs. Reduced1 and a comparison of Full vs. Reduced 2 -- those genes found at P.adj<0.05 in the Full vs. Reduced 2 comparison but not found in Full vs. Reduced 1 are what I am noting as genes impacted by main effect A. (To the best of my knowledge I cannot simply drop term A and leave in the interaction term, so this is my attempt to isolate term A. If there's a better way to do this, I'd be glad to know.).
My results for the interaction term are:
edgeR: 173 genes
DESeq: 38 genes
For the main effect A:
edgeR: 261 genes
DESeq: 61 genes
**NOTE: For this comparison of term A, of the 61 genes found by DESeq, about 44 (or ~72%) were also found by edgeR.
I did have warnings in running DESeq that the Full model GLM didn't converge which is disconcerting... edgeR didn't give these warnings but still found more components.
Best,
Hilary
~~In the code below, main effect "A" above is "Season," and main effect "B" above is "Hydroperiod."
> library(edgeR)
> library(limma)
> raw.data = read.csv("2011.11.14counts.csv", header=TRUE, stringsAsFactors=FALSE)
> d = raw.data[,2:13]
> rownames(d) = raw.data[,1]
> head(d)
X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
comp0 1159 1572 1817 605 1113 1732 1065 1207 1477 1841 1915
comp1 534 675 739 236 451 799 544 341 333 690 502
comp10 37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
comp100 1065 1332 1620 658 861 1370 1060 999 918 1697 1117
comp1000 157 266 247 135 188 244 130 229 141 263 182
comp10000 14 37 47 17 21 64 35 15 10 28 22
X3P_F
comp0 1645
comp1 571
comp10 44575
comp100 1336
comp1000 168
comp10000 12
> Hydroperiod = factor(c("E", "E", "E", "P", "P", "P", "E", "E", "E", "P", "P", "P"))
> Season = factor(c("R", "R", "R", "R", "R", "R", "F", "F", "F", "F", "F", "F"))
> design = model.matrix(~Hydroperiod + Season + Hydroperiod:Season)
> design
(Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1 1 0 1 0
2 1 0 1 0
3 1 0 1 0
4 1 1 1 1
5 1 1 1 1
6 1 1 1 1
7 1 0 0 0
8 1 0 0 0
9 1 0 0 0
10 1 1 0 0
11 1 1 0 0
12 1 1 0 0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"
attr(,"contrasts")$Season
[1] "contr.treatment"
> d.GLM = DGEList(d, group = c("ER", "ER", "ER", "PR", "PR", "PR", "EF", "EF", "EF", "PF", "PF", "PF"))
Calculating library sizes from column totals.
> d.GLM = calcNormFactors(d.GLM)
> d.GLM
An object of class "DGEList"
$samples
group lib.size norm.factors
X1E_R ER 23295633 0.9559226
X2E_R ER 25882545 1.1040337
X3E_R ER 29401480 1.0236513
X1P_R PR 20877015 0.8199915
X2P_R PR 26649613 0.8869479
7 more rows ...
$counts
X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
comp0 1159 1572 1817 605 1113 1732 1065 1207 1477 1841 1915
comp1 534 675 739 236 451 799 544 341 333 690 502
comp10 37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
comp100 1065 1332 1620 658 861 1370 1060 999 918 1697 1117
comp1000 157 266 247 135 188 244 130 229 141 263 182
X3P_F
comp0 1645
comp1 571
comp10 44575
comp100 1336
comp1000 168
25055 more rows ...
$all.zeros
comp0 comp1 comp10 comp100 comp1000
FALSE FALSE FALSE FALSE FALSE
25055 more elements ...
> nrow(d.GLM)
[1] 25060
> dim(d.GLM)
[1] 25060 12
> design
(Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1 1 0 1 0
2 1 0 1 0
3 1 0 1 0
4 1 1 1 1
5 1 1 1 1
6 1 1 1 1
7 1 0 0 0
8 1 0 0 0
9 1 0 0 0
10 1 1 0 0
11 1 1 0 0
12 1 1 0 0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"
attr(,"contrasts")$Season
[1] "contr.treatment"
> d.GLM = estimateGLMCommonDisp(d.GLM, design)
> names(d.GLM)
[1] "samples" "counts" "all.zeros"
[4] "common.dispersion"
> d.GLM$common.dispersion
[1] 0.1488192
> sqrt(d.GLM$common.dispersion)
[1] 0.3857709
> # 0.3857709 is the Coefficient of Biological Variation
> d.GLM = estimateGLMTrendedDisp(d.GLM, design)
Loading required package: splines
> summary(d.GLM$trended.dispersion)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.07541 0.10240 0.19030 0.25500 0.37530 1.26900
> d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
> d.GLM$prior.n
NULL
> d$prior.n
NULL
> ls()
[1] "Hydroperiod" "Season" "d" "d.GLM" "design"
[6] "raw.data"
> d.GLM
An object of class "DGEList"
$samples
group lib.size norm.factors
X1E_R ER 23295633 0.9559226
X2E_R ER 25882545 1.1040337
X3E_R ER 29401480 1.0236513
X1P_R PR 20877015 0.8199915
X2P_R PR 26649613 0.8869479
7 more rows ...
$counts
X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
comp0 1159 1572 1817 605 1113 1732 1065 1207 1477 1841 1915
comp1 534 675 739 236 451 799 544 341 333 690 502
comp10 37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
comp100 1065 1332 1620 658 861 1370 1060 999 918 1697 1117
comp1000 157 266 247 135 188 244 130 229 141 263 182
X3P_F
comp0 1645
comp1 571
comp10 44575
comp100 1336
comp1000 168
25055 more rows ...
$all.zeros
comp0 comp1 comp10 comp100 comp1000
FALSE FALSE FALSE FALSE FALSE
25055 more elements ...
$common.dispersion
[1] 0.1488192
$trended.dispersion
[1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516
25055 more elements ...
$abundance
comp0 comp1 comp10 comp100 comp1000
-9.737514 -10.720757 -6.331670 -9.937984 -11.724981
25055 more elements ...
$bin.dispersion
[1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
22 more elements ...
$bin.abundance
[1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
22 more elements ...
$tagwise.dispersion
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404
25055 more elements ...
> ?getPriorN
> getPriorN(d.GLM, design=design)
[1] 2.5
> head(d.GLM$tagwise.dispersion)
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
> summary(d.GLM$tagwise.dispersion)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.05139 0.09363 0.18700 0.25910 0.35480 1.71800
> glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion)
> lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
> topTags(lrt.tgw)
Coefficient: HydroperiodP:SeasonR
logConc logFC LR P.Value FDR
comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
comp370 -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08
comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
comp13403 -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
comp7180 -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
comp2502 -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
comp2740 -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
comp4314 -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
> summary(decideTestsDGE(lrt.tgw))
[,1]
-1 57
0 24887
1 116
> design
(Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1 1 0 1 0
2 1 0 1 0
3 1 0 1 0
4 1 1 1 1
5 1 1 1 1
6 1 1 1 1
7 1 0 0 0
8 1 0 0 0
9 1 0 0 0
10 1 1 0 0
11 1 1 0 0
12 1 1 0 0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"
attr(,"contrasts")$Season
[1] "contr.treatment"
> lrt.coef4 = glmLRT(d.GLM, glmfit.tgw, coef=4)
> topTags(lrt.coef4)
Coefficient: HydroperiodP:SeasonR
logConc logFC LR P.Value FDR
comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
comp370 -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08
comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
comp13403 -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
comp7180 -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
comp2502 -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
comp2740 -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
comp4314 -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
> lrt.coef34 = glmLRT(d.GLM, glmfit.tgw, coef=3:4)
> topTags(lrt.coef34)
Coefficient: SeasonR HydroperiodP:SeasonR
logConc SeasonR HydroperiodP.SeasonR LR P.Value
comp11779 -13.60929 1.442695e+08 -1.442695e+08 88.20296 7.030231e-20
comp21414 -13.64545 5.616807e+00 1.442695e+08 71.78587 2.581642e-16
comp6411 -10.57883 1.671168e+00 2.006498e+00 67.75411 1.938124e-15
comp6417 -10.10518 1.510699e+00 1.224445e+00 65.09872 7.311274e-15
comp13665 -10.93273 -2.193560e+00 1.068620e+01 62.70305 2.422182e-14
comp1872 -12.53141 3.893662e+00 -2.007081e+00 62.49057 2.693670e-14
comp5005 -11.13142 4.565629e-01 3.063432e+00 61.87012 3.673456e-14
comp15150 -13.28156 5.032869e+00 1.442695e+08 58.96222 1.572234e-13
comp2502 -10.47976 -4.007870e-01 5.723705e+00 56.89575 4.418199e-13
comp19402 -11.60722 5.375493e+00 -5.180038e+00 52.82636 3.379876e-12
FDR
comp11779 1.761776e-15
comp21414 3.234797e-12
comp6411 1.618980e-11
comp6417 4.580513e-11
comp13665 1.125056e-10
comp1872 1.125056e-10
comp5005 1.315097e-10
comp15150 4.925022e-10
comp2502 1.230223e-09
comp19402 8.469969e-09
> ls()
[1] "Hydroperiod" "Season" "d" "d.GLM" "design"
[6] "glmfit.tgw" "lrt.coef34" "lrt.coef4" "lrt.tgw" "oo"
[11] "raw.data"
> head(lrt.coef34)
An object of class "DGELRT"
$samples
group lib.size norm.factors
X1E_R ER 23295633 0.9559226
X2E_R ER 25882545 1.1040337
X3E_R ER 29401480 1.0236513
X1P_R PR 20877015 0.8199915
X2P_R PR 26649613 0.8869479
7 more rows ...
$all.zeros
comp0 comp1 comp10 comp100 comp1000 comp10000
FALSE FALSE FALSE FALSE FALSE FALSE
$common.dispersion
[1] 0.1488192
$trended.dispersion
[1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516 0.20365547
$abundance
comp0 comp1 comp10 comp100 comp1000 comp10000
-9.737514 -10.720757 -6.331670 -9.937984 -11.724981 -13.712600
$bin.dispersion
[1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
22 more elements ...
$bin.abundance
[1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
22 more elements ...
$tagwise.dispersion
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
$coef
[1] 4
$table
logConc logFC.SeasonR logFC.HydroperiodP.SeasonR LR.statistic
comp0 -9.736436 -0.24995924 -0.3214735 4.34114504
comp1 -10.739594 0.20037356 -0.3870300 0.77512680
comp10 -6.341516 0.37480089 -0.5276106 1.59407277
comp100 -9.940774 -0.06226595 -0.3360123 2.12228134
comp1000 -11.726085 -0.07140854 0.1131265 0.05488506
comp10000 -13.750173 0.21120837 0.5708073 1.99318234
p.value
comp0 0.1141123
comp1 0.6787086
comp10 0.4506626
comp100 0.3460608
comp1000 0.9729306
comp10000 0.3691356
$coefficients.full
(Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
comp0 -9.620221 0.028007573 -0.17325854 -0.22282846
comp1 -10.774172 0.058157626 0.13888837 -0.26826873
comp10 -6.555230 0.335378111 0.25979218 -0.36571177
comp100 -9.871905 0.009775718 -0.04315947 -0.23290600
comp1000 -11.662643 -0.118031564 -0.04949663 0.07841331
comp10000 -13.805669 -0.272566725 0.14639848 0.39565348
$coefficients.null
(Intercept) HydroperiodP
comp0 -9.703284 -0.06737999
comp1 -10.701998 -0.07653693
comp10 -6.416913 0.14550436
comp100 -9.893311 -0.09719967
comp1000 -11.687332 -0.07884619
comp10000 -13.727706 -0.04514128
$design.full
(Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1 1 0 1 0
2 1 0 1 0
3 1 0 1 0
4 1 1 1 1
5 1 1 1 1
7 more rows ...
$design.null
(Intercept) HydroperiodP
1 1 0
2 1 0
3 1 0
4 1 1
5 1 1
7 more rows ...
$dispersion.used
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
$comparison
[1] "SeasonR" "HydroperiodP:SeasonR"
> > RemoveSeasonRmHydBySeason = topTags(lrt.coef34, number =25060)
Error in topTags(lrt.coef34, number = 25060) :
unused argument(s) (number = 25060)
> ?topTags
> RemoveSeasonRmHydBySeason = topTags(lrt.coef34, n =25060)
> ls()
[1] "Hydroperiod" "RemoveSeasonRmHydBySeason"
[3] "Season" "d"
[5] "d.GLM" "design"
[7] "glmfit.tgw" "lrt.coef34"
[9] "lrt.coef4" "lrt.tgw"
[11] "oo" "raw.data"
> RemoveHydBySeason = topTags(lrt.coef4, n=25060)
> write.csv(RemoveHydBySeason, "RemoveHydBySeason.csv")
> write.csv(RemoveSeasonRmHydBySeason, "RemoveSeasonRmHydBySeason.csv")
> summary(decideTestsDGE(lrt.coef4)
+
> summary(decideTestsDGE(lrt.coef4))
[,1]
-1 57
0 24887
1 116
> summary(decideTestsDGE(lrt.coef34))
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
attempt to set an attribute on NULL
> design
(Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1 1 0 1 0
2 1 0 1 0
3 1 0 1 0
4 1 1 1 1
5 1 1 1 1
6 1 1 1 1
7 1 0 0 0
8 1 0 0 0
9 1 0 0 0
10 1 1 0 0
11 1 1 0 0
12 1 1 0 0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"
attr(,"contrasts")$Season
[1] "contr.treatment"
More information about the Bioconductor
mailing list