[BioC] edgeR
Jahn Davik [guest]
guest at bioconductor.org
Wed Jan 29 08:40:04 CET 2014
Hi,
I am using edgeR to analyse counts data from an experiment consisting of two genotypes sampled at five time points. Three biological replicates. I have used the 'Classic approach' and the 'GLM approach', and expected to get the same results either way. At least, I expected to find the same topTags irrespectively of approach. (But maybe that is wrong?)
Here is what I've done:
CLASSIC APPROACH:
> # Classic part
>
> y<-DGEList(counts=x,group=Group)
> y$samples
group lib.size norm.factors
E00R1 E00 57811138 1
E00R2 E00 51440945 1
E00R3 E00 48826680 1
E01R1 E01 45632509 1
E01R2 E01 58231864 1
E01R3 E01 52760716 1
E05R1 E05 56470353 1
E05R2 E05 49744068 1
E05R3 E05 46002201 1
E48R2 E48 40331851 1
E48R3 E48 52134771 1
E96R1 E96 49449978 1
E96R2 E96 44233768 1
E96R3 E96 57043515 1
J00R1 J00 58919765 1
J00R2 J00 40388557 1
J00R3 J00 54126457 1
J01R1 J01 57302722 1
J01R2 J01 56813843 1
J01R3 J01 46711117 1
J05R1 J05 64733652 1
J05R2 J05 48922602 1
J05R3 J05 28696513 1
J48R2 J48 49544429 1
J48R3 J48 42298884 1
J96R1 J96 51538248 1
J96R2 J96 66995708 1
J96R3 J96 38325422 1
>
> levels(y$samples$group)
[1] "E00" "E01" "E05" "E48" "E96" "J00" "J01" "J05" "J48" "J96"
> y<-calcNormFactors(y)
> # estimating dispersion
> # over all genes
> y<-estimateCommonDisp(y)
>
> # genewise
>
> y<-estimateTagwiseDisp(y)
>
> plotBCV(y)
>
> et<-exactTest(y, pair=c("E00","J00"))
> topTags(et)
Comparison of groups: J00-E00
logFC logCPM PValue FDR
comp394477_c0 -15.422419 5.602823 0.000000e+00 0.000000e+00
comp413538_c1 -11.685657 4.694669 0.000000e+00 0.000000e+00
comp431566_c0 -11.615584 4.405613 0.000000e+00 0.000000e+00
comp404433_c0 8.790634 4.710831 0.000000e+00 0.000000e+00
comp412635_c0 6.325601 4.519383 0.000000e+00 0.000000e+00
comp431005_c0 4.235474 6.494423 0.000000e+00 0.000000e+00
comp373625_c1 6.893877 3.728574 3.134846e-288 9.760118e-285
comp421285_c2 -4.744062 5.016100 6.226410e-278 1.696230e-274
comp419056_c0 13.453985 2.801532 6.846340e-239 1.657879e-235
comp401947_c0 7.987398 2.858410 1.078798e-217 2.351132e-214
>
> et<-exactTest(y, pair=c("E01","J01"))
> topTags(et)
Comparison of groups: J01-E01
logFC logCPM PValue FDR
comp413538_c1 -14.570734 4.694669 0.000000e+00 0.000000e+00
comp431566_c0 -14.185398 4.405613 0.000000e+00 0.000000e+00
comp394477_c0 -14.137519 5.602823 0.000000e+00 0.000000e+00
comp404433_c0 10.181217 4.710831 0.000000e+00 0.000000e+00
comp373625_c1 9.738900 3.728574 0.000000e+00 0.000000e+00
comp412635_c0 7.077064 4.519383 0.000000e+00 0.000000e+00
comp431005_c0 4.393042 6.494423 0.000000e+00 0.000000e+00
comp421285_c2 -4.656943 5.016100 2.038437e-267 5.553212e-264
comp404433_c1 13.310478 3.132245 1.777238e-232 4.303681e-229
comp419056_c0 11.369324 2.801532 2.968126e-222 6.468734e-219
>
> et<-exactTest(y, pair=c("E05","J05"))
> topTags(et)
Comparison of groups: J05-E05
logFC logCPM PValue FDR
comp404433_c0 12.392403 4.710831 0.000000e+00 0.000000e+00
comp394477_c0 -11.251416 5.602823 0.000000e+00 0.000000e+00
comp413538_c1 -11.075882 4.694669 0.000000e+00 0.000000e+00
comp412635_c0 7.071984 4.519383 0.000000e+00 0.000000e+00
comp431005_c0 3.866607 6.494423 5.618431e-314 2.448962e-310
comp373625_c1 9.542258 3.728574 4.235325e-310 1.538411e-306
comp431566_c0 -11.206496 4.405613 1.889575e-289 5.883056e-286
comp421285_c2 -4.448176 5.016100 4.956948e-245 1.350396e-241
comp379820_c0 -6.741883 4.283112 4.759103e-221 1.152443e-217
comp401947_c0 12.325326 2.858410 2.865886e-201 6.245911e-198
>
> et<-exactTest(y, pair=c("E48","J48"))
> topTags(et)
Comparison of groups: J48-E48
logFC logCPM PValue FDR
comp413538_c1 -11.577416 4.694669 9.806810e-268 2.137296e-263
comp412635_c0 7.534573 4.519383 4.519394e-256 4.924784e-252
comp373625_c1 11.099259 3.728574 4.211513e-231 3.059524e-227
comp421285_c2 -5.694947 5.016100 8.898521e-225 4.848359e-221
comp369941_c0 19.008105 9.936255 1.581987e-216 6.895566e-213
comp431566_c0 -13.422262 4.405613 1.567587e-181 5.693999e-178
comp404433_c0 12.908316 4.710831 2.647072e-159 8.241469e-156
comp394477_c0 -11.329087 5.602823 8.649066e-159 2.356222e-155
comp423647_c1 -3.235352 5.607951 1.216586e-156 2.946030e-153
comp379820_c0 -6.545352 4.283112 1.001698e-152 2.183100e-149
>
> et<-exactTest(y, pair=c("E96","J96"))
> topTags(et)
Comparison of groups: J96-E96
logFC logCPM PValue FDR
comp413538_c1 -14.558952 4.694669 0.000000e+00 0.000000e+00
comp431566_c0 -14.309061 4.405613 0.000000e+00 0.000000e+00
comp373625_c1 10.996613 3.728574 0.000000e+00 0.000000e+00
comp412635_c0 6.482673 4.519383 0.000000e+00 0.000000e+00
comp421285_c2 -6.022731 5.016100 0.000000e+00 0.000000e+00
comp404433_c0 12.027340 4.710831 3.172936e-312 1.152516e-308
comp369941_c0 17.982161 9.936255 2.109743e-292 6.568535e-289
comp423647_c1 -3.434699 5.607951 7.129247e-256 1.942185e-252
comp431005_c0 3.384003 6.494423 7.465632e-247 1.807844e-243
comp394477_c0 -13.722197 5.602823 9.993279e-244 2.177935e-240
>
MY INTENTION SO FAR HAS BEEN TO COMPARE THE GENOTYPES, E AND J, AT THE DIFFERENT TIME POINTS.
SO FAR, SO GOOD (?)
MOVING OVER TO THE GLM APPROACH:
> z<-x
> z<-DGEList(counts=x,group=Group)
> z$samples
group lib.size norm.factors
E00R1 E 57811138 1
E00R2 E 51440945 1
E00R3 E 48826680 1
E01R1 E 45632509 1
E01R2 E 58231864 1
E01R3 E 52760716 1
E05R1 E 56470353 1
E05R2 E 49744068 1
E05R3 E 46002201 1
E48R2 E 40331851 1
E48R3 E 52134771 1
E96R1 E 49449978 1
E96R2 E 44233768 1
E96R3 E 57043515 1
J00R1 J 58919765 1
J00R2 J 40388557 1
J00R3 J 54126457 1
J01R1 J 57302722 1
J01R2 J 56813843 1
J01R3 J 46711117 1
J05R1 J 64733652 1
J05R2 J 48922602 1
J05R3 J 28696513 1
J48R2 J 49544429 1
J48R3 J 42298884 1
J96R1 J 51538248 1
J96R2 J 66995708 1
J96R3 J 38325422 1
>
> z<-calcNormFactors(z)
> z<-x
> z<-DGEList(counts=z,group=Group)
> z$samples
group lib.size norm.factors
E00R1 E 57811138 1
E00R2 E 51440945 1
E00R3 E 48826680 1
E01R1 E 45632509 1
E01R2 E 58231864 1
E01R3 E 52760716 1
E05R1 E 56470353 1
E05R2 E 49744068 1
E05R3 E 46002201 1
E48R2 E 40331851 1
E48R3 E 52134771 1
E96R1 E 49449978 1
E96R2 E 44233768 1
E96R3 E 57043515 1
J00R1 J 58919765 1
J00R2 J 40388557 1
J00R3 J 54126457 1
J01R1 J 57302722 1
J01R2 J 56813843 1
J01R3 J 46711117 1
J05R1 J 64733652 1
J05R2 J 48922602 1
J05R3 J 28696513 1
J48R2 J 49544429 1
J48R3 J 42298884 1
J96R1 J 51538248 1
J96R2 J 66995708 1
J96R3 J 38325422 1
>
> z<-calcNormFactors(z)
> Group<-factor(substring(colnames(z),1,1))
> Group
[1] E E E E E E E E E E E E E E J J J J J J J J J J J J J J
Levels: E J
>
> Time<-factor(substring(colnames(z),2,3))
> Time
[1] 00 00 00 01 01 01 05 05 05 48 48 96 96 96 00 00 00 01 01 01 05 05 05 48 48 96 96 96
Levels: 00 01 05 48 96
>
> design<-model.matrix(~Group+Group:Time)
> rownames(design)<-colnames(z)
> design
(Intercept) GroupJ GroupE:Time01 GroupJ:Time01 GroupE:Time05 GroupJ:Time05 GroupE:Time48 GroupJ:Time48 GroupE:Time96 GroupJ:Time96
E00R1 1 0 0 0 0 0 0 0 0 0
E00R2 1 0 0 0 0 0 0 0 0 0
E00R3 1 0 0 0 0 0 0 0 0 0
E01R1 1 0 1 0 0 0 0 0 0 0
E01R2 1 0 1 0 0 0 0 0 0 0
E01R3 1 0 1 0 0 0 0 0 0 0
E05R1 1 0 0 0 1 0 0 0 0 0
E05R2 1 0 0 0 1 0 0 0 0 0
E05R3 1 0 0 0 1 0 0 0 0 0
E48R2 1 0 0 0 0 0 1 0 0 0
E48R3 1 0 0 0 0 0 1 0 0 0
E96R1 1 0 0 0 0 0 0 0 1 0
E96R2 1 0 0 0 0 0 0 0 1 0
E96R3 1 0 0 0 0 0 0 0 1 0
J00R1 1 1 0 0 0 0 0 0 0 0
J00R2 1 1 0 0 0 0 0 0 0 0
J00R3 1 1 0 0 0 0 0 0 0 0
J01R1 1 1 0 1 0 0 0 0 0 0
J01R2 1 1 0 1 0 0 0 0 0 0
J01R3 1 1 0 1 0 0 0 0 0 0
J05R1 1 1 0 0 0 1 0 0 0 0
J05R2 1 1 0 0 0 1 0 0 0 0
J05R3 1 1 0 0 0 1 0 0 0 0
J48R2 1 1 0 0 0 0 0 1 0 0
J48R3 1 1 0 0 0 0 0 1 0 0
J96R1 1 1 0 0 0 0 0 0 0 1
J96R2 1 1 0 0 0 0 0 0 0 1
J96R3 1 1 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$Group
[1] "contr.treatment"
attr(,"contrasts")$Time
[1] "contr.treatment"
> z<-estimateGLMCommonDisp(z,design)
>
> # genewise
> z<-estimateGLMTrendedDisp(z,design)
> z<-estimateGLMTagwiseDisp(z,design)
>
> fit<-glmFit(z,design)
> lrt<-glmLRT(fit,coef=2)
> topTags(lrt)
Coefficient: GroupJ
logFC logCPM LR PValue FDR
comp394477_c0 -15.320108 5.602823 1599.742 0.000000e+00 0.000000e+00
comp413538_c1 -11.668893 4.694669 2063.236 0.000000e+00 0.000000e+00
comp431566_c0 -11.598820 4.405613 1505.514 0.000000e+00 0.000000e+00
comp404433_c0 8.789010 4.710831 1614.360 0.000000e+00 0.000000e+00
comp412635_c0 6.325157 4.519383 1638.631 0.000000e+00 0.000000e+00
comp431005_c0 4.235436 6.494423 1572.862 0.000000e+00 0.000000e+00
comp421285_c2 -4.743959 5.016100 1273.102 7.917312e-279 2.464999e-275
comp373625_c1 6.892714 3.728574 1255.435 5.468549e-275 1.489769e-271
comp419056_c0 13.351686 2.801532 1103.131 6.893184e-242 1.669223e-238
comp401947_c0 7.983599 2.858410 1005.328 1.247665e-220 2.719160e-217
>
> # testing the Group DE at 01h
> lrt<-glmLRT(fit,contrast=c(0,0,-1,1,0,0,0,0,0,0))
> topTags(lrt)
Coefficient: -1*GroupE:Time01 1*GroupJ:Time01
logFC logCPM LR PValue FDR
comp401050_c0 3.493540 4.889617 169.62289 8.943982e-39 1.949251e-34
comp427887_c0 2.494577 6.359053 151.68090 7.439934e-35 8.107296e-31
comp429669_c1 3.461429 7.202470 139.75242 3.015465e-32 2.190635e-28
comp425652_c1 -3.747270 5.578305 121.36647 3.176789e-28 1.467673e-24
comp431260_c0 -1.510660 6.227693 121.25102 3.367149e-28 1.467673e-24
comp428540_c0 2.458811 4.676311 95.33553 1.607026e-22 5.837254e-19
comp437641_c0 1.729965 5.735428 89.12381 3.708599e-21 1.010451e-17
comp415376_c0 -3.768543 1.566064 89.12355 3.709096e-21 1.010451e-17
comp419207_c0 -2.096465 7.000344 88.31376 5.585358e-21 1.352525e-17
comp428866_c0 1.503459 4.857282 85.41288 2.421386e-20 5.277168e-17
>
> # testing the Group DE at 05h
> lrt<-glmLRT(fit,contrast=c(0,0,0,0,-1,1,0,0,0,0))
> topTags(lrt)
Coefficient: -1*GroupE:Time05 1*GroupJ:Time05
logFC logCPM LR PValue FDR
comp436562_c0 3.559768 8.677505 245.0805 3.068810e-55 6.688165e-51
comp419735_c0 7.941315 6.207378 200.3481 1.753344e-45 1.910619e-41
comp401050_c0 3.464120 4.889617 166.9922 3.358441e-38 2.439795e-34
comp427887_c0 2.392173 6.359053 139.7174 3.069165e-32 1.672235e-28
comp429669_c1 3.377087 7.202470 133.3528 7.569234e-31 2.972487e-27
comp415376_c0 -4.542037 1.566064 133.1979 8.183411e-31 2.972487e-27
comp425652_c1 -3.682664 5.578305 117.3181 2.445272e-27 7.613181e-24
comp430561_c0 -2.003697 5.119557 108.4644 2.126367e-25 5.531729e-22
comp433200_c0 2.289307 4.954039 108.3223 2.284370e-25 5.531729e-22
comp429162_c0 1.995171 3.402911 107.9965 2.692502e-25 5.868040e-22
>
> # testing the Group DE at 48h
> lrt<-glmLRT(fit,contrast=c(0,0,0,0,0,0,-1,1,0,0))
> topTags(lrt)
Coefficient: -1*GroupE:Time48 1*GroupJ:Time48
logFC logCPM LR PValue FDR
comp369941_c0 18.541426 9.936255 823.6359 3.919895e-181 8.543020e-177
comp417702_c0 -3.622411 3.990389 207.6927 4.377967e-47 4.770670e-43
comp417690_c0 -3.347656 5.239097 176.9797 2.212611e-40 1.607388e-36
comp436283_c0 -3.851908 1.834438 164.1105 1.430948e-37 7.796518e-34
comp417686_c0 -3.156963 3.568423 141.8250 1.062032e-32 4.629184e-29
comp417690_c1 -3.248463 4.347931 129.4769 5.333285e-30 1.937227e-26
comp423176_c0 -2.346243 5.066883 125.6650 3.640327e-29 1.133390e-25
comp431260_c0 -1.695085 6.227693 121.4132 3.102817e-28 8.452849e-25
comp419207_c0 -2.760753 7.000344 119.4983 8.146313e-28 1.972675e-24
comp435553_c0 1.785259 5.622214 110.4265 7.902139e-26 1.722192e-22
>
> # testing the Group DE at 96h
> lrt<-glmLRT(fit,contrast=c(0,0,0,0,0,0,0,0,-1,1))
> topTags(lrt)
Coefficient: -1*GroupE:Time96 1*GroupJ:Time96
logFC logCPM LR PValue FDR
comp369941_c0 17.174042 9.936255 978.8875 6.972860e-215 1.519665e-210
comp431260_c0 -2.005239 6.227693 211.6925 5.869844e-48 6.396369e-44
comp417702_c0 -2.822779 3.990389 155.0349 1.375824e-35 9.994901e-32
comp423176_c0 -2.224759 5.066883 141.4204 1.302007e-32 7.093983e-29
comp419207_c0 -2.629442 7.000344 135.4855 2.585533e-31 1.126982e-27
comp435199_c0 -1.941443 6.089785 134.1832 4.981873e-31 1.809582e-27
comp428866_c0 1.871781 4.857282 132.4752 1.177677e-30 3.666612e-27
comp437434_c0 -1.928556 5.025404 122.7183 1.607256e-28 4.378567e-25
comp437540_c1 -1.719987 7.634490 122.0333 2.269992e-28 5.496911e-25
comp424275_c1 2.829585 2.058209 113.6028 1.592057e-26 3.469729e-23
SO, EXCEPT FOR THE FIRST TIME POINT (00) THE COMPARISONS ARE NOT SIMILAR. I WOULD VERY MUCH RECEIVE COMMENTS ON THIS AND IN PARTICULAR INDICATION(S) ON WHERE I ERR.
THANK YOU.
JAHN
-- output of sessionInfo():
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252 LC_CTYPE=Norwegian (Bokmål)_Norway.1252 LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252
attached base packages:
[1] splines parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 IRanges_1.20.6 lme4_1.0-5
[9] MASS_7.3-29 Matrix_1.1-1.1 minqa_1.2.2 nlme_3.1-113 RColorBrewer_1.0-5 RSQLite_0.11.4 stats4_3.0.2 survival_2.37-4
[17] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1
--
Sent via the guest posting facility at bioconductor.org.
More information about the Bioconductor
mailing list