[BioC] edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jan 30 01:33:52 CET 2014
Dear Jahn,
The classic and glm approaches do give similar results, but if they were
identical then we wouldn't need to keep both in package. No two methods
will give thousands of genes in exactly the same order however.
In this case, though, it doesn't seem to me that you are testing the same
comparisons for the classic and glm approaches. All the comparisons that
you make in the classic approach are between groups at a fixed time (Group
nested within Time) but the factorial model used
design<-model.matrix(~Group+Group:Time)
has Time nested within Group. Except at time zero, it doesn't seem to me
that you are testing the same comparisons for the classic and glm
approaches.
Best wishes
Gordon
> Date: Tue, 28 Jan 2014 23:40:04 -0800 (PST)
> From: "Jahn Davik [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, jahn.davik at bioforsk.no
> Subject: [BioC] edgeR
>
>
> Hi,
> I am using edgeR to analyse counts data from an experiment consisting of
> two genotypes sampled at five time points. Three biological replicates.
> I have used the 'Classic approach' and the 'GLM approach', and expected
> to get the same results either way. At least, I expected to find the
> same topTags irrespectively of approach. (But maybe that is wrong?)
>
> Here is what I've done:
>
> CLASSIC APPROACH:
>> # Classic part
>>
>> y<-DGEList(counts=x,group=Group)
>> y$samples
> group lib.size norm.factors
> E00R1 E00 57811138 1
> E00R2 E00 51440945 1
> E00R3 E00 48826680 1
> E01R1 E01 45632509 1
> E01R2 E01 58231864 1
> E01R3 E01 52760716 1
> E05R1 E05 56470353 1
> E05R2 E05 49744068 1
> E05R3 E05 46002201 1
> E48R2 E48 40331851 1
> E48R3 E48 52134771 1
> E96R1 E96 49449978 1
> E96R2 E96 44233768 1
> E96R3 E96 57043515 1
> J00R1 J00 58919765 1
> J00R2 J00 40388557 1
> J00R3 J00 54126457 1
> J01R1 J01 57302722 1
> J01R2 J01 56813843 1
> J01R3 J01 46711117 1
> J05R1 J05 64733652 1
> J05R2 J05 48922602 1
> J05R3 J05 28696513 1
> J48R2 J48 49544429 1
> J48R3 J48 42298884 1
> J96R1 J96 51538248 1
> J96R2 J96 66995708 1
> J96R3 J96 38325422 1
>>
>> levels(y$samples$group)
> [1] "E00" "E01" "E05" "E48" "E96" "J00" "J01" "J05" "J48" "J96"
>> y<-calcNormFactors(y)
>
>
>> # estimating dispersion
>> # over all genes
>> y<-estimateCommonDisp(y)
>>
>> # genewise
>>
>> y<-estimateTagwiseDisp(y)
>>
>> plotBCV(y)
>>
>> et<-exactTest(y, pair=c("E00","J00"))
>> topTags(et)
> Comparison of groups: J00-E00
> logFC logCPM PValue FDR
> comp394477_c0 -15.422419 5.602823 0.000000e+00 0.000000e+00
> comp413538_c1 -11.685657 4.694669 0.000000e+00 0.000000e+00
> comp431566_c0 -11.615584 4.405613 0.000000e+00 0.000000e+00
> comp404433_c0 8.790634 4.710831 0.000000e+00 0.000000e+00
> comp412635_c0 6.325601 4.519383 0.000000e+00 0.000000e+00
> comp431005_c0 4.235474 6.494423 0.000000e+00 0.000000e+00
> comp373625_c1 6.893877 3.728574 3.134846e-288 9.760118e-285
> comp421285_c2 -4.744062 5.016100 6.226410e-278 1.696230e-274
> comp419056_c0 13.453985 2.801532 6.846340e-239 1.657879e-235
> comp401947_c0 7.987398 2.858410 1.078798e-217 2.351132e-214
>>
>> et<-exactTest(y, pair=c("E01","J01"))
>> topTags(et)
> Comparison of groups: J01-E01
> logFC logCPM PValue FDR
> comp413538_c1 -14.570734 4.694669 0.000000e+00 0.000000e+00
> comp431566_c0 -14.185398 4.405613 0.000000e+00 0.000000e+00
> comp394477_c0 -14.137519 5.602823 0.000000e+00 0.000000e+00
> comp404433_c0 10.181217 4.710831 0.000000e+00 0.000000e+00
> comp373625_c1 9.738900 3.728574 0.000000e+00 0.000000e+00
> comp412635_c0 7.077064 4.519383 0.000000e+00 0.000000e+00
> comp431005_c0 4.393042 6.494423 0.000000e+00 0.000000e+00
> comp421285_c2 -4.656943 5.016100 2.038437e-267 5.553212e-264
> comp404433_c1 13.310478 3.132245 1.777238e-232 4.303681e-229
> comp419056_c0 11.369324 2.801532 2.968126e-222 6.468734e-219
>>
>> et<-exactTest(y, pair=c("E05","J05"))
>> topTags(et)
> Comparison of groups: J05-E05
> logFC logCPM PValue FDR
> comp404433_c0 12.392403 4.710831 0.000000e+00 0.000000e+00
> comp394477_c0 -11.251416 5.602823 0.000000e+00 0.000000e+00
> comp413538_c1 -11.075882 4.694669 0.000000e+00 0.000000e+00
> comp412635_c0 7.071984 4.519383 0.000000e+00 0.000000e+00
> comp431005_c0 3.866607 6.494423 5.618431e-314 2.448962e-310
> comp373625_c1 9.542258 3.728574 4.235325e-310 1.538411e-306
> comp431566_c0 -11.206496 4.405613 1.889575e-289 5.883056e-286
> comp421285_c2 -4.448176 5.016100 4.956948e-245 1.350396e-241
> comp379820_c0 -6.741883 4.283112 4.759103e-221 1.152443e-217
> comp401947_c0 12.325326 2.858410 2.865886e-201 6.245911e-198
>>
>> et<-exactTest(y, pair=c("E48","J48"))
>> topTags(et)
> Comparison of groups: J48-E48
> logFC logCPM PValue FDR
> comp413538_c1 -11.577416 4.694669 9.806810e-268 2.137296e-263
> comp412635_c0 7.534573 4.519383 4.519394e-256 4.924784e-252
> comp373625_c1 11.099259 3.728574 4.211513e-231 3.059524e-227
> comp421285_c2 -5.694947 5.016100 8.898521e-225 4.848359e-221
> comp369941_c0 19.008105 9.936255 1.581987e-216 6.895566e-213
> comp431566_c0 -13.422262 4.405613 1.567587e-181 5.693999e-178
> comp404433_c0 12.908316 4.710831 2.647072e-159 8.241469e-156
> comp394477_c0 -11.329087 5.602823 8.649066e-159 2.356222e-155
> comp423647_c1 -3.235352 5.607951 1.216586e-156 2.946030e-153
> comp379820_c0 -6.545352 4.283112 1.001698e-152 2.183100e-149
>>
>> et<-exactTest(y, pair=c("E96","J96"))
>> topTags(et)
> Comparison of groups: J96-E96
> logFC logCPM PValue FDR
> comp413538_c1 -14.558952 4.694669 0.000000e+00 0.000000e+00
> comp431566_c0 -14.309061 4.405613 0.000000e+00 0.000000e+00
> comp373625_c1 10.996613 3.728574 0.000000e+00 0.000000e+00
> comp412635_c0 6.482673 4.519383 0.000000e+00 0.000000e+00
> comp421285_c2 -6.022731 5.016100 0.000000e+00 0.000000e+00
> comp404433_c0 12.027340 4.710831 3.172936e-312 1.152516e-308
> comp369941_c0 17.982161 9.936255 2.109743e-292 6.568535e-289
> comp423647_c1 -3.434699 5.607951 7.129247e-256 1.942185e-252
> comp431005_c0 3.384003 6.494423 7.465632e-247 1.807844e-243
> comp394477_c0 -13.722197 5.602823 9.993279e-244 2.177935e-240
>>
>
> MY INTENTION SO FAR HAS BEEN TO COMPARE THE GENOTYPES, E AND J, AT THE DIFFERENT TIME POINTS.
> SO FAR, SO GOOD (?)
>
> MOVING OVER TO THE GLM APPROACH:
>
>> z<-x
>> z<-DGEList(counts=x,group=Group)
>> z$samples
> group lib.size norm.factors
> E00R1 E 57811138 1
> E00R2 E 51440945 1
> E00R3 E 48826680 1
> E01R1 E 45632509 1
> E01R2 E 58231864 1
> E01R3 E 52760716 1
> E05R1 E 56470353 1
> E05R2 E 49744068 1
> E05R3 E 46002201 1
> E48R2 E 40331851 1
> E48R3 E 52134771 1
> E96R1 E 49449978 1
> E96R2 E 44233768 1
> E96R3 E 57043515 1
> J00R1 J 58919765 1
> J00R2 J 40388557 1
> J00R3 J 54126457 1
> J01R1 J 57302722 1
> J01R2 J 56813843 1
> J01R3 J 46711117 1
> J05R1 J 64733652 1
> J05R2 J 48922602 1
> J05R3 J 28696513 1
> J48R2 J 49544429 1
> J48R3 J 42298884 1
> J96R1 J 51538248 1
> J96R2 J 66995708 1
> J96R3 J 38325422 1
>>
>> z<-calcNormFactors(z)
>> z<-x
>> z<-DGEList(counts=z,group=Group)
>> z$samples
> group lib.size norm.factors
> E00R1 E 57811138 1
> E00R2 E 51440945 1
> E00R3 E 48826680 1
> E01R1 E 45632509 1
> E01R2 E 58231864 1
> E01R3 E 52760716 1
> E05R1 E 56470353 1
> E05R2 E 49744068 1
> E05R3 E 46002201 1
> E48R2 E 40331851 1
> E48R3 E 52134771 1
> E96R1 E 49449978 1
> E96R2 E 44233768 1
> E96R3 E 57043515 1
> J00R1 J 58919765 1
> J00R2 J 40388557 1
> J00R3 J 54126457 1
> J01R1 J 57302722 1
> J01R2 J 56813843 1
> J01R3 J 46711117 1
> J05R1 J 64733652 1
> J05R2 J 48922602 1
> J05R3 J 28696513 1
> J48R2 J 49544429 1
> J48R3 J 42298884 1
> J96R1 J 51538248 1
> J96R2 J 66995708 1
> J96R3 J 38325422 1
>>
>> z<-calcNormFactors(z)
>> Group<-factor(substring(colnames(z),1,1))
>> Group
> [1] E E E E E E E E E E E E E E J J J J J J J J J J J J J J
> Levels: E J
>>
>> Time<-factor(substring(colnames(z),2,3))
>> Time
> [1] 00 00 00 01 01 01 05 05 05 48 48 96 96 96 00 00 00 01 01 01 05 05 05 48 48 96 96 96
> Levels: 00 01 05 48 96
>>
>> design<-model.matrix(~Group+Group:Time)
>> rownames(design)<-colnames(z)
>> design
> (Intercept) GroupJ GroupE:Time01 GroupJ:Time01 GroupE:Time05 GroupJ:Time05 GroupE:Time48 GroupJ:Time48 GroupE:Time96 GroupJ:Time96
> E00R1 1 0 0 0 0 0 0 0 0 0
> E00R2 1 0 0 0 0 0 0 0 0 0
> E00R3 1 0 0 0 0 0 0 0 0 0
> E01R1 1 0 1 0 0 0 0 0 0 0
> E01R2 1 0 1 0 0 0 0 0 0 0
> E01R3 1 0 1 0 0 0 0 0 0 0
> E05R1 1 0 0 0 1 0 0 0 0 0
> E05R2 1 0 0 0 1 0 0 0 0 0
> E05R3 1 0 0 0 1 0 0 0 0 0
> E48R2 1 0 0 0 0 0 1 0 0 0
> E48R3 1 0 0 0 0 0 1 0 0 0
> E96R1 1 0 0 0 0 0 0 0 1 0
> E96R2 1 0 0 0 0 0 0 0 1 0
> E96R3 1 0 0 0 0 0 0 0 1 0
> J00R1 1 1 0 0 0 0 0 0 0 0
> J00R2 1 1 0 0 0 0 0 0 0 0
> J00R3 1 1 0 0 0 0 0 0 0 0
> J01R1 1 1 0 1 0 0 0 0 0 0
> J01R2 1 1 0 1 0 0 0 0 0 0
> J01R3 1 1 0 1 0 0 0 0 0 0
> J05R1 1 1 0 0 0 1 0 0 0 0
> J05R2 1 1 0 0 0 1 0 0 0 0
> J05R3 1 1 0 0 0 1 0 0 0 0
> J48R2 1 1 0 0 0 0 0 1 0 0
> J48R3 1 1 0 0 0 0 0 1 0 0
> J96R1 1 1 0 0 0 0 0 0 0 1
> J96R2 1 1 0 0 0 0 0 0 0 1
> J96R3 1 1 0 0 0 0 0 0 0 1
> attr(,"assign")
> [1] 0 1 2 2 2 2 2 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$Group
> [1] "contr.treatment"
>
> attr(,"contrasts")$Time
> [1] "contr.treatment"
>
>> z<-estimateGLMCommonDisp(z,design)
>>
>> # genewise
>> z<-estimateGLMTrendedDisp(z,design)
>> z<-estimateGLMTagwiseDisp(z,design)
>>
>> fit<-glmFit(z,design)
>> lrt<-glmLRT(fit,coef=2)
>> topTags(lrt)
> Coefficient: GroupJ
> logFC logCPM LR PValue FDR
> comp394477_c0 -15.320108 5.602823 1599.742 0.000000e+00 0.000000e+00
> comp413538_c1 -11.668893 4.694669 2063.236 0.000000e+00 0.000000e+00
> comp431566_c0 -11.598820 4.405613 1505.514 0.000000e+00 0.000000e+00
> comp404433_c0 8.789010 4.710831 1614.360 0.000000e+00 0.000000e+00
> comp412635_c0 6.325157 4.519383 1638.631 0.000000e+00 0.000000e+00
> comp431005_c0 4.235436 6.494423 1572.862 0.000000e+00 0.000000e+00
> comp421285_c2 -4.743959 5.016100 1273.102 7.917312e-279 2.464999e-275
> comp373625_c1 6.892714 3.728574 1255.435 5.468549e-275 1.489769e-271
> comp419056_c0 13.351686 2.801532 1103.131 6.893184e-242 1.669223e-238
> comp401947_c0 7.983599 2.858410 1005.328 1.247665e-220 2.719160e-217
>>
>> # testing the Group DE at 01h
>> lrt<-glmLRT(fit,contrast=c(0,0,-1,1,0,0,0,0,0,0))
>> topTags(lrt)
> Coefficient: -1*GroupE:Time01 1*GroupJ:Time01
> logFC logCPM LR PValue FDR
> comp401050_c0 3.493540 4.889617 169.62289 8.943982e-39 1.949251e-34
> comp427887_c0 2.494577 6.359053 151.68090 7.439934e-35 8.107296e-31
> comp429669_c1 3.461429 7.202470 139.75242 3.015465e-32 2.190635e-28
> comp425652_c1 -3.747270 5.578305 121.36647 3.176789e-28 1.467673e-24
> comp431260_c0 -1.510660 6.227693 121.25102 3.367149e-28 1.467673e-24
> comp428540_c0 2.458811 4.676311 95.33553 1.607026e-22 5.837254e-19
> comp437641_c0 1.729965 5.735428 89.12381 3.708599e-21 1.010451e-17
> comp415376_c0 -3.768543 1.566064 89.12355 3.709096e-21 1.010451e-17
> comp419207_c0 -2.096465 7.000344 88.31376 5.585358e-21 1.352525e-17
> comp428866_c0 1.503459 4.857282 85.41288 2.421386e-20 5.277168e-17
>>
>> # testing the Group DE at 05h
>> lrt<-glmLRT(fit,contrast=c(0,0,0,0,-1,1,0,0,0,0))
>> topTags(lrt)
> Coefficient: -1*GroupE:Time05 1*GroupJ:Time05
> logFC logCPM LR PValue FDR
> comp436562_c0 3.559768 8.677505 245.0805 3.068810e-55 6.688165e-51
> comp419735_c0 7.941315 6.207378 200.3481 1.753344e-45 1.910619e-41
> comp401050_c0 3.464120 4.889617 166.9922 3.358441e-38 2.439795e-34
> comp427887_c0 2.392173 6.359053 139.7174 3.069165e-32 1.672235e-28
> comp429669_c1 3.377087 7.202470 133.3528 7.569234e-31 2.972487e-27
> comp415376_c0 -4.542037 1.566064 133.1979 8.183411e-31 2.972487e-27
> comp425652_c1 -3.682664 5.578305 117.3181 2.445272e-27 7.613181e-24
> comp430561_c0 -2.003697 5.119557 108.4644 2.126367e-25 5.531729e-22
> comp433200_c0 2.289307 4.954039 108.3223 2.284370e-25 5.531729e-22
> comp429162_c0 1.995171 3.402911 107.9965 2.692502e-25 5.868040e-22
>>
>> # testing the Group DE at 48h
>> lrt<-glmLRT(fit,contrast=c(0,0,0,0,0,0,-1,1,0,0))
>> topTags(lrt)
> Coefficient: -1*GroupE:Time48 1*GroupJ:Time48
> logFC logCPM LR PValue FDR
> comp369941_c0 18.541426 9.936255 823.6359 3.919895e-181 8.543020e-177
> comp417702_c0 -3.622411 3.990389 207.6927 4.377967e-47 4.770670e-43
> comp417690_c0 -3.347656 5.239097 176.9797 2.212611e-40 1.607388e-36
> comp436283_c0 -3.851908 1.834438 164.1105 1.430948e-37 7.796518e-34
> comp417686_c0 -3.156963 3.568423 141.8250 1.062032e-32 4.629184e-29
> comp417690_c1 -3.248463 4.347931 129.4769 5.333285e-30 1.937227e-26
> comp423176_c0 -2.346243 5.066883 125.6650 3.640327e-29 1.133390e-25
> comp431260_c0 -1.695085 6.227693 121.4132 3.102817e-28 8.452849e-25
> comp419207_c0 -2.760753 7.000344 119.4983 8.146313e-28 1.972675e-24
> comp435553_c0 1.785259 5.622214 110.4265 7.902139e-26 1.722192e-22
>>
>> # testing the Group DE at 96h
>> lrt<-glmLRT(fit,contrast=c(0,0,0,0,0,0,0,0,-1,1))
>> topTags(lrt)
> Coefficient: -1*GroupE:Time96 1*GroupJ:Time96
> logFC logCPM LR PValue FDR
> comp369941_c0 17.174042 9.936255 978.8875 6.972860e-215 1.519665e-210
> comp431260_c0 -2.005239 6.227693 211.6925 5.869844e-48 6.396369e-44
> comp417702_c0 -2.822779 3.990389 155.0349 1.375824e-35 9.994901e-32
> comp423176_c0 -2.224759 5.066883 141.4204 1.302007e-32 7.093983e-29
> comp419207_c0 -2.629442 7.000344 135.4855 2.585533e-31 1.126982e-27
> comp435199_c0 -1.941443 6.089785 134.1832 4.981873e-31 1.809582e-27
> comp428866_c0 1.871781 4.857282 132.4752 1.177677e-30 3.666612e-27
> comp437434_c0 -1.928556 5.025404 122.7183 1.607256e-28 4.378567e-25
> comp437540_c1 -1.719987 7.634490 122.0333 2.269992e-28 5.496911e-25
> comp424275_c1 2.829585 2.058209 113.6028 1.592057e-26 3.469729e-23
>
> SO, EXCEPT FOR THE FIRST TIME POINT (00) THE COMPARISONS ARE NOT
> SIMILAR. I WOULD VERY MUCH RECEIVE COMMENTS ON THIS AND IN PARTICULAR
> INDICATION(S) ON WHERE I ERR.
>
> THANK YOU.
> JAHN
>
>
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Norwegian (Bokm??l)_Norway.1252 LC_CTYPE=Norwegian (Bokm??l)_Norway.1252 LC_MONETARY=Norwegian (Bokm??l)_Norway.1252 LC_NUMERIC=C
> [5] LC_TIME=Norwegian (Bokm??l)_Norway.1252
>
> attached base packages:
> [1] splines parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 IRanges_1.20.6 lme4_1.0-5
> [9] MASS_7.3-29 Matrix_1.1-1.1 minqa_1.2.2 nlme_3.1-113 RColorBrewer_1.0-5 RSQLite_0.11.4 stats4_3.0.2 survival_2.37-4
> [17] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list