[BioC] finding a very large number of false positives using edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Jan 17 00:05:23 CET 2014
Dear Charles,
The obvious conclusion is that these are not false positives and there are
genuine batch differences between your runs. Evidence trumps theory.
Batch effects are common in genomic practice.
Best wishes
Gordon
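
A minimal sketch of how batch differences between runs might be checked, assuming edgeR is installed. The counts are simulated, and the names (counts, run, batch_shift) and all parameter values are hypothetical; this is not the poster's data or analysis. If libraries cluster by run on the MDS plot, the DE calls most likely reflect real differences between runs rather than false positives.

```r
library(edgeR)

set.seed(1)
n_genes <- 2000

# Hypothetical simulated data: 3 libraries per run. The first 400 genes are
# shifted 4-fold in the second run to mimic a batch effect.
mu <- rexp(n_genes, rate = 1 / 200) + 5
batch_shift <- ifelse(seq_len(n_genes) <= 400, 4, 1)
counts <- cbind(
  matrix(rnbinom(n_genes * 3, mu = mu, size = 20), ncol = 3),
  matrix(rnbinom(n_genes * 3, mu = mu * batch_shift, size = 20), ncol = 3)
)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0(rep(c("Project1", "Project2"), each = 3),
                           "_sample", rep(1:3, times = 2))

run <- factor(rep(c("Project1", "Project2"), each = 3))

y <- DGEList(counts = counts, group = run)
y <- calcNormFactors(y)  # TMM normalization adjusts for library composition

# Exploratory check: do the libraries separate by run?
plotMDS(y, labels = colnames(counts), col = as.integer(run))

# Classic edgeR pipeline comparing the two runs
y <- estimateCommonDisp(y)
y <- estimateTagwiseDisp(y)
et <- exactTest(y)

# Number of genes called DE at a Benjamini-Hochberg FDR of 0.1
n_de <- sum(p.adjust(et$table$PValue, method = "BH") < 0.1)
n_de
```

Differences in sequencing depth alone are handled by the library sizes and normalization factors, so a clear separation by run on the plot points to something other than depth.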
> Date: Wed, 15 Jan 2014 23:07:49 +0000
> From: "Blum, Charles" <CBlum at mednet.ucla.edu>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] finding a very large number of false positives using
> edgeR
>
> Hi,
>
> I am running edgeR on 6 RNAseq samples that were generated using the
> exact same protocol but are from different Illumina project runs.
> In theory, no genes should be differentially expressed. Nevertheless,
> edgeR identifies almost 7,000 genes as DE at an FDR of 0.1. This is
> very puzzling.
>
> I ran edgeR using the classic approach (exactTest) and the glm approach.
>
> To get an idea of sequencing depth:
> Sample              Total unique annotated read counts
> Project1_sample1    41,440,190
> Project1_sample2    26,429,859
> Project1_sample3    29,655,944
> Project2_sample1    25,423,167
> Project2_sample2    30,914,059
> Project2_sample3    35,41,714
>
> Could it be due to the variability in sequencing depth between projects?
> Could there be anything else in the data or analysis that violates the assumptions made by edgeR?
> Are there any known problems with the newest version of edgeR?
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] splines parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicFeatures_1.14.2 AnnotationDbi_1.24.0 GenomicRanges_1.14.4 XVector_0.2.0
> [5] IRanges_1.20.6 biomaRt_2.18.0 edgeR_3.4.2 limma_3.18.7
> [9] DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0
> [13] BiocGenerics_0.8.0 gplots_2.12.1 MASS_7.3-29 heatmap.plus_1.3
>
> loaded via a namespace (and not attached):
> [1] annotate_1.40.0 Biostrings_2.30.1 bitops_1.0-6 BSgenome_1.30.0 caTools_1.16
> [6] DBI_0.2-7 gdata_2.13.2 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2
> [11] gtools_3.1.1 KernSmooth_2.23-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 Rsamtools_1.14.2
> [16] RSQLite_0.11.4 rtracklayer_1.22.0 stats4_3.0.2 survival_2.37-4 tools_3.0.2
> [21] XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.8.0
>
>
>> packageDescription('edgeR')$Maintainer
> [1] "Mark Robinson <mark.robinson at imls.uzh.ch>, Davis McCarthy\n<dmccarthy at wehi.edu.au>, Yunshun Chen <yuchen at wehi.edu.au>,\nGordon Smyth <smyth at wehi.edu.au>"
>
> Thanks,
> Charles