[BioC] finding a very large number of false positives using edgeR

Fri Jan 17 00:05:23 CET 2014

Dear Charles,

The obvious conclusion is that these are not false positives and there are 
genuine batch differences between your runs.  Evidence trumps theory.

Batch effects are common in genomic practice.
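
One quick way to see whether the libraries separate by run is to look at 
sample-to-sample distances on the log scale. The following is only a 
minimal sketch in base R, not an edgeR analysis: the counts are simulated, 
and the names `counts`, `batch`, `log_cpm` and `split2` are hypothetical 
stand-ins that you would replace with your own data. With edgeR loaded, 
plotMDS() on the DGEList serves the same purpose.

```r
# Simulated stand-in data (an assumption, not the poster's counts):
# 6 libraries, 3 per sequencing run, with 400 of 2000 genes doubled in run 2.
set.seed(1)
n_genes <- 2000
batch <- factor(rep(c("Project1", "Project2"), each = 3))
mu <- rexp(n_genes, rate = 1 / 200) + 5          # baseline expected counts
shift <- ifelse(seq_len(n_genes) <= 400, 2, 1)   # run-specific fold change
counts <- sapply(seq_along(batch), function(j) {
  m <- if (batch[j] == "Project2") mu * shift else mu
  rnbinom(n_genes, mu = m, size = 20)            # negative binomial counts
})
colnames(counts) <- paste0(batch, "_sample", rep(1:3, 2))

# Log2 counts per million, with a small offset to avoid log(0)
lib_size <- colSums(counts)
log_cpm <- log2(t(t(counts + 0.5) / (lib_size + 1)) * 1e6)

# Classical multidimensional scaling on distances between libraries
d <- dist(t(log_cpm))
mds <- cmdscale(d, k = 2)
plot(mds, col = as.integer(batch), pch = 19, xlab = "Dim 1", ylab = "Dim 2")
text(mds, labels = colnames(counts), pos = 3, cex = 0.7)

# Hierarchical clustering: cross-tabulate the top-level split against the run
hc <- hclust(d)
split2 <- cutree(hc, k = 2)
table(batch, split2)
```

If the first dimension of the plot, or the top split of the clustering, 
lines up with the project run, then the run is the dominant source of 
variation and a large number of DE genes between runs is to be expected.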

Best wishes
Gordon

> Date: Wed, 15 Jan 2014 23:07:49 +0000
> From: "Blum, Charles" <CBlum at mednet.ucla.edu>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] finding a very large number of false positives using
> 	edgeR
>
> Hi,
>
> I am running edgeR on 6 RNAseq samples that were generated using the 
> exact same protocol but are from different Illumina project runs.
>
> In theory, no genes should be differentially expressed. Nevertheless, 
> edgeR identifies almost 7,000 genes as DE at an FDR of 0.1. This is 
> very puzzling.
>
> I ran edgeR using the classic approach (exactTest)  and the glm approach.
>
> To get an idea of sequencing depth:
> Sample             Total unique annotated read counts
> Project1_sample1   41,440,190
> Project1_sample2   26,429,859
> Project1_sample3   29,655,944
> Project2_sample1   25,423,167
> Project2_sample2   30,914,059
> Project2_sample3   35,41,714
>
> Could it be due to the variability in sequencing depth between projects?
> Could there be anything else in the data or analysis that could violate any assumptions made by edgeR?
> Are there any known problems with the newest version of edgeR?
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicFeatures_1.14.2 AnnotationDbi_1.24.0   GenomicRanges_1.14.4   XVector_0.2.0
> [5] IRanges_1.20.6         biomaRt_2.18.0         edgeR_3.4.2            limma_3.18.7
> [9] DESeq_1.14.0           lattice_0.20-24        locfit_1.5-9.1         Biobase_2.22.0
> [13] BiocGenerics_0.8.0     gplots_2.12.1          MASS_7.3-29            heatmap.plus_1.3
>
> loaded via a namespace (and not attached):
> [1] annotate_1.40.0    Biostrings_2.30.1  bitops_1.0-6       BSgenome_1.30.0    caTools_1.16
> [6] DBI_0.2-7          gdata_2.13.2       genefilter_1.44.0  geneplotter_1.40.0 grid_3.0.2
> [11] gtools_3.1.1       KernSmooth_2.23-10 RColorBrewer_1.0-5 RCurl_1.95-4.1     Rsamtools_1.14.2
> [16] RSQLite_0.11.4     rtracklayer_1.22.0 stats4_3.0.2       survival_2.37-4    tools_3.0.2
> [21] XML_3.95-0.2       xtable_1.7-1       zlibbioc_1.8.0
>
>
>> packageDescription('edgeR')$Maintainer
> [1] "Mark Robinson <mark.robinson at imls.uzh.ch>, Davis McCarthy\n<dmccarthy at wehi.edu.au>, Yunshun Chen <yuchen at wehi.edu.au>,\nGordon Smyth <smyth at wehi.edu.au>"
>
> Thanks,
> Charles
