[BioC] finding a very large number of false positives using edgeR
Steve Lianoglou
lianoglou.steve at gene.com
Thu Jan 16 00:59:16 CET 2014
Hi,
On Wed, Jan 15, 2014 at 3:07 PM, Blum, Charles <CBlum at mednet.ucla.edu> wrote:
> Hi,
>
> I am running edgeR on 6 RNAseq samples that were generated using the exact same protocol but are from different Illumina project runs.
> In theory, no genes should be differentially expressed. Nevertheless, edgeR identifies almost 7,000 genes as DE at an FDR of 0.1. This is very puzzling.
>
> I ran edgeR using the classic approach (exactTest) and the glm approach.
>
> To get an idea of sequencing depth:
> Sample             Total unique annotated read counts
> Project1_sample1   41,440,190
> Project1_sample2   26,429,859
> Project1_sample3   29,655,944
> Project2_sample1   25,423,167
> Project2_sample2   30,914,059
> Project2_sample3   35,41,714
>
> Could it be due to the variability in sequencing depth between projects?
Shouldn't be such a big issue. Even the differences in library size
that you see here are not very large.
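One quick way to check that the depth differences are modest is to compute the ratio of the deepest library to the shallowest. The sketch below uses Python purely as an illustration, since the analysis itself is done in R/edgeR. It uses the five library sizes quoted above and leaves out the sixth, which appears garbled in the original post ("35,41,714"). The helper name `depth_ratio` is hypothetical.

```python
# Library sizes as quoted in the original message. The sixth value
# ("35,41,714") looks garbled in the post, so it is deliberately left out.
lib_sizes = {
    "Project1_sample1": 41_440_190,
    "Project1_sample2": 26_429_859,
    "Project1_sample3": 29_655_944,
    "Project2_sample1": 25_423_167,
    "Project2_sample2": 30_914_059,
}

def depth_ratio(sizes):
    """Hypothetical helper: ratio of the deepest to the shallowest library."""
    values = list(sizes.values())
    return max(values) / min(values)

ratio = depth_ratio(lib_sizes)
print(f"max/min library size: {ratio:.2f}")  # prints "max/min library size: 1.63"
```

This is roughly a 1.6-fold spread. In edgeR's GLM framework, the effective library sizes (library size times normalization factor) enter the model as offsets. Depth differences of this order are therefore accounted for, rather than treated as a biological effect.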
> Could there be anything else in the data or analysis that could violate any assumptions made by edgeR?
> Are there any known problems with the newest version of edgeR?
My guess would be "no". You could, of course, try the same analysis
with limma::voom or DESeq2 to compare, but ...
Anyway, could you show us the code you used for the analysis? The
design matrix would be of particular interest, along with the
coefficients/contrasts you are testing, but the whole (relevant) code
would be good (i.e. from DGEList -> dispersion estimation -> design
matrix setup -> the various *fit + *table functions).
Are you simply testing differential expression between the replicates
of Project1 and those of Project2? Presumably your issue is that these
are libraries sequenced from what you expect to be the same type of
sample/tissue/cell-line/whatever?
Perhaps encoding the "batch" (projectID) as another covariate in
your design could help mitigate these issues, but I'm not sure which
samples you're testing against which, so I can't say anything for sure.
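Whether adding the project ID to the design helps depends on what is being compared. The following Python/NumPy sketch illustrates the linear algebra only and is not edgeR code. It builds two small design matrices, loosely in the spirit of R's `model.matrix()`, and checks their rank. The sample layout, the `design` helper, and the `treated` column are all hypothetical.

If the test is Project1 vs Project2, the batch column duplicates the comparison column, so the design loses rank and batch cannot be separated from the effect being tested. If some other factor varies within each project, batch can be added as a blocking covariate.

```python
import numpy as np

# Hypothetical layout: three libraries from each project run.
project = np.array([1, 1, 1, 2, 2, 2])

def design(*columns):
    """Hypothetical helper: intercept plus the given 0/1 indicator columns."""
    intercept = np.ones(len(columns[0]))
    return np.column_stack([intercept, *columns])

# Case 1: the comparison of interest IS Project1 vs Project2, and "batch"
# (the project ID) is added on top; the two columns are identical.
is_project2 = (project == 2).astype(float)
batch = is_project2.copy()
X_confounded = design(is_project2, batch)
print(X_confounded.shape[1], np.linalg.matrix_rank(X_confounded))  # prints "3 2"

# Case 2: a hypothetical treatment that varies within each project, so
# batch can be estimated separately from the treatment effect.
treated = np.array([0, 1, 1, 0, 0, 1], dtype=float)
X_blocked = design(batch, treated)
print(X_blocked.shape[1], np.linalg.matrix_rank(X_blocked))  # prints "3 3"
```

A rank lower than the number of columns means at least one coefficient is not estimable. This is why the answer hinges on which samples are being tested against which.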
-steve
--
Steve Lianoglou
Computational Biologist
Genentech