[BioC] Fwd: Outliers in RNA seq analysis using DESeq2

Wed May 8 17:24:50 CEST 2013

Hi

I've conducted a two-condition RNA-seq experiment using "disease" versus
"control" cells. I have 16 biological replicates in my disease group and 11
in my control group. I'm using DESeq2 v1.0.9 for the analysis.

From the heatmap and PCA plots (attached) it's clear that there's some
variability amongst the biological replicates in my groups, which I'd expect,
but 6 of my disease samples also seem to cluster closely with the controls.
All of the samples in each group were prepared in the same way and
sequenced together, and I can't identify any obvious batch effect that could
be contributing to this.

I don't have much experience analysing this kind of data, and my statistics
knowledge is unfortunately somewhat lacking, but I'm wondering if anyone
has experience of how well biological replicates from RNA-seq data usually
cluster together? I'm not sure whether it's more appropriate to drop these
6 samples and continue the analysis with 10 vs 11, or to leave them in, as
perhaps this is more representative of the variability of the disease
biology.

I'd appreciate any advice anyone has! Thanks in advance.

Emma