[BioC] DESeq2 vs limma/voom: PCA completely different. How to proceed?
Paul Geeleher
paulgeeleher at gmail.com
Tue Apr 23 20:26:48 CEST 2013
Hey Stephen,
A quick look suggests that your PC1 are actually very similar (look at
only the X-axis on both of the PCA plots). My guess is that what was
PC2 in one plot has become PC3 or PC4 etc. in the other plot. As you
have only plotted PC1 and 2 the plots look completely different. Maybe
try plotting more than 2 PCs to see if this is the case.
Paul.
On Tue, Apr 23, 2013 at 1:06 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi Stephen,
>
> Later on in the DESeq2 (and original DESeq) vignette, the authors
> recommend shooting your counts through some type of variance
> stabilizing transformation (either the rlog or vst) for exploratory
> analysis of the type you are performing.
>
> Perhaps you can try that and see how it compares.
>
> HTH,
> -steve
>
>
> On Tue, Apr 23, 2013 at 10:10 AM, Stephen Turner <vustephen at gmail.com> wrote:
>> Short story: I ran a PCA on a matrix of counts from an RNA-seq
>> experiment and then using the voom-tramsformed data, and they're
>> completely different. I imagine my results will be very different
>> using DESeq2 vs limma, and I am wondering how to proceed. Here are the
>> two PCA biplots:
>>
>> http://imgur.com/a/DBRfE
>>
>> More background:
>>
>> I have an RNA-seq experiment with 5 conditions: WT (n=4), and mutants
>> A, B, C, and D (all n=3). I aligned the reads with STAR, counted reads
>> mapping to genes using HTSeq-count. I imported the count data into
>> DESeq2 and processed using the functions described in the vignette,
>> DESeqDataSetFromHTSeqCount() and DESeq().
>>
>> I performed a PCA on the transposed normalized counts table from the
>> DESeq Data Set (dds) object (note the outlier, which is no longer
>> outlying in the voom PCA plot below):
>>
>> pca <- as.data.frame(prcomp(t(na.omit(counts(dds, normalized=TRUE))))$x)
>> col <- 1:5
>> groups <- colData(dds)$condition
>> plot(PC2~PC1, data=pca, bg=col[groups], pch=21, main="DESeq2")
>>
>> Here's that PCA biplot:
>>
>> http://i.imgur.com/P5hhE9Q.png
>>
>> I then used limma/voom to transform the data, and performed another
>> PCA, which looked completely different.
>>
>> design <- model.matrix(~0+colData(dds)$condition)
>> v <- voom(counts=counts(dds, normalized=TRUE), design=design)
>> pca <- as.data.frame(prcomp(t(na.omit(v$E)))$x)
>> pca(PC2~PC1, data=pca, bp=col[groups], pch=21, main="voom")
>>
>> Here's that PCA (notice the outlier C1 is no longer outlying):
>>
>> http://i.imgur.com/oKlnWh3.png
>>
>> Based on this PCA I imagine my results will look very different using
>> DESeq2 or limma on the voom'd data. I understand that this is due to
>> the PCA on the voom transformation is only considering the normalized
>> log2 expression values, and not considering the inverse variance
>> weights. Now I'm just not sure how to proceed with downstream analysis
>> (or quality control with the voom transformation, in general). Any
>> guidance is highly appreciated.
>>
>> Thanks,
>> Stephen
>>
>> ...
>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
>> methods base
>>
>> other attached packages:
>> [1] limma_3.16.1 calibrate_1.7.1 MASS_7.3-26
>> [4] DESeq2_1.0.8 RcppArmadillo_0.3.800.1 Rcpp_0.10.3
>> [7] lattice_0.20-15 Biobase_2.20.0 GenomicRanges_1.12.2
>> [10] IRanges_1.18.0 BiocGenerics_0.6.0 BiocInstaller_1.10.0
>>
>> loaded via a namespace (and not attached):
>> [1] annotate_1.38.0 AnnotationDbi_1.22.2 DBI_0.2-5
>> [4] genefilter_1.42.0 grid_3.0.0 locfit_1.5-9
>> [7] RColorBrewer_1.0-5 RSQLite_0.11.3 splines_3.0.0
>> [10] stats4_3.0.0 survival_2.37-4 tools_3.0.0
>> [13] XML_3.95-0.2 xtable_1.7-1
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
> --
> Steve Lianoglou
> Computational Biologist
> Department of Bioinformatics and Computational Biology
> Genentech
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Dr. Paul Geeleher, PhD (Bioinformatics)
Section of Hematology-Oncology
Department of Medicine
The University of Chicago
900 E. 57th St.,
KCBD, Room 7144
Chicago, IL 60637
--
www.bioinformaticstutorials.com
More information about the Bioconductor
mailing list