[BioC] DESeq2 - factor base level changes the DE genes.
Ximena [guest]
guest at bioconductor.org
Tue Jul 15 18:35:40 CEST 2014
Hello,
I have an RNAseq experiment from two strains of mice, that have been exposed to two different environments (three biological replicates of each combination). So my experimental setup is
> colData
genetics environment
black2 B6 black
black3 B6 black
black5 B6 black
black1 B6 yellow
black4 B6 yellow
black6 B6 yellow
agouti1 129 yellow
agouti4 129 yellow
agouti6 129 yellow
agouti2 129 black
agouti3 129 black
agouti5 129 black
I want to test which genes change their expression when the environment changes, controlling for the genetic background. I am using DESeq2 for this, with the formula ~ genetics + environment + genetics:environment.
However, the choice of the base level for the factors in my experimental design changes the genes that are significant. If I leave the default reference levels (129 for genetics and black for env) I get more DE genes that if I set as reference B6 for genetics and yellow for environemnt.
I thought the reference level is used to decide how to calculate the fold change and is necessary to correctly interpret the coefficients, but shouldn't the results be the same?
Any thoughts on what am I missing?
Thank you very much in advance.
My code is below.
## 1 ##
> levels(colData$genetics)
[1] "129" "B6"
> levels(colData$environment)
[1] "black" "yellow"
> dds <- DESeqDataSetFromMatrix(countData = data[,8:19], colData = colData, design = ~ genetics + environment + genetics:environment)
> dds <- DESeq(dds)
> resG <- results(dds, alpha=0.05, name="genetics_B6_vs_129")
> resE <- results(dds, alpha=0.05, name="environment_yellow_vs_black")
> resGE <- results(dds, alpha=0.05, name="geneticsB6.environmentyellow")
> dim(subset(resG, resG$padj < 0.05))
[1] 3675 6
> dim(subset(resE, resE$padj < 0.05))
[1] 20 6
> dim(subset(resGE, resGE$padj < 0.05))
[1] 24 6
## 2 ##
> colData2 <- colData
> colData2$genetics <- relevel(colData2$genetics, ref="B6")
> colData2$environment <- relevel(colData2$environment, ref="yellow")
> levels(colData2$genetics)
[1] "B6" "129"
> levels(colData2$environment)
[1] "yellow" "black"
> dds2 <- DESeqDataSetFromMatrix(countData = data[,8:19], colData = colData2, design = ~ genetics + environment + genetics:environment)
> dds2 <- DESeq(dds2)
> res2G <- results(dds2, alpha=0.05, name="genetics_129_vs_B6")
> res2E <- results(dds2, alpha=0.05, name="environment_black_vs_yellow")
> res2GE <- results(dds2, alpha=0.05, name="genetics129.environmentblack")
> dim(subset(res2G, resG$padj < 0.05))
[1] 3127 6
> dim(subset(res2E, resE$padj < 0.05))
[1] 2 6
> dim(subset(res2GE, resGE$padj < 0.05))
[1] 0 6
##
> g1 <- subset(resG, resG$padj < 0.05)
> g2 <- subset(res2G, res2G$padj < 0.05)
> length(intersect(rownames(g1), rownames(g2)))
[1] 2640
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.2.10 RcppArmadillo_0.4.320.0 Rcpp_0.11.2
[4] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7
[7] BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7
[5] genefilter_1.44.0 grid_3.0.2 lattice_0.20-23 locfit_1.5-9.1
[9] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
[13] survival_2.37-7 tools_3.0.2 XML_3.95-0.2 xtable_1.7-3
--
Sent via the guest posting facility at bioconductor.org.
More information about the Bioconductor
mailing list