[BioC] DESeq2 missing columns
Michael Love
michaelisaiahlove at gmail.com
Tue Jul 22 14:02:49 CEST 2014
hi Carsten,
As DESeq2 is a more general model, the results tables don't
necessarily involve only two conditions A and B, so to have the
results table have consistent columns, we excluded these. Here is the
code to reproduce the base means per condition:
baseMeanA = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "A"])
baseMeanB = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "B"])
res = cbind(baseMeanA, baseMeanB, as.data.frame(res))
To force a header for the first column when writing out, you could add
a column explicitly:
res = cbind(genes=rownames(res), as.data.frame(res))
And then use row.names=FALSE when writing out.
Mike
On Mon, Jul 21, 2014 at 4:59 AM, Kuenne, Carsten
<Carsten.Kuenne at mpi-bn.mpg.de> wrote:
> Dear mailing list,
>
>
>
> I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes.
>
>
>
> 1. Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately.
>
> 2. If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour?
>
>
>
> [...]
>
> data = read.table("counts.matrix", header=T, row.names=1, com='')
>
> col_ordering = c(1,2,3,4)
>
> rnaseqMatrix = data[,col_ordering]
>
> rnaseqMatrix = round(rnaseqMatrix)
>
> rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,]
>
> conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2))))
>
> rownames(conditions) = colnames(rnaseqMatrix)
>
> ddsFullCountTable <- DESeqDataSetFromMatrix(
>
> countData = rnaseqMatrix,
>
> colData = conditions,
>
> design = ~ conditions)
>
> dds = DESeq(ddsFullCountTable)
>
> res = results(dds)
>
> write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep=' ', quote=FALSE, row.names=T)
>
> [...]
>
>
>
> [deseq2_nha_nhc.txt]
>
> baseMean log2FoldChange lfcSE pvalue padj
>
> comp277602_c0_seq1 438.902806642428 5.58864608712407 0.302717563529963 4.20808210669443e-76 3.08662822526037e-71
>
> [...]
>
>
>
> Anyway thanks for the great update!
>
>
>
> Yours hopefully,
>
> Carsten
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list