[BioC] DESeq2 missing columns

Michael Love michaelisaiahlove at gmail.com
Tue Jul 22 14:02:49 CEST 2014


hi Carsten,

As DESeq2 is a more general model, the results tables don't
necessarily involve only two conditions A and B, so to have the
results table have consistent columns, we excluded these. Here is the
code to reproduce the base means per condition:

baseMeanA = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "A"])
baseMeanB = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "B"])
res = cbind(baseMeanA, baseMeanB, as.data.frame(res))

To force a header for the first column when writing out, you could add
a column explicitly:

res = cbind(genes=rownames(res), as.data.frame(res))

And then use row.names=FALSE when writing out.

Mike


On Mon, Jul 21, 2014 at 4:59 AM, Kuenne, Carsten
<Carsten.Kuenne at mpi-bn.mpg.de> wrote:
> Dear mailing list,
>
>
>
> I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes.
>
>
>
> 1.      Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately.
>
> 2.      If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour?
>
>
>
> [...]
>
> data = read.table("counts.matrix", header=T, row.names=1, com='')
>
> col_ordering = c(1,2,3,4)
>
> rnaseqMatrix = data[,col_ordering]
>
> rnaseqMatrix = round(rnaseqMatrix)
>
> rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,]
>
> conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2))))
>
> rownames(conditions) = colnames(rnaseqMatrix)
>
> ddsFullCountTable <- DESeqDataSetFromMatrix(
>
>     countData = rnaseqMatrix,
>
>     colData = conditions,
>
>     design = ~ conditions)
>
> dds = DESeq(ddsFullCountTable)
>
> res = results(dds)
>
> write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep='              ', quote=FALSE, row.names=T)
>
> [...]
>
>
>
> [deseq2_nha_nhc.txt]
>
> baseMean           log2FoldChange lfcSE      pvalue   padj
>
> comp277602_c0_seq1    438.902806642428          5.58864608712407          0.302717563529963               4.20808210669443e-76  3.08662822526037e-71
>
> [...]
>
>
>
> Anyway thanks for the great update!
>
>
>
> Yours hopefully,
>
> Carsten
>
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list