[Bioc-devel] DESeqDataSetFromMatrix Changes Column Names

Michael Love michaelisaiahlove at gmail.com
Tue Aug 26 10:43:44 CEST 2014

hi Dario,

Here's some example behavior of SummarizedExperiment (here in devel).

The renaming behavior is coming from GenomicRanges. Either way, I
can't avoid the duplication of memory when the colnames of the matrix
conflict with the rownames of colData, unless I internally overwrite
the rownames of colData. But I don't think I would do this, because
the standard is to let the colData take precedence.

watch the Vcells (used):

library(GenomicRanges) # provides SummarizedExperiment in this devel version
m = matrix(rnorm(5e6),ncol=100,dimnames=list(1:5e4,paste0("a",1:100)))
gc() # 40 Mb or so taken by m

se = SummarizedExperiment(m)
gc() # no duplication after creating se

se = SummarizedExperiment(m, colData=DataFrame(x=1:100))
colnames(se) # colData takes precedence over the colnames of se
colnames(assay(se)) # and over the colnames of m
gc() # note the duplication,
# because the colnames of the matrix in assay() were replaced

se = SummarizedExperiment(m, colData=DataFrame(x=1:100,row.names=colnames(m)))
gc() # no duplication, same names.
# so you can use this code to insist that
# the colnames of the DESeqDataSet come from the counts matrix
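
As a sketch of how that could look with DESeqDataSetFromMatrix itself
(assuming DESeq2 is installed; the counts, the condition column, and
the design below are made up for illustration):

```r
library(DESeq2)

# Toy count matrix with its own column names (illustrative data only)
counts <- matrix(rpois(600, lambda = 10), ncol = 6,
                 dimnames = list(paste0("gene", 1:100), paste0("a", 1:6)))

# Give colData the same row names as the count matrix columns,
# so there is no conflict for colData to win
coldata <- DataFrame(condition = factor(rep(c("ctrl", "trt"), each = 3)),
                     row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)

# The column names of the DESeqDataSet match those of the count matrix
identical(colnames(dds), colnames(counts))
```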

m = matrix(rnorm(5e6),ncol=100)
se = SummarizedExperiment(m, colData=DataFrame(x=1:100))
gc() # no duplication if m has no colnames going in
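
To compare the gc() numbers above without reading the whole table, a
small base R helper can pull out just the Vcells figure. This is only
a sketch: vcells_used_mb is a hypothetical name, and the renaming step
mimics the colnames replacement on a plain matrix rather than inside
SummarizedExperiment.

```r
# Hypothetical helper: run gc() and return the Mb used by Vcells
# (second column of the gc() matrix is the "(Mb)" used column)
vcells_used_mb <- function() {
  gc()["Vcells", 2]
}

before <- vcells_used_mb()

m <- matrix(rnorm(5e6), ncol = 100)
after_m <- vcells_used_mb()
after_m - before # roughly 38 Mb: 5e6 doubles at 8 bytes each

m2 <- m # no copy yet, m2 shares memory with m
colnames(m2) <- paste0("b", 1:100) # changing the names forces a copy
after_rename <- vcells_used_mb()
after_rename - after_m # roughly another 38 Mb for the copy
```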

R Under development (unstable) (2014-06-05 r65862)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] parallel  stats     graphics  grDevices datasets  utils     methods
 [8] base

 other attached packages:
 [1] GenomicRanges_1.17.35 GenomeInfoDb_1.1.18   IRanges_1.99.24
 [4] S4Vectors_0.1.2       BiocGenerics_0.11.4   devtools_1.5
 [7] slidify_0.4.5         knitr_1.6             BiocInstaller_1.15.5

 loaded via a namespace (and not attached):
  [1] compiler_3.2.0 digest_0.6.4   evaluate_0.5.5 formatR_0.10   httr_0.4
  [6] markdown_0.7.2 memoise_0.2.1  RCurl_1.95-4.3 stats4_3.2.0   stringr_0.6.2
 [11] tools_3.2.0    whisker_0.3-2  XVector_0.5.7  yaml_2.1.13

On Tue, Aug 26, 2014 at 2:00 AM, Dario Strbenac
<dstr7320 at uni.sydney.edu.au> wrote:
> I am using the latest release version. I understand your recommendation about colData and will use it.
> --------------------------------------
>  Dario Strbenac
>  PhD Student
>  University of Sydney
>  Camperdown NSW 2050
>  Australia
