[Bioc-devel] change names(assays(SummarizedExperiment)) w/o copy?
Martin Morgan
mtmorgan at fhcrc.org
Wed May 7 22:06:25 CEST 2014
On 05/07/2014 12:06 PM, Michael Love wrote:
> hi,
>
> Is there a way that I can change the names of the assays slot of a
> SummarizedExperiment, without making a new copy of the data contained
> within? Assume I get an SE which has already been constructed, but with no
> names on the assays() SimpleList.
Hi Mike --
names(assays(se)) = "counts"
extracts the assays from se, then applies the names to the SimpleList, then
re-assigns the SimpleList to the SummarizedExperiment. The memory copy (of the
big data) actually happens in the extraction step, assays(se).
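The three steps described above follow from how R evaluates any nested replacement call. The minimal base-R sketch below uses hypothetical stand-ins, getAssays() and `getAssays<-`(), in place of assays() and `assays<-`(); they are not SummarizedExperiment code and only record the order in which R calls them.

```r
## Hypothetical stand-ins for assays() and `assays<-`(); they only record
## that they were called and are not SummarizedExperiment code.
calls <- character()

getAssays <- function(x) {
    calls <<- c(calls, "getAssays")
    x$assays
}

`getAssays<-` <- function(x, value) {
    calls <<- c(calls, "getAssays<-")
    x$assays <- value
    x
}

obj <- list(assays = list(matrix(0, 2, 2)))

## R evaluates the next line roughly as
##   `*tmp*` <- obj
##   obj <- `getAssays<-`(`*tmp*`,
##              value = `names<-`(getAssays(`*tmp*`), value = "counts"))
names(getAssays(obj)) <- "counts"

calls              # the getter (extraction) runs first, then the setter
names(obj$assays)  # "counts"
```

Because the getter always runs first, any copying the getter does (here, in the real assays(), the copy made while adding dimnames) is paid even when only the names change.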
> m = matrix(0, 0, 0); tracemem(m)
[1] "<0x3449b4e8>"
> se = SummarizedExperiment(m)
> a = assays(se)
tracemem[0x3449b4e8 -> 0x34ef64f0]: lapply lapply lapply lapply endoapply endoapply assays assays
which can actually be avoided by asking for the assays without their dimnames:
> a = assays(se, withDimnames=FALSE)
>
and from there:
names(a) = "counts"
assays(se) = a
Verifying that we haven't actually copied the matrix: the assay and m report
the same address (@3449b4e8).
> .Internal(inspect(assays(se, withDimnames=FALSE)[[1]]))
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
ATTRIB:
@3449b4b0 02 LISTSXP g0c0 []
TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
@3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
> .Internal(inspect(m))
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
ATTRIB:
@3449b4b0 02 LISTSXP g0c0 []
TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
@3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
One would hope (a) that I'd followed through on a previous promise to apply the
dimnames up-front, so that there is no need to use withDimnames=FALSE to avoid
the copy (there might then be a price paid on the way in, at construction), and
(b) that the following would work:
names(assays(se, withDimnames=FALSE)) = "counts"
It didn't:
> names(assays(se, withDimnames=FALSE)) = "counts"
Error in slot(x, nm) :
no slot of name "withDimnames" for this object of class "SummarizedExperiment"
but it does in GenomicRanges 1.17.13.
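The one-liner names(assays(se, withDimnames=FALSE)) = "counts" can only work if the replacement method also accepts withDimnames, because R forwards every argument given to the getter on to the setter. The base-R sketch below illustrates that general rule with hypothetical functions (getAssays, `getAssays<-`, getOnly, `getOnly<-`); the argument name withDimnames merely mirrors the thread and does not reproduce SummarizedExperiment's behaviour.

```r
## Hypothetical accessor pair; withDimnames only mirrors the thread's name.
lastWithDimnames <- NA

getAssays <- function(x, withDimnames = TRUE) x$assays

`getAssays<-` <- function(x, withDimnames = TRUE, value) {
    lastWithDimnames <<- withDimnames   # record what the setter received
    x$assays <- value
    x
}

obj <- list(assays = list(matrix(0, 2, 2)))

## R forwards withDimnames = FALSE to both getter and setter, roughly:
##   obj <- `getAssays<-`(obj, withDimnames = FALSE,
##              value = `names<-`(getAssays(obj, withDimnames = FALSE), "counts"))
names(getAssays(obj, withDimnames = FALSE)) <- "counts"
lastWithDimnames   # FALSE

## A setter that lacks the extra argument cannot be used this way.
getOnly <- function(x, withDimnames = TRUE) x$assays
`getOnly<-` <- function(x, value) { x$assays <- value; x }
res <- tryCatch({
    names(getOnly(obj, withDimnames = FALSE)) <- "other"
    "ok"
}, error = conditionMessage)
res   # an "unused argument" error message; obj is left unchanged
```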
Martin
>
> thanks,
>
> Mike
>
> > library(GenomicRanges)
> > gc()
>           used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 1291106   69    1710298 91.4  1590760 85.0
> Vcells 1178619    9    1925843 14.7  1724123 13.2
> > m <- matrix(1:2e7, ncol=10)
> > gc()
>            used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  1291111 69.0    1967602 105.1  1590760  85.0
> Vcells 11178604 85.3   22482701 171.6 21178631 161.6
>
> # made a ~75 Mb matrix
>
> > colnames(m) <- letters[1:10]
> > gc()
>            used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  1291149 69.0    1967602 105.1  1590760  85.0
> Vcells 11178679 85.3   22482701 171.6 21179851 161.6
> > se <- SummarizedExperiment(m)
> > gc()
>            used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  1302603 69.6    1967602 105.1  1623929  86.8
> Vcells 12189777 93.1   22482701 171.6 21179851 161.6
>
> # so far no copying
>
> > names(assays(se)) <- "counts"
> > gc()
>            used  (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  1303174  69.6    1967602 105.1  1623929  86.8
> Vcells 22190847 169.4   23686836 180.8 22203423 169.4
>
> # last step made a copy
>
> > sessionInfo()
> R Under development (unstable) (2014-05-07 r65539)
> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] GenomicRanges_1.17.12 GenomeInfoDb_1.1.3 IRanges_1.99.13
> [4] S4Vectors_0.0.6 BiocGenerics_0.11.2
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.95-4.1 stats4_3.2.0 XVector_0.5.6
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793