[Bioc-devel] change names(assays(SummarizedExperiment)) w/o copy?

Martin Morgan mtmorgan at fhcrc.org
Wed May 7 22:06:25 CEST 2014


On 05/07/2014 12:06 PM, Michael Love wrote:
> hi,
>
> Is there a way that I can change the names of the assays slot of a
> SummarizedExperiment, without making a new copy of the data contained
> within? Assume I get an SE which has already been constructed, but no
> names on the assays() SimpleList.

Hi Mike --

   names(assays(se)) = "counts"

extracts the assays from se, applies the names to the SimpleList, and then
re-assigns the SimpleList to the SummarizedExperiment. The memory copy (of the
big data) actually happens in the extraction step, assays(se):

 > m = matrix(0, 0, 0); tracemem(m)
[1] "<0x3449b4e8>"
 > se = SummarizedExperiment(m)
 > a = assays(se)
tracemem[0x3449b4e8 -> 0x34ef64f0]: lapply lapply lapply lapply endoapply 
endoapply assays assays

This can be avoided by asking for the assays without their dimnames:

 > a = assays(se, withDimnames=FALSE)
 >

and from there

   names(a) = "counts"
   assays(se) = a

verifying that we haven't actually copied the matrix (both objects below report
the same address, 3449b4e8):

 > .Internal(inspect(assays(se, withDimnames=FALSE)[[1]]))
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
ATTRIB:
   @3449b4b0 02 LISTSXP g0c0 []
     TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
     @3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
 > .Internal(inspect(m))
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
ATTRIB:
   @3449b4b0 02 LISTSXP g0c0 []
     TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
     @3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
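
The extract / modify / re-assign sequence described at the top is how R evaluates any nested replacement call, not something specific to SummarizedExperiment. Here is a minimal base-R sketch of that expansion. It makes some assumptions: a plain list stands in for the SummarizedExperiment, and the hypothetical accessors parts() and `parts<-`() stand in for assays() and `assays<-`(). It does not reproduce the dimnames handling that triggers the copy in the real getter.

```r
# Hypothetical stand-ins (not part of any package): a plain list plays the
# role of the SummarizedExperiment, parts() / `parts<-`() play the role of
# assays() / `assays<-`().
parts <- function(x) x$parts
`parts<-` <- function(x, value) {
    x$parts <- value
    x
}

x <- list(parts = list(matrix(0, 2, 2)))
y <- x

# Nested replacement form, as in names(assays(se)) = "counts"
names(parts(x)) <- "counts"

# The same operation written out step by step
tmp <- parts(y)            # 1. extract: the getter runs here
names(tmp) <- "counts"     # 2. modify the extracted value
parts(y) <- tmp            # 3. re-assign through the setter

# Both routes give the same result
stopifnot(identical(x, y))
```

In the thread's case, step 1 is the call to assays(se), which is where tracemem reported the copy. That is why doing step 1 by hand with withDimnames=FALSE sidesteps it.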

One would hope (a) that I'd followed through on a previous promise to apply the
dimnames up-front, so that there would be no need for withDimnames=FALSE to
avoid the copy (though there might be a price to pay on the way in), and (b)
that the following would work:

   names(assays(se, withDimnames=FALSE)) = "counts"

It didn't:

 > names(assays(se, withDimnames=FALSE)) = "counts"
Error in slot(x, nm) :
   no slot of name "withDimnames" for this object of class "SummarizedExperiment"

but it does in 1.17.13.

Martin

>
> thanks,
>
> Mike
>
>   > library(GenomicRanges)
>   > gc()
>             used (Mb) gc trigger (Mb) max used (Mb)
>   Ncells 1291106   69    1710298 91.4  1590760 85.0
>   Vcells 1178619    9    1925843 14.7  1724123 13.2
>   > m <- matrix(1:2e7, ncol=10)
>   > gc()
>              used (Mb) gc trigger  (Mb) max used  (Mb)
>   Ncells  1291111 69.0    1967602 105.1  1590760  85.0
>   Vcells 11178604 85.3   22482701 171.6 21178631 161.6
>
> # made a ~75 Mb matrix
>
>   > colnames(m) <- letters[1:10]
>   > gc()
>              used (Mb) gc trigger  (Mb) max used  (Mb)
>   Ncells  1291149 69.0    1967602 105.1  1590760  85.0
>   Vcells 11178679 85.3   22482701 171.6 21179851 161.6
>   > se <- SummarizedExperiment(m)
>   > gc()
>              used (Mb) gc trigger  (Mb) max used  (Mb)
>   Ncells  1302603 69.6    1967602 105.1  1623929  86.8
>   Vcells 12189777 93.1   22482701 171.6 21179851 161.6
>
> # so far no copying
>
>   > names(assays(se)) <- "counts"
>   > gc()
>              used  (Mb) gc trigger  (Mb) max used  (Mb)
>   Ncells  1303174  69.6    1967602 105.1  1623929  86.8
>   Vcells 22190847 169.4   23686836 180.8 22203423 169.4
>
> # last step made a copy
>
>   > sessionInfo()
>   R Under development (unstable) (2014-05-07 r65539)
>   Platform: x86_64-apple-darwin12.5.0 (64-bit)
>
>   locale:
>   [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
>   attached base packages:
>   [1] parallel  stats     graphics  grDevices utils     datasets  methods
>   [8] base
>
>   other attached packages:
>   [1] GenomicRanges_1.17.12 GenomeInfoDb_1.1.3    IRanges_1.99.13
>   [4] S4Vectors_0.0.6       BiocGenerics_0.11.2
>
>   loaded via a namespace (and not attached):
>   [1] RCurl_1.95-4.1 stats4_3.2.0   XVector_0.5.6
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


