[Bioc-devel] tip: memory profiling using lineprof

Martin Morgan mtmorgan at fhcrc.org
Mon Feb 10 08:17:45 CET 2014


On 02/09/2014 02:38 PM, Kasper Daniel Hansen wrote:
> Memory usage is a common bottleneck.
>
> For people interested in profiling their memory usage I want to recommend
> the lineprof package by Hadley Wickham which I have had great success with
> so far.  There are some details in his 'Advanced R programming' at
>    http://adv-r.had.co.nz/memory.html
> I see this package as a real game changer.
>
> I have written an example debugging session on a real use case
> (minfi::preprocessRaw) at
>    http://www.hansenlab.org/rstats/2014/01/30/lineprof/
> where I end up having to work around using new() for Biobase classes (an
> eSet-derived class in minfi).

Thanks Kasper for the pointer.  This is a bit brutal:

 > m <- matrix(0, 0, 0)
 > tracemem(m)
[1] "<0xe048d80>"
 > ExpressionSet(m)

tracemem[0xe048d80 -> 0xeb10530]: eapply sampleNames<- sampleNames<- .local 
.nextMethod eval eval callNextMethod .local initialize initialize new 
.ExpressionSet ExpressionSet ExpressionSet

... 15 copies later...

tracemem[0xf93b2b8 -> 0xf93bc00]: colnames<- sampleNames<- sampleNames<- 
.harmonizeDimnames .local initialize initialize new .ExpressionSet ExpressionSet 
ExpressionSet

Much of this is avoidable... copyEnv(), eapply(), and rownames<-, used when 
making row and column names of the assayData consistent with feature and sample 
names, all seem to duplicate elements unnecessarily:

     e <- new.env(); m <- matrix(1); tracemem(m)
     ## [1] "<0x1810d650>"
     e[["m"]] <- m
     x <- copyEnv(e)
     ## tracemem[0x1810d650 -> 0x1810e0d8]: .Call copyEnv
     x <- eapply(e, dim)
     ## tracemem[0x1810d650 -> 0x1810e9f8]: eapply
     dimnames(e[["m"]]) <- list("a", "A")
     ## tracemem[0x1810d650 -> 0x1810fab0]:
     rownames(e[["m"]]) <- "a"
     ## tracemem[0x1810fab0 -> 0x18110de8]:
     ## tracemem[0x18110de8 -> 0x18111730]: rownames<-
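
One way to sidestep the dimnames<- and rownames<- copies above is to attach the names before the matrix is stored, so nothing has to be modified inside the environment afterwards. The following is only a minimal sketch in base R under that assumption, not the change made in Biobase; the `can_trace` guard is there because tracemem() requires an R build with memory profiling enabled.

```r
## Sketch only: set dimnames at construction time, then store the matrix.
m <- matrix(1, dimnames = list("a", "A"))
e <- new.env()

## tracemem() is only available when R was built with memory profiling
can_trace <- capabilities("profmem")
if (can_trace) tracemem(m)

e[["m"]] <- m              # assignment into the environment, no modification
dn <- dimnames(e[["m"]])   # read-only access to the stored matrix

if (can_trace) untracemem(m)
```

Run under tracemem, the assignment and the read-only access should not produce any report; any tracemem[...] line that does appear points at a remaining copy.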

I've updated the C code for copyEnv in Biobase, and avoided eapply() and the 
rownames<- / colnames<- replacement functions, so that there are usually only 
one or two copies for the simplest constructor. I'll look out for bugs in 
downstream packages, and would be happy to hear of other easily reproducible 
examples of apparently unnecessary duplication.
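
For putting together such a reproducible example, it can help to count the tracemem reports rather than read them off the console. The sketch below is one possible approach; `count_copies` is a hypothetical helper (not part of Biobase or base R), and it assumes that the tracemem reports can be collected with capture.output() and that R was built with memory profiling.

```r
## Hypothetical helper: count the tracemem reports printed while an
## expression is evaluated. Assumes the reports reach capture.output().
count_copies <- function(expr) {
    out <- capture.output(invisible(expr))
    sum(grepl("^tracemem\\[", out))
}

n_copies <- NA_integer_
if (capabilities("profmem")) {
    m <- matrix(0, 2, 2)
    tracemem(m)
    n_copies <- count_copies({
        m2 <- m          # second reference to the same matrix, no copy yet
        m2[1, 1] <- 1    # modifying m2 forces a duplicate of the traced object
    })
    untracemem(m)
}
```

The same wrapper could be put around a constructor call such as ExpressionSet(m), after tracemem(m), to compare the number of reports before and after a change.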

Martin

>
> Best,
> Kasper


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


