[Bioc-devel] tip: memory profiling using lineprof
Martin Morgan
mtmorgan at fhcrc.org
Mon Feb 10 08:17:45 CET 2014
On 02/09/2014 02:38 PM, Kasper Daniel Hansen wrote:
> Memory usage is a common bottleneck.
>
> For people interested in profiling their memory usage I want to recommend
> the lineprof package by Hadley Wickham which I have had great success with
> so far. There are some details in his 'Advanced R programming' at
> http://adv-r.had.co.nz/memory.html
> I see this package as a real game changer.
>
> I have written an example debugging session on a real use case
> (minfi::preprocessRaw) at
> http://www.hansenlab.org/rstats/2014/01/30/lineprof/
> where I end up having to work around using new() for Biobase classes (an
> eSet derived class in minfi)
Thanks Kasper for the pointer. This is a bit brutal:
> m <- matrix(0, 0, 0)
> tracemem(m)
[1] "<0xe048d80>"
> ExpressionSet(m)
tracemem[0xe048d80 -> 0xeb10530]: eapply sampleNames<- sampleNames<- .local
.nextMethod eval eval callNextMethod .local initialize initialize new
.ExpressionSet ExpressionSet ExpressionSet
... 15 copies later...
tracemem[0xf93b2b8 -> 0xf93bc00]: colnames<- sampleNames<- sampleNames<-
.harmonizeDimnames .local initialize initialize new .ExpressionSet ExpressionSet
ExpressionSet
Much of this is avoidable. copyEnv(), eapply(), and rownames<-, used when
making the row and column names of the assayData consistent with the feature and
sample names, all seem to duplicate elements unnecessarily:
e <- new.env(); m <- matrix(1); tracemem(m)
## [1] "<0x1810d650>"
e[["m"]] <- m
x <- copyEnv(e)
## tracemem[0x1810d650 -> 0x1810e0d8]: .Call copyEnv
x <- eapply(e, dim)
## tracemem[0x1810d650 -> 0x1810e9f8]: eapply
dimnames(e[["m"]]) <- list("a", "A")
## tracemem[0x1810d650 -> 0x1810fab0]:
rownames(e[["m"]]) <- "a"
## tracemem[0x1810fab0 -> 0x18110de8]:
## tracemem[0x18110de8 -> 0x18111730]: rownames<-
I've updated the C code for copyEnv in Biobase, and avoided eapply and
row/colnames, so that there are usually only one or two copies for the simplest
constructor. I'll look out for bugs in downstream packages, and would be happy
to hear of other easily reproducible examples of apparently unnecessary duplication.
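
For anyone wanting to put together such an example, here is a minimal sketch of
counting the copies reported by tracemem() in base R. The helper count_copies()
is hypothetical (it is not part of Biobase or lineprof), and the sketch assumes
an R build with memory profiling enabled, as is the case for the CRAN binaries.

```r
## Hypothetical helper: evaluate 'expr' and count the lines that tracemem()
## prints while it runs. Assumes tracemem() reports to standard output, so
## that capture.output() can collect the reports.
count_copies <- function(expr) {
    out <- capture.output(expr)
    sum(grepl("^tracemem\\[", out))
}

m <- matrix(0, 2, 2)
tracemem(m)                      # mark 'm' so duplications are reported

## 'm2' initially shares memory with 'm'; the first modification of 'm2'
## forces R to duplicate the matrix, which tracemem() reports.
n <- count_copies({ m2 <- m; m2[1, 1] <- 1 })
untracemem(m)

n                                # number of reported duplications
```

Wrapping a constructor call in count_copies() before and after a change gives a
quick, if rough, check that the number of duplications has gone down.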
Martin
>
> Best,
> Kasper
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793