[R] Memory hungry routines

Henrik Bengtsson hb at biostat.ucsf.edu
Tue Dec 30 01:29:12 CET 2014


On Mon, Dec 29, 2014 at 10:52 AM, ALBERTO VIEIRA FERREIRA MONTEIRO
<albmont at centroin.com.br> wrote:
> Is there any way to detect which calls are consuming memory?
>
> I run a program whose global variables take up about 50 Megabytes of
> memory, but when I monitor the progress of the program it seems to be
> allocating 150 Megabytes of memory, with peaks of up to 2 Gigabytes.
>
> I know that the global variables aren't "copied" many times by the
> routines, but I suspect something weird must be happening.
>
> Alberto Monteiro
>
> PS: the lines, below, count the memory allocated to all global
> variables, probably it could be adapted to track the local variables:
>
> y <- ls(pat="")   # get all names of the variables
> z <- rep(0, length(y))  # create array of sizes
> # loop: get all sizes (in bytes) of the variables
> for (i in 1:length(y)) z[i] <- object.size(get(y[i]))
> # BTW, is there any way to vectorize the above loop?
> xix <- sort.int(z, index.return = TRUE)  # sort the sizes
> y <- y[xix$ix]  # apply the sort to the variables
> z <- z[xix$ix]  # apply the sort to the sizes
> y <- c(y, "total")  # add a totalizator
> z <- c(z, sum(z))  # sum them all
> cbind(y, z)  # ugly way to list them

Duncan already suggested Rprofmem().  For a neat interface to that,
see also the lineprof package.
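
As a rough illustration of the Rprofmem() route, here is a minimal
sketch.  The function f() and the name log_file are made up for the
example, and Rprofmem() only records anything if your R build has
memory profiling enabled, which is why the call is guarded by
capabilities("profmem").

```r
# Hypothetical function that grows a vector one element at a time,
# so it triggers many allocations.
f <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i)
  x
}

if (capabilities("profmem")) {
  log_file <- tempfile(fileext = ".out")
  Rprofmem(log_file, threshold = 1000)  # log allocations larger than 1000 bytes
  res <- f(5000)
  Rprofmem(NULL)                        # stop logging
  print(head(readLines(log_file)))      # each line: bytes allocated and call stack
} else {
  message("This R build was not compiled with memory profiling support")
}
```

The call stacks in the log should point at the lines doing the
allocating, which is usually enough to find the culprit.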

Common memory hogs are cbind(), rbind() and other ways of
incrementally building up objects.  These can often be avoided by
pre-allocating the final object up front and populating it as you go.
Another source of unnecessary memory duplication is coercion of data
types, e.g. allocating an integer matrix but populating it with
doubles.  A related mistake is to use matrix(nrow=nrow, ncol=ncol) to
allocate matrices that will hold numeric values.  That is actually
doing matrix(NA, nrow, ncol), which creates a *logical* matrix that
will be coerced (involving copying and a large memory allocation) as
soon as it gets populated with a numeric value.  One should have used
matrix(NA_real_, nrow, ncol) here.
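
To make the two points above concrete, here is a small sketch.  The
variable names (m_lgl, m_dbl, grown, filled) and the sizes are
arbitrary choices for the illustration, not anything from the
original program.

```r
nrow <- 3; ncol <- 2

m_lgl <- matrix(nrow = nrow, ncol = ncol)  # data defaults to NA, which is logical
m_dbl <- matrix(NA_real_, nrow, ncol)      # double from the start

print(typeof(m_lgl))   # "logical"
print(typeof(m_dbl))   # "double"

m_lgl[1, 1] <- 3.14    # the whole matrix is coerced to double here
print(typeof(m_lgl))   # "double"

# Growing an object with rbind() versus filling a pre-allocated one
n <- 1000
grown <- NULL
for (i in seq_len(n)) grown <- rbind(grown, c(i, i^2))  # new, larger matrix on every pass

filled <- matrix(NA_real_, nrow = n, ncol = 2)
for (i in seq_len(n)) filled[i, ] <- c(i, i^2)          # writes into existing storage

stopifnot(isTRUE(all.equal(grown, filled)))             # same result either way
```

Both loops give the same matrix; the difference is only in how much
memory gets allocated and thrown away along the way.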

For listing objects, their sizes and more, you can use ll() in the
R.oo package which returns a data.frame, e.g.

> example(iris)
> a <- 1:1e6
> R.oo::ll()
  member data.class dimension objectSize
1      a    numeric   1000000    4000040
2   dni3       list         3        600
3     ii data.frame  c(150,5)       7088
4   iris data.frame  c(150,5)       7088

> R.oo::ll(sortBy="objectSize")
  member data.class dimension objectSize
2   dni3       list         3        600
3     ii data.frame  c(150,5)       7088
4   iris data.frame  c(150,5)       7088
1      a    numeric   1000000    4000040

> tbl <- R.oo::ll()
> tbl <- tbl[order(tbl$objectSize, decreasing=TRUE),]
> tbl
  member data.class dimension objectSize
1      a    numeric   1000000    4000040
3     ii data.frame  c(150,5)       7088
4   iris data.frame  c(150,5)       7088
5   objs data.frame    c(4,4)       2760
2   dni3       list         3        600
> sum(tbl$objectSize)
[1] 4017576
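
If you would rather stay in base R, the loop in the original post can
be written without an explicit for().  This is only a sketch: the
helper name object_sizes() and the test environment e are invented
for the example.

```r
# Sizes (in bytes) of all objects in an environment, largest first.
object_sizes <- function(envir = globalenv()) {
  nms <- ls(envir = envir)
  sizes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = envir))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Try it on a small throwaway environment
e <- new.env()
assign("small", 1:10, envir = e)
assign("big", numeric(1e5), envir = e)

s <- object_sizes(e)
print(s)        # named vector of sizes
print(sum(s))   # total
```

Passing a function's own environment (via environment()) instead of
globalenv() should give the same listing for local variables.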


/Henrik
