[R] function pointers?

Duncan Murdoch murdoch.duncan at gmail.com
Wed Nov 22 18:38:26 CET 2017


On 22/11/2017 11:29 AM, Paul Johnson wrote:
> We have a project that calls for the creation of a list of many
> distribution objects.  Distributions can be of various types, with
> various parameters, but we ran into some problems. I started testing
> on a simple list of rnorm-based objects.
> 
> I was a little surprised at the RAM storage requirements, here's an example:
> 
> N <- 10000
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
>      closureList[[i]] <- list(func = rnorm, n = nsize[i])
> }
> format(object.size(closureList), units = "Mb")
> 
> Output says
> 22.4 MB
> 

You should read the help page for object.size: it warns that the 
function does not detect when elements of a list are shared. Each of 
your 10000 elements is therefore charged the full size of rnorm, even 
though R stores only one copy. That's exactly the kind of thing that 
makes it overestimate the amount of memory actually being used.

I'd suggest turning on memory profiling in Rprof() for a more accurate 
result, but it seems to be broken:

 > Rprof(memory.profiling=TRUE)
 > N <- 10000
 > closureList <- vector("list", N)
 > nsize = sample(x = 1:100, size = N, replace = TRUE)
 > for (i in seq_along(nsize)){
+     closureList[[i]] <- list(func = rnorm, n = nsize[i])
+ }
 > format(object.size(closureList), units = "Mb")
[1] "19.2 Mb"
 > Rprof(NULL)
 > summaryRprof()
Error in rowsum.default(c(as.vector(new.ftable), fcounts), 
c(names(new.ftable),  :
   unimplemented type 'NULL' in 'HashTableSetup'
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
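Until that's fixed, a rough workaround (a sketch only; gc() totals are noisy, so run it in a fresh session) is to compare memory in use before and after building the list. Unlike object.size(), this reflects what R actually allocated:

```r
before <- sum(gc(reset = TRUE)[, 2])   # Ncells + Vcells in use, in Mb
N <- 10000
nsize <- sample(1:100, size = N, replace = TRUE)
closureList <- vector("list", N)
for (i in seq_along(nsize)) {
  closureList[[i]] <- list(func = rnorm, n = nsize[i])
}
after <- sum(gc()[, 2])
after - before   # actual growth in Mb, well below what object.size() reports
```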


Duncan Murdoch

> I noticed that if I do not name the objects in the list, then the
> storage drops to 19.9 MB.
> 
> That seemed like a lot of storage for a function's name. Why so much?
> My colleagues think the RAM use is high because this is a closure
> (hence closureList).  I can't even convince myself it actually is a
> closure. The R source has
> 
> rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)
> 
> The storage seems to hold 10000 copies of rnorm, but we really only
> need 1, which we can share among the objects.
> 
> Thinking of this like C,  I am looking to pass in a pointer to the
> function.  I found my way to the idea of putting a function in an
> environment in order to pass it by reference:
> 
> rnormPointer <- function(inputValue1, inputValue2){
>      object <- new.env(parent=globalenv())
>      object$distr <- inputValue1
>      object$n <- inputValue2
>      class(object) <- 'pointer'
>      object
> }
> 
> ## Experiment with that
> gg <- rnormPointer(rnorm, 33)
> gg$distr(gg$n)
> 
> ptrList <- vector("list", N)
> for(i in seq_along(nsize)) {
>      ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
> }
> format(object.size(ptrList), units = "Mb")
> 
> The required storage is reduced to 2.6 Mb. That's 1/10 of the RAM
> required for closureList.  This thing works the way I expect:
> 
> ## can pass in the unnamed arguments for n, mean and sd here
> ptrList[[1]]$distr(33, 100, 10)
> ## Or the named arguments
> ptrList[[1]]$distr(1, sd = 100)
> 
> This environment trick mostly works, so far as I can see, but I have
> these questions.
> 
> 1. Is the object.size() return accurate for ptrList?  Do I really
> reduce storage to that amount, or is the required storage someplace
> else (in the new environment) that is not included in object.size()?
> 
> 2. Am I running with scissors here? Unexpected bad things await?
> 
> 3. Why is the storage for closureList so great? It looks to me like
> rnorm is just this little thing:
> 
> function (n, mean = 0, sd = 1)
> .Call(C_rnorm, n, mean, sd)
> <bytecode: 0x55cc9988cae0>
> 
> 4. Could I learn (you show me?) to store the bytecode address as a
> thing and use it in the objects?  I'd guess that is the fastest
> possible way. In an Objective-C problem in the olden days, we found
> the method-lookup was a major slowdown and one of the programmers
> showed us how to save the lookup and use it over and over.
> 
> pj



More information about the R-help mailing list