[R] function pointers?
Paul Johnson
pauljohn32 at gmail.com
Wed Nov 22 17:29:58 CET 2017
We have a project that calls for the creation of a list of many
distribution objects. Distributions can be of various types, with
various parameters, but we ran into some problems. I started testing
on a simple list of rnorm-based objects.
I was a little surprised at the RAM storage requirements. Here's an example:
N <- 10000
closureList <- vector("list", N)
nsize <- sample(x = 1:100, size = N, replace = TRUE)
for (i in seq_along(nsize)) {
    closureList[[i]] <- list(func = rnorm, n = nsize[i])
}
format(object.size(closureList), units = "Mb")
The output is
22.4 Mb
I noticed that if I do not name the elements of each inner list
(func and n), the storage drops to 19.9 Mb.
That seemed like a lot of storage for a couple of names. Why so much?
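The names cost can be measured directly on a single inner list. This is a minimal base-R sketch (the variable names are illustrative only). It assumes object.size() charges each list for its own names attribute, meaning the character vector plus its strings, even though R caches identical strings internally:

```r
## One inner list with and without element names
named <- object.size(list(func = rnorm, n = 50L))
unnamed <- object.size(list(rnorm, 50L))
print(named)
print(unnamed)

## Extra bytes charged per list for the names attribute
namesCost <- as.numeric(named) - as.numeric(unnamed)
print(namesCost)

## Rough total in Mb (1024^2 bytes) for 10000 such lists
print(namesCost * 10000 / 1024^2)
```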
My colleagues think the RAM use is high because this is a closure
(hence the name closureList). I can't even convince myself it actually
is a closure. The R source has
rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)
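The closure question can be checked by asking R directly. This sketch uses only base functions. In R, any function written in R (as opposed to a primitive such as sum) has internal type "closure". Its enclosing environment, which for rnorm is assumed here to be the stats namespace, is what makes it a closure:

```r
## typeof() reports the internal type; R-level functions are "closure"
rnorm_type <- typeof(rnorm)
print(rnorm_type)

## rnorm is not a primitive, unlike sum
print(is.primitive(rnorm))
print(is.primitive(sum))

## The enclosing environment of rnorm is a namespace (stats)
rnorm_env <- environment(rnorm)
print(isNamespace(rnorm_env))
```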
The list seems to hold 10000 copies of rnorm, but we really only need
one, which all the objects could share.
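The "copies" may be partly an artifact of how the size is measured. The help page for object.size() says it "does not detect if elements of a list are shared". So a function placed in a list twice is charged twice, even though assigning a function does not copy it. This is a sketch under that assumption:

```r
one_fn <- object.size(rnorm)
two_refs <- object.size(list(rnorm, rnorm))

## object.size() adds the full size of rnorm once per element,
## plus a little for the list itself
print(one_fn)
print(two_refs)

## identical() compares values, so this only confirms the two
## elements are equal; it cannot show whether they are shared
shared <- list(rnorm, rnorm)
print(identical(shared[[1]], shared[[2]]))
```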
Thinking of this as I would in C, I want to pass in a pointer to the
function. I found my way to the idea of putting the function in an
environment in order to pass it by reference:
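## Store the distribution function and its sample size in a new
## environment, so the object can be passed around by reference
rnormPointer <- function(inputValue1, inputValue2) {
    object <- new.env(parent = globalenv())
    object$distr <- inputValue1
    object$n <- inputValue2
    class(object) <- 'pointer'
    object
}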
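The environment behaves like a pointer because R does not copy an environment on assignment or when it is passed to a function. Every binding sees the same object. This is a minimal base-R sketch, and e1, e2 and set_n are illustrative names. The emptyenv() parent is a choice made here so that lookups do not fall through to the global environment:

```r
e1 <- new.env(parent = emptyenv())
e1$n <- 33

## Assigning an environment copies the reference, not the contents
e2 <- e1
e2$n <- 50

## A function that modifies its environment argument changes the
## caller's object too
set_n <- function(env, value) {
    env$n <- value
    invisible(NULL)
}
set_n(e1, 99)

print(e1$n)
print(e2$n)
print(identical(e1, e2))
```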
## Experiment with that
gg <- rnormPointer(rnorm, 33)
gg$distr(gg$n)
ptrList <- vector("list", N)
for (i in seq_along(nsize)) {
    ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
}
format(object.size(ptrList), units = "Mb")
The required storage is reduced to 2.6 Mb. That's 1/10 of the RAM
required for closureList. And it works the way I expect:
## can pass in the unnamed arguments for n, mean and sd here
ptrList[[1]]$distr(33, 100, 10)
## Or the named arguments
ptrList[[1]]$distr(1, sd = 100)
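We can check whether object.size() sees the contents of those environments with base R alone. The help page says associated space such as a function's environment is not counted. In practice an environment is charged only for its own small header, not for the objects stored in it. This sketch uses small and big as illustrative names:

```r
small <- new.env()
big <- new.env()
## Store a large vector (about 8 MB of doubles) in one environment
assign("x", numeric(1e6), envir = big)

size_small <- object.size(small)
size_big <- object.size(big)

## Both report the same size: the contents of big are not counted
print(size_small)
print(size_big)

## The vector itself, measured directly
print(format(object.size(big$x), units = "Mb"))
```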
This environment trick mostly works, as far as I can see, but I have
these questions:
1. Is the object.size() result accurate for ptrList? Do I really
reduce storage that much, or is the required storage someplace
else (in the new environments) that object.size() does not count?
2. Am I running with scissors here? Unexpected bad things await?
3. Why is the storage for closureList so large? It looks to me like
rnorm is just this little thing:
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<bytecode: 0x55cc9988cae0>
4. Could you show me how to store the bytecode address as a thing
and use it in the objects? I'd guess that is the fastest possible
way. In an Objective-C project in the olden days, we found that
method lookup was a major slowdown, and one of the programmers
showed us how to save the lookup and reuse it over and over.
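The original post does not propose the following layout; this is only a sketch of one alternative. It keeps a single lookup table holding one copy of each distribution function, and stores only a type label and parameters per object. The names distrTable, specs and drawOne are hypothetical. The sketch also assumes each type needs only n. Since rnorm takes mean/sd and runif takes min/max, extra parameters would need per-type handling:

```r
set.seed(1)
N <- 10000
nsize <- sample(x = 1:100, size = N, replace = TRUE)

## One copy of each distribution function, looked up by name
distrTable <- list(normal = rnorm, uniform = runif)

## Per-object data: only a type label and parameters
specs <- data.frame(
    type = sample(names(distrTable), N, replace = TRUE),
    n = nsize,
    stringsAsFactors = FALSE
)

## Hypothetical helper: fetch the stored function, then call it
drawOne <- function(i) {
    f <- distrTable[[specs$type[i]]]
    f(specs$n[i])
}

x <- drawOne(1)
print(length(x) == specs$n[1])
print(format(object.size(specs), units = "Mb"))
```

In R a function is already a first-class object. So f in drawOne holds the function itself rather than a name to resolve. This loosely resembles caching a method lookup, but it is not a pointer to bytecode.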
pj
--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
To write to me directly, please address me at pauljohn at ku.edu.