[R] Persistent state in a function?

Boris Steipe boris.steipe at utoronto.ca
Fri Mar 25 19:35:36 CET 2016


To follow on, for the record, here's the code example using local() instead:
Comments appreciated.

# ======= CODE
cacheThis <- local({   # creates a closure, assigned to "cacheThis"

   myCache <- numeric()            # a persistent variable in the closure's environment
   useCache <- function(x){        # a function 
       myCache <<- c(myCache, x)
       print(myCache)
   }
})

# ======= /CODE
Using the function, and accessing the local variable if need be, works the same way as in my earlier post, quoted below.
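
For completeness, here is a minimal sketch of that usage with the local() version. It assumes only the definition shown above and the base R functions environment(), get() and assign(); the values are arbitrary.

```r
# Same definition as in the post above.
cacheThis <- local({
    myCache <- numeric()
    useCache <- function(x) {
        myCache <<- c(myCache, x)
        print(myCache)
    }
})

cacheThis(17)
cacheThis(13)
cacheThis(6)    # a value we decide we do not want

# Read the persistent variable through the function's environment ...
cached <- get("myCache", envir = environment(cacheThis))

# ... and overwrite it there, dropping the last element.
assign("myCache", cached[-length(cached)], envir = environment(cacheThis))

cacheThis(11)   # continues from the corrected cache
```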

Pros and Cons:
It seems to me that the difference between the two approaches is only:
- using local(), I need to duplicate the code if I want more than one instance of the closure. The code seems more explicit, however, in that the contents of the local environment are clearly spelled out.
- using makeCache() in my earlier post, I don't duplicate the code. I see why Martin calls this a 'factory' function. I return() the function that does the work, and its local environment plus local variables implicitly come along. It's a bit less explicit now, because the definition of makeCache() could be far away in the code.
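
One related point that may be worth spelling out: copying a closure to a new name does not create a new instance, whereas each call to the factory does. The sketch below uses a stripped-down makeCache() (it returns the cache instead of printing it); the names a, b and alias are only for illustration.

```r
makeCache <- function() {
    myCache <- numeric()
    function(x) {
        myCache <<- c(myCache, x)
        myCache
    }
}

a <- makeCache()
b <- makeCache()   # independent instance: a new environment per call
alias <- a         # NOT a new instance: same function, same environment

a(1)
alias(2)           # appends to the cache that a() uses
b(10)

sameEnv <- identical(environment(a), environment(alias))
diffEnv <- !identical(environment(a), environment(b))
```

With local(), getting a second independent cache means evaluating the local({ ... }) block a second time, which is the code duplication mentioned above.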

Other?

Terminology:
I would say
- cacheThis() is a closure in the global environment.
- useCache() is a function in that closure.
- myCache is a variable in useCache()'s local environment.
I have previously referred to the "local environment" as being "private". This informal use of the word "private" may be confusing since it means something else in other languages.
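
To check the terminology against what R itself reports, here is a small sketch using only base R (typeof(), environment(), ls()). In R's own terms, the environment that holds myCache is the function's enclosing environment, i.e. the one created by local(); the variable name env is just for illustration.

```r
cacheThis <- local({
    myCache <- numeric()
    useCache <- function(x) {
        myCache <<- c(myCache, x)
        print(myCache)
    }
})

typeof(cacheThis)               # R reports functions like this as "closure"

env <- environment(cacheThis)   # the environment created by local()
ls(env)                         # holds both myCache and useCache

cacheThis(17)
env$myCache                     # the persistent variable lives here
```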

Encapsulation:
The obvious purpose of this is to encapsulate the persistent variable to prevent it from getting corrupted. So it would be nice to have getter and setter functions. In memoise, Hadley makes these elements of a list, which I find a neat idea.


# ======= CODE: adding additional functions

cacheThis <- local({   

    myCache <- numeric()  
    functions <- list(
        calc = function(x){        
            myCache <<- c(myCache, x)
            print(myCache)
        },
        get = function() {
            return(myCache)
        },
        set = function(x) {
            myCache <<- x 
        }
    )
})

cacheThis$calc(17)   # 17
cacheThis$calc(13)   # 17 13
cacheThis$calc(10)   # 17 13 10

cacheThis$get()      # 17 13 10

cacheThis$set(c(17, 13, 11)) 

cacheThis$calc(7)    # 17 13 11 7

# =======  /CODE
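
The same list-of-functions idea could also cover storing and reloading the cache. This is only a sketch: the element names store and restore are made up for this example, it uses saveRDS()/readRDS() rather than the save()/load() of my earlier post, and calc returns the cache instead of printing it.

```r
cacheThis <- local({
    myCache <- numeric()
    list(
        calc    = function(x) { myCache <<- c(myCache, x); myCache },
        get     = function() myCache,
        set     = function(x) { myCache <<- x; invisible(x) },
        # hypothetical helpers: write the cache to / read it from a file
        store   = function(file) saveRDS(myCache, file = file),
        restore = function(file) { myCache <<- readRDS(file); invisible(myCache) }
    )
})

f <- tempfile(fileext = ".rds")

cacheThis$calc(17)
cacheThis$calc(13)
cacheThis$store(f)      # snapshot of the cache

cacheThis$calc(6)       # add a number we don't want ...
cacheThis$restore(f)    # ... and go back to the snapshot

restored <- cacheThis$get()
unlink(f)
```

Because saveRDS() stores a single object without its name, there is no temporary variable left behind in the global environment.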

Cheers,
Boris


On Mar 23, 2016, at 6:41 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:

> All -
> Thanks, this has been a real eye-opener.
> 
> Here's my variation based on what I've learned so far. It's based on Bert's earlier function-returning-a-closure example. I hope I got the terminology right.
> 
> # ========================================================
> 
> makeCache <- function(){   # returns a "closure",
> 	                   # i.e. a function
> 	                   # plus its private, lexically
> 	                   # scoped environment
> 
>   myCache <- numeric()  # a variable that we want to persist;
>                         # makeCache() creates the 
>                         # environment that holds myCache and
>                         # the function useCache() that uses myCache
> 
>   useCache <- function(x){
> 	  myCache <<- c(myCache, x)  # appends a value to myCache
> 	                             # <<- does _not_ assign to the
> 	                             # global environment, but searches
> 	                             # through the parent environments
> 	                             # and assigns to the global environment
> 	                             # only if no match was found along
> 	                             # the way.
> 	  print(myCache)
>   }
> 
>   return(useCache)     # return the function plus its environment
> }
> 
> # ======= creating instances of the closure and using them
> 
> cacheThis <- makeCache() # cacheThis is the closure that was created
>                         # by makeCache
> 
> cacheThis(17)  # 17
> cacheThis(13)  # 17 13
> cacheThis(11)  # 17 13 11
> 
> 
> cacheThat <- makeCache() # create another closure
> 
> cacheThat(1)  # 1
> cacheThat(2)  # 1 2
> cacheThat(3)  # 1 2 3
> cacheThat(5)  # 1 2 3 5
> 
> # ======= accessing the private variables
> 
> # The caches for cacheThis() and cacheThat() are not visible
> # from the (default) global environment:
> ls()  # [1] "cacheThat" "cacheThis" "makeCache" 
> 
> # To access them from the global environment, use
> # ls(), exists(), get() and assign(), with their environment
> # argument:
> 
> ls.str(envir = environment(cacheThis))
> 
> ls.str(envir = environment(cacheThat))
> 
> exists("myCache", envir = environment(cacheThat))
> exists("noSuchThing", envir = environment(cacheThat))
> 
> # The following won't work - save() needs a name as symbol or string
> # (commented out so that the rest of the script runs):
> # save(get("myCache", envir = environment(cacheThis)), file="myCache.Rdata")
> 
> # do this instead:
> tmp <- get("myCache", envir = environment(cacheThis))
> save(tmp, file="myCache.Rdata")
> rm(tmp)
> 
> # add a number we don't want...
> cacheThis(6) # 17 13 11 6
> 
> # restore cache from saved version
> load("myCache.Rdata") # this recreates "tmp"
> assign("myCache", tmp, envir = environment(cacheThis))
> 
> # cache another prime ...
> cacheThis(7) # 17 13 11 7
> 
> # etc.
> 
> # ========================================================
> 
> I don't yet understand the pros and cons of using local() instead of a generating function. From my current understanding, local() should end up doing the same thing - I think that's why Martin calls one a "variant" of the other. But I'll play some more with this later today. Is there a Preferred Way?
> 
> memoise has some nice ideas - such as creating a hash from the arguments passed into a function to see if the cached results need to be recomputed. In my use case, I'd like to have more explicit access to the cached results to be able to store, reload and otherwise manipulate them.
> 
> I haven't looked at R6 yet.
> 
> Cheers,
> Boris
> 
> On Mar 23, 2016, at 5:58 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
> 
>> Use a local environment as a place to store state. Update with <<- and resolve symbol references through lexical scope. E.g.,
>> 
>>   persist <- local({
>>       last <- NULL                # initialize
>>       function(value) {
>>           if (!missing(value))
>>               last <<- value      # update with <<-
>>           last                    # use
>>       }
>>   })
>> 
>> and in action
>> 
>> > persist("foo")
>> [1] "foo"
>> > persist()
>> [1] "foo"
>> > persist("bar")
>> [1] "bar"
>> > persist()
>> [1] "bar"
>> 
>> A variant is to use a 'factory' function
>> 
>>   factory <- function(init) {
>>       stopifnot(!missing(init))
>>       last <- init
>>       function(value) {
>>           if (!missing(value))
>>               last <<- value
>>           last
>>       }
>>   }
>> 
>> and
>> 
>> > p1 = factory("foo")
>> > p2 = factory("bar")
>> > c(p1(), p2())
>> [1] "foo" "bar"
>> > c(p1(), p2("foo"))
>> [1] "foo" "foo"
>> > c(p1(), p2())
>> [1] "foo" "foo"
>> 
>> The 'bank account' exercise in section 10.7 of RShowDoc("R-intro") illustrates this.
>> 
>> Martin
>> 
>> On 03/19/2016 12:45 PM, Boris Steipe wrote:
>>> Dear all -
>>> 
>>> I need to have a function maintain a persistent lookup table of results for an expensive calculation, a named vector or hash. I know that I can just keep the table in the global environment. One problem with this approach is that the function should be able to delete/recalculate the table and I don't like side-effects in the global environment. This table really should be private. What I don't know is:
>>> -A- how can I keep the table in an environment that is private to the function but persistent for the session?
>>> -B- how can I store and reload such a table?
>>> -C- most importantly: is that the right strategy to initialize and maintain state in a function in the first place?
>>> 
>>> 
>>> For illustration ...
>>> 
>>> -----------------------------------
>>> 
>>> myDist <- function(a, b) {
>>>    # retrieve or calculate distances
>>>    if (!exists("Vals")) {
>>>        Vals <<- numeric() # the lookup table for distance values
>>>                           # here, created in the global env.
>>>    }
>>>    key <- sprintf("X%d.%d", a, b)
>>>    thisDist <- Vals[key]
>>>    if (is.na(thisDist)) {          # Hasn't been calculated yet ...
>>>        cat("Calculating ... ")
>>>        thisDist <- sqrt(a^2 + b^2) # calculate with some expensive function ...
>>>        Vals[key] <<- thisDist      # store in global table
>>>    }
>>>    return(thisDist)
>>> }
>>> 
>>> 
>>> # run this
>>> set.seed(112358)
>>> 
>>> for (i in 1:10) {
>>>    x <- sample(1:3, 2)
>>>    print(sprintf("d(%d, %d) = %f", x[1], x[2], myDist(x[1], x[2])))
>>> }
>>> 
>>> 
>>> Thanks!
>>> Boris
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 


