[R] functions for subsets: a minor R programming challenge

Douglas Bates bates at stat.wisc.edu
Sun Dec 17 18:03:08 CET 2000


Bill.Venables at CMIS.CSIRO.AU writes:

> There is, however, a serious catch.  The second and third functions are
> S-PLUS FUNCTIONS ONLY because they make essential use of frame 0 or frame 1,
> entities that R quite righly eschews as mad, bad and dangerous to have.  So
> the challenge is out for someone to write versions of them that use R
> "environments" to achieve the same effect but in an elegant and illumating
> way.  The second function uses frame 1 (or frame 0) for caching partial
> results that can be later re-used and the third uses frame 0 for storing the
> current subset.  I know it is possible since Doug Bates has done so already,
> but I have lost his code.  

You lost my code!  I'm sure that's never happened to me :-)

I'll respond to the challenge of creating an R version of the second
function having had the benefit of your explaining how it works during
our conversations.

For the benefit of others reading this, in the Subsets function Bill
uses frame 1 to store intermediate results and avoid recalculating
them.  They are stored in frame 1 so they do not conflict with other
objects in the global databases.  The names constructed as the strings
i1 and i2 are unlikely to occur in practice but you still want to
avoid the possibility of conflict and frame 1 gives you a private
namespace during the execution of the uppermost call of the function
Subsets.  (One top-level call of Subsets will generate many other
calls of Subset through Recall.)

One approach to obtaining a private namespace in R is to use an
environment.  An R function inherits the environment in which it is
defined.  Most of the time this is the global environment.  Exceptions
are when a function is defined within another function so the first R
version of the code looks like

mkSubsets <- function()                 # works in R only
{
    .env <- environment()
    function(n, r, v = 1:n)
        if(r <= 0) vector(mode(v), 0)
        else if(r >= n) v[1:n]
        else {
            if(r > 1) {
                i1 <- paste("#", n - 1, r - 1)
                i2 <- paste("#", n - 1, r)
                if(!exists(i1))
                    assign(i1,
                           as.vector(Recall(n - 1, r - 1, 1:(n - 1))),
                           env = .env)
                if(!exists(i2))
                    assign(i2,
                           as.vector(Recall(n - 1, r, 1:(n - 1))),
                           env = .env)
                rbind(cbind(v[1],
                            matrix(v[-1][get(i1)], ncol=r-1)),
                      matrix(v[-1][get(i2)], ncol=r))
            } else matrix(v[1:n], ncol = 1)
        }
}

Subsets <- mkSubsets()

The function mkSubsets exists only to provide a private environment
(stored under the name .env) for the definition of the function
inside, which is returned as the value of mkSubsets.  That's the
function that I assign as Subsets.  Notice that the frame argument in
the calls to Recall has been replaced by env = .env so all of these
assignments are taking place in the environment in which subsets was
defined.

If I print the Subsets function it reproduces the definition and adds

<environment: 0x8a758ec>    # this number may vary

After calling Subsets, there are several objects with those funny
names in the environment of the Subsets function
> Subsets(5,3)
      [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    1    2    4
 [3,]    1    2    5
 [4,]    1    3    4
 [5,]    1    3    5
 [6,]    1    4    5
 [7,]    2    3    4
 [8,]    2    3    5
 [9,]    2    4    5
[10,]    3    4    5
> objects(env = environment(Subsets))
[1] "# 2 1" "# 2 2" "# 3 1" "# 3 2" "# 3 3" "# 4 2" "# 4 3"

That is one of the differences between the S version and the R
version.  The frame 1 used in S disappears after the function returns
whereas the environment for the R function persists until the end of
the session.  The R function will get faster with each execution
because the intermediate results are still in the cache but it does
this at the expense of taking up more memory.

Because the function mkSubsets is used only to create the function
Subsets, the whole thing could be changed into an anonymous call.

Subsets <- (function()                 # works in R only
{
    .env <- environment()
    function(n, r, v = 1:n)
        if(r <= 0) vector(mode(v), 0)
        else if(r >= n) v[1:n]
        else {
            if(r > 1) {
                i1 <- paste("#", n - 1, r - 1)
                i2 <- paste("#", n - 1, r)
                if(!exists(i1))
                    assign(i1,
                           as.vector(Recall(n - 1, r - 1, 1:(n - 1))),
                           env = .env)
                if(!exists(i2))
                    assign(i2,
                           as.vector(Recall(n - 1, r, 1:(n - 1))),
                           env = .env)
                rbind(cbind(v[1],
                            matrix(v[-1][get(i1)], ncol=r-1)),
                      matrix(v[-1][get(i2)], ncol=r))
            } else matrix(v[1:n], ncol = 1)
        }
})()

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list