[R] expand.grid without expanding

Gabor Grothendieck ggrothendieck at gmail.com
Fri Feb 10 05:06:31 CET 2006


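I have made a few more improvements (the variables are now passed through ...
as in expand.grid(), the checks use stopifnot(), and the single-variable case
is handled):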

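expand.grid.id  <- function(id, ...) {
 # id: row number(s) of the grid that expand.grid(...) would produce
 vars <- list(...)
 nv <- length(vars)
 lims <- sapply(vars,length)   # number of values of each variable
 # require at least one variable, ids within 1..prod(lims), and a names attribute
 stopifnot(length(lims) > 0, id >= 1, id <= prod(lims),
           length(names(vars)) == nv)
 res <- structure(vector("list",nv), .Names = names(vars))
 # decode id from the slowest-varying (last) variable down to the second
 if (nv > 1) for(i in nv:2) {
   f <- prod(lims[1:(i-1)])   # number of rows per value of variable i
   res[[i]] <- vars[[i]][(id - 1)%/%f + 1]
   id <- (id - 1)%%f + 1      # position within that block of f rows
 }
 res[[1]] <- vars[[1]][id]    # the first variable varies fastest
 as.data.frame(res)
}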

# test
expand.grid(A = 1:2, B = letters[1:3])
expand.grid.id(1:6, A = 1:2, B = letters[1:3])
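
As a quick sanity check, and to show the intended use, here is a minimal
sketch that compares expand.grid.id() with the corresponding rows of
expand.grid() and then walks the grid in chunks, so that the full data frame
never exists at once. The helper process_in_chunks and the chunk size are
made up for illustration; the function is repeated so the snippet runs on
its own.

```r
expand.grid.id <- function(id, ...) {
  vars <- list(...)
  nv <- length(vars)
  lims <- sapply(vars, length)
  stopifnot(length(lims) > 0, id <= prod(lims), length(names(vars)) == nv)
  res <- structure(vector("list", nv), .Names = names(vars))
  if (nv > 1) for (i in nv:2) {
    f <- prod(lims[1:(i - 1)])
    res[[i]] <- vars[[i]][(id - 1) %/% f + 1]
    id <- (id - 1) %% f + 1
  }
  res[[1]] <- vars[[1]][id]
  as.data.frame(res)
}

full <- expand.grid(A = 1:2, B = letters[1:3])
part <- expand.grid.id(1:6, A = 1:2, B = letters[1:3])

# Compare as character so factor vs. character columns do not matter
same <- all(sapply(names(full), function(n)
  all(as.character(full[[n]]) == as.character(part[[n]]))))

# Hypothetical helper: visit rows 1..n of the grid, `chunk` rows at a time
process_in_chunks <- function(n, chunk, ...) {
  seen <- 0
  for (start in seq(1, n, by = chunk)) {
    ids <- start:min(start + chunk - 1, n)
    rows <- expand.grid.id(ids, ...)
    seen <- seen + nrow(rows)   # replace with the real per-row work
  }
  seen
}
n_seen <- process_in_chunks(6, 4, A = 1:2, B = letters[1:3])
```

Only one chunk of rows is held in memory at a time, which is the point of
the exercise; the chunk size trades memory against the number of calls.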


On 2/8/06, Ray Brownrigg <ray at mcs.vuw.ac.nz> wrote:
> > From: Luís Torgo <ltorgo at liacc.up.pt>
> > Date: Wed, 8 Feb 2006 18:08:40 +0000
> >
> > Dear list,
> > I recently came across a problem that I think I've solved and that I wanted
> > to share with you for two reasons:
> > - Maybe others come across the same problem.
> > - Maybe someone has a much simpler solution that they want to share with me ;-)
> >
> > The problem is as follows: expand.grid() allows you to generate a data.frame
> > with all combinations of a set of values, e.g.:
> > > expand.grid(par1=-1:1,par2=c('a','b'))
> >   par1 par2
> > 1   -1    a
> > 2    0    a
> > 3    1    a
> > 4   -1    b
> > 5    0    b
> > 6    1    b
> >
> > There is nothing wrong with this nice function except when you have too many
> > combinations to fit in your computer memory, and that was my problem: I
> > wanted to do something for each combination of a set of variants, but this
> > set was too large to store in memory in a data.frame generated by
> > expand.grid. A possible solution would be a set of nested for() loops,
> > but I preferred a solution with a single for() loop going from 1 to the
> > number of combinations, generating combination "i" at each iteration.
> > And this was the "real problem": how to write a function that takes the
> > same style of arguments as expand.grid() and gives me the values
> > corresponding to line "i" of the data frame that would have been created
> > by expand.grid(). For instance, if I wanted line 4 of the above call to
> > expand.grid() I should get the same as doing:
> > > expand.grid(par1=-1:1,par2=c('a','b'))[4,]
> >   par1 par2
> > 4   -1    b
> >
> > but obviously without having to use expand.grid() as that involves generating
> > a data frame that in my case wouldn't fit in the memory of my computer.
> >
> > Now, the function I've created was the following:
> > --------------------------------------------
> > getVariant <- function(id,vars) {
> >   if (!is.list(vars)) stop('vars needs to be a list!')
> >   nv <- length(vars)
> >   lims <- sapply(vars,length)
> >   if (id > prod(lims)) stop('id above the number of combinations!')
> >   res <- vector("list",nv)
> >   for(i in nv:2) {
> >     f <- prod(lims[1:(i-1)])
> >     res[[i]] <- vars[[i]][ceiling(id / f)]
> >     id <- id - (ceiling(id/f)-1)*f
> >   }
> >   res[[1]] <- vars[[1]][id]
> >   names(res) <- names(vars)
> >   res
> > }
> > --------------------------------------
> > > expand.grid(par1=-1:1,par2=c('a','b'))[4,]
> >   par1 par2
> > 4   -1    b
> > > getVariant(4,list(par1=-1:1,par2=c('a','b')))
> > $par1
> > [1] -1
> >
> > $par2
> > [1] "b"
> >
> > I would be glad to know if somebody came across the same problem and has a
> > better suggestion on how to solve this.
> >
> A few minor improvements:
> 1) let id be a vector of indices
> 2) use %% and %/% instead of ceiling (perhaps debatable)
> 3) return a data frame as does expand.grid
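
For what it is worth, point 2 does not change the result for positive
integer ids. A small check (just a sketch; f and id below are sample values,
not taken from the function) compares both ways of computing the index and
the remainder:

```r
f  <- 6       # sample block size, i.e. prod(lims[1:(i-1)])
id <- 1:24    # sample positive integer indices

# index into variable i: ceiling form vs. integer-division form
idx_ceiling <- ceiling(id / f)
idx_intdiv  <- (id - 1) %/% f + 1

# position within the block: subtraction form vs. modulo form
rem_ceiling <- id - (ceiling(id / f) - 1) * f
rem_mod     <- (id - 1) %% f + 1

agree <- all(idx_ceiling == idx_intdiv) && all(rem_ceiling == rem_mod)
```

The %/% and %% form does the same job in one step each, which may be easier
to read once the "subtract 1, add 1" pattern for 1-based indices is familiar.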
>
> So your function now looks like:
>
> getVariant <- function(id, vars) {
>  if (!is.list(vars)) stop('vars needs to be a list!')
>  nv <- length(vars)
>  lims <- sapply(vars, length)
>  if (any(id > prod(lims))) stop('id above the number of combinations!')
>  res <- vector("list", nv)
>  for(i in nv:2) {
>    f <- prod(lims[1:(i-1)])
>    res[[i]] <- vars[[i]][(id - 1)%/%f + 1]
>    id <- (id - 1)%%f + 1
>  }
>  res[[1]] <- vars[[1]][id]
>  names(res) <- names(vars)
>  return(as.data.frame(res))
> }
>
> Now, for example, you get:
>
> > expand.grid(par1=-1:1,par2=c('a','b'),par3=c('w','x','y','z'))[12:15,]
>    par1 par2 par3
> 12    1    b    x
> 13   -1    a    y
> 14    0    a    y
> 15    1    a    y
> > getVariant(12:15,list(par1=-1:1,par2=c('a','b'), par3=c('w','x','y','z')))
>   par1 par2 par3
> 1    1    b    x
> 2   -1    a    y
> 3    0    a    y
> 4    1    a    y
> >
>
> Note that you will run into trouble when the product of the lengths is
> greater than the largest representable integer on your system.
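
To make that limit a bit more concrete: prod() returns a double, so the
number of combinations can exceed .Machine$integer.max, but doubles only
represent every integer exactly up to 2^53. The lengths below are
hypothetical; this is a rough sketch of where ids stop being distinct, not a
precise statement about any particular system.

```r
lims <- c(1e5, 1e5, 1e5)     # hypothetical lengths of three variables
n <- prod(lims)              # 1e15 combinations, stored as a double

exceeds_int <- n > .Machine$integer.max   # too big for an R integer
still_exact <- n < 2^53                   # but ids up to n are still exact doubles

# Beyond 2^53 neighbouring ids collapse to the same double
collapsed <- (2^53 + 1) == 2^53
```

So passing id as a double should be fine well past the integer range, but
once the product approaches 2^53 different ids can no longer be told apart.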
>
> Hope this helps,
> Ray Brownrigg



