[Rd] arraytake for extracting subarrays from multidimensional arrays

Thu Oct 19 15:40:59 CEST 2006

On 19 Oct 2006, at 14:26, Gabor Grothendieck wrote:

> Note that it can also be done like this with do.call:
>
> a <- array(1:24, 2:4)
> L <- list(TRUE, 1:3, c(4, 2))
> do.call("[", c(list(a), L))
>

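A quick check, as a sketch only, that the do.call form above agrees with the hard-coded one for this 3-d case. The TRUE entry stands in for the empty slot between commas; the variable names and the `identical` comparison are illustrative additions, not part of Gabor's post.

```r
a <- array(1:24, 2:4)
L <- list(TRUE, 1:3, c(4, 2))

# Build the call a[TRUE, 1:3, c(4, 2)] without typing the commas by hand
via_do_call <- do.call("[", c(list(a), L))

# Hard-coded equivalent, valid only when a has exactly 3 dimensions
direct <- a[, 1:3, c(4, 2)]

# Compare the two results
print(identical(via_do_call, direct))
```

Because the index list is ordinary data, its length can follow length(dim(a)) at runtime.
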
aargggh, you beat me to it.  I didn't think to pass TRUE to "[".

I'll stick it in the package with joint attribution to Gabor and Balaji
and document it with apltake() and apldrop().

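A minimal sketch of what such a wrapper might look like, built on the do.call idiom. The name arraytake_sketch and the drop argument are hypothetical; the NULL-means-everything convention is taken from Balaji's post quoted below. This is not the code that went into any package.

```r
# Hypothetical helper: NULL in indlist means "take everything" in that dimension
arraytake_sketch <- function(x, indlist, drop = TRUE) {
  if (length(dim(x)) != length(indlist)) {
    stop("indlist must have one element per dimension of x")
  }
  # Replace each NULL by TRUE, which "[" recycles to select every index
  args <- lapply(indlist, function(i) if (is.null(i)) TRUE else i)
  # Assemble and evaluate x[args[[1]], args[[2]], ..., drop = drop]
  do.call("[", c(list(x), args, list(drop = drop)))
}

a <- array(1:24, c(2, 3, 4))
res <- arraytake_sketch(a, list(NULL, 1:3, c(4, 2)))
print(dim(res))
```

Passing the indices as a list avoids pasting them into a string, so no parse or eval step is involved.
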
best wishes all

rksh

> On 10/19/06, Balaji S. Srinivasan <balajis at stanford.edu> wrote:
>> Hi,
>>
>> I recently encountered a problem with array subsetting and came up with a
>> fix. Given an array of arbitrary dimensions, in which the number of
>> dimensions is only known at runtime, I wanted to extract a subarray. The
>> main issue with doing this is that in order to extract a subarray from an
>> array of (say) 4 dimensions you usually specify something like this:
>>
>> a.subarray <- a[,c(4,2),1:5,]
>>
>> However, if your code needs to handle an array with an arbitrary number of
>> dimensions then you can't hard code the number of commas while writing the
>> code. (Regarding motivation, the reason this came up is because I wanted to
>> do some toy problems involving conditioning on multiple variables in a
>> multidimensional joint pmf.)
>>
>> I looked through commands like slice.index and so on, but they seemed to
>> require reshaping and big logical matrix intermediates, which were not
>> memory efficient enough for me. apltake in the magic package was the closest
>> but it only allowed subsetting of contiguous indices from either the first
>> or last element in any given dimension. It was certainly possible to call
>> apltake multiple times to extract arbitrary subarrays via combinations of
>> index intervals for each dimension, and then combine them with abind as
>> necessary, but this did not seem elegant.
>>
>> Anyway, I then decided to simply generate code with parse and eval. I found
>> this post by Henrik Bengtsson which had the same idea:
>>
>> http://tolstoy.newcastle.edu.au/R/devel/05/11/3266.html
>>
>> I just took that code one step further and put together a utility function
>> that I think might be fairly useful. I haven't completely robustified it
>> against all kinds of pathological inputs, but if there is any interest from
>> the development team it would be nice to add an error-checked version of
>> this to R (or I guess I could keep it in a package).
>>
>>
>> Simple usage example:
>> ------
>>> source("arraytake.R")
>>> a <- array(1:24,c(2,3,4))
>>
>>> a[,1:3,c(4,2)] ##This invocation requires hard coding the number of dimensions of a
>> , , 1
>>
>>     [,1] [,2] [,3]
>> [1,]   19   21   23
>> [2,]   20   22   24
>>
>> , , 2
>>
>>     [,1] [,2] [,3]
>> [1,]    7    9   11
>> [2,]    8   10   12
>>
>>
>>> arraytake(a,list(NULL,1:3,c(4,2))) ##This invocation does not, and produces the same result
>> , , 1
>>
>>     [,1] [,2] [,3]
>> [1,]   19   21   23
>> [2,]   20   22   24
>>
>> , , 2
>>
>>     [,1] [,2] [,3]
>> [1,]    7    9   11
>> [2,]    8   10   12
>>
>>
>>
>> Code below:
>> --------
>> arraytake <- function(x,indlist) {
>>
>>  #Returns subarrays of arbitrary dimensioned arrays
>>  #1) Let x be a multidimensional array with an arbitrary number of
>>  #dimensions.
>>  #2) Let indlist be a list of vectors. The length of indlist is the same as
>>  #the number of dimensions in x. Each element of the indlist is a vector
>>  #which specifies which indexes to extract in the corresponding dimension.
>>  #If the element of the indlist is NULL, then we return all elements in
>>  #that dimension.
>>
>>  #The main way this works is by programmatically building up a comma
>>  #separated argument to "[" as a string and then simply evaluating that
>>  #expression. This way one does not need to specify the number of commas.
>>
>>  if(length(dim(x)) != length(indlist)) {
>>    stop("length of indlist must equal the number of dimensions of x")
>>  }
>>
>>  #First build up a string w/ indices for each dimension
>>  d <- length(indlist);  #number of dims
>>  indvecstr <- character(d);
>>  for(i in seq_len(d)) {
>>    if(is.null(indlist[[i]])) {
>>      indvecstr[i] <- "";
>>    } else{
>>      indvecstr[i] <-
>>        paste("c(",paste(indlist[[i]],sep="",collapse=","),")",sep="")
>>    }
>>  }
>>
>>  #Then build up the argument string to "["
>>  argstr <- paste(indvecstr,sep="",collapse=",")
>>  argstr <- paste("x[",argstr,"]",sep="")
>>
>>  #Finally, return the subsetted array
>>  return(eval(parse(text=argstr)))
>> }
>>
>> --
>> Dr. Balaji S. Srinivasan
>> Stanford University
>> Depts. of Statistics and Computer Science
>> 318 Campus Drive, Clark Center S251
>> (650) 380-0695
>> balajis at stanford.edu
>> http://jinome.stanford.edu
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743