# [R] reverse array indexing

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Jul 31 07:08:21 CEST 2003

```
Jerome Asselin <jerome at hivnet.ubc.ca> suggests this:
arr <- array(rnorm(27),c(3,3,3))
dimarr <- dim(arr)
tmparr <- array(1:prod(dimarr),dimarr)
sapply(c(3),function(x,tmparr) which(tmparr==x,T),tmparr=tmparr)
sapply(c(3,17,13,5),function(x,tmparr) which(tmparr==x,T),tmparr=tmparr)

Of course, in R we can simplify the last two lines to
sapply(<<argument goes here>>, function(x) which(tmparr==x,T))

However, wearing my "computer scientist" hat, I have to wonder about costs.
This is basically the equivalent of the APL "decode" operator.

Let's define

slow.decode <- function (index, array) {
    dimarr <- dim(array)
    tmparr <- array(1:prod(dimarr), dimarr)
    sapply(index, function(x) which(tmparr == x, arr.ind = TRUE))
}

The result is a matrix with C=length(index) columns
and R=length(dim(array)) rows.  Computing it has to take at least
O(R*C) time, because the result occupies O(R*C) space and all of it
has to be defined.

Now it is possible to implement index.decode so that it does take O(R*C)
time.   Here's an outline, which I shan't bother to finish.  (I'd do
ndims==4 and the general case if I were going to finish it.  I'd also
have a drop= argument to handle the case where length(index)==1.)

index.decode <- function (index, array) {
    jndex <- index - 1          # zero-based linear offsets
    dimarr <- dim(array)
    ndims <- length(dimarr)
    if (ndims == 1) {
        rbind(index)
    } else
    if (ndims == 2) {
        rbind(jndex %% dimarr[1] + 1,
              jndex %/% dimarr[1] + 1)
    } else
    if (ndims == 3) {
        rbind(jndex %% dimarr[1] + 1,
              (jndex %/% dimarr[1]) %% dimarr[2] + 1,
              jndex %/% (dimarr[1]*dimarr[2]) + 1)
    } else {
        stop("length(dim(array)) > 3 not yet implemented")
    }
}

This is clearly O(R*C).  What about the
sapply(index, function(x) which(tmparr==x, T))
approach?

tmparr is of size prod(dimarr); call that P.  The expression tmparr==x
has to examine each element of tmparr, so that's O(P).  This is done
for each element of index (C times), so the total is O(P*C).

Consider

mega <- array(1:1000000, c(100,100,100))
inxs <- as.integer(runif(10000, min=1, max=1000000))

Here C = length(inxs) = 10000, R = length(dim(mega)) = 3,
P = prod(dim(mega)) = 1000000.  O(R*C) is looking *really* good
compared with O(P*C).

> system.time(index.decode(inxs, mega))
 0.03 0.00 0.03 0.00 0.00
> system.time(slow.decode(inxs, mega))
 3.51 0.79 4.33 0.00 0.00

Mind you, on a 500MHz UltraSPARC, I had to use big arrays to get any
measurable time at all...

```
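
The outline above stops at three dimensions. What follows is a minimal sketch of the general case, assuming `array` carries a `dim` attribute and `index` holds valid 1-based linear indices. The name `index.decode.general` is hypothetical and does not appear in the original post. It peels off one dimension per pass using the same `%%` and `%/%` arithmetic, so the work stays proportional to R*C.

```r
# Hypothetical helper, not from the original post: decode linear indices
# into subscripts for an array with any number of dimensions.
index.decode.general <- function (index, array) {
    dimarr <- dim(array)                  # assumes a dim attribute is present
    ndims <- length(dimarr)
    jndex <- index - 1                    # zero-based linear offsets
    result <- matrix(0, nrow = ndims, ncol = length(index))
    for (k in seq_len(ndims)) {
        result[k, ] <- jndex %% dimarr[k] + 1   # subscript along dimension k
        jndex <- jndex %/% dimarr[k]            # drop dimension k from the offset
    }
    result
}

# Example: one column of subscripts per requested index.
arr <- array(seq_len(120), c(2, 3, 4, 5))
idx <- c(1, 7, 24, 120)
decoded <- index.decode.general(idx, arr)
```

Base R's `arrayInd(index, dim(array))` returns the same subscripts with one row per index, so `t(arrayInd(idx, dim(arr)))` is a convenient cross-check for a sketch like this.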