[R] reverse array indexing
Richard A. O'Keefe
ok at cs.otago.ac.nz
Thu Jul 31 07:08:21 CEST 2003
Jerome Asselin <jerome at hivnet.ubc.ca> suggests this:
arr <- array(rnorm(27),c(3,3,3))
dimarr <- dim(arr)
tmparr <- array(1:prod(dimarr),dimarr)
sapply(c(3),function(x,tmparr) which(tmparr==x,T),tmparr=tmparr)
sapply(c(3,17,13,5),function(x,tmparr) which(tmparr==x,T),tmparr=tmparr)
Of course, in R we can simplify the last two lines to
sapply(<<argument goes here>>, function(x) which(tmparr==x,T))
However, wearing my "computer scientist" hat, I have to wonder about costs.
This is basically the equivalent of the APL "decode" operator.
Let's define
index.decode <- function (index, array) {
    dimarr <- dim(array)    # use the argument, not the global arr
    tmparr <- array(1:prod(dimarr), dimarr)
    sapply(index, function(x) which(tmparr == x, T))
}
The result is a matrix with C=length(index) columns
and R=length(dim(array)) rows. This has to take time at least proportional
to R*C, because the result occupies O(R*C) space and all of it has to be
defined.
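To make the shape of that result concrete, here is a small sketch using the 3x3x3 example from the top of this message. The variable names index and result are mine, and arr.ind=TRUE is just the spelled-out form of the T argument.

```r
# 3 x 3 x 3 example array and its table of linear indices
arr <- array(rnorm(27), c(3, 3, 3))
tmparr <- array(1:prod(dim(arr)), dim(arr))

index <- c(3, 17, 13, 5)      # C = 4 linear indices to decode
result <- sapply(index, function(x) which(tmparr == x, arr.ind = TRUE))

dim(result)                   # R = 3 rows (one per dimension), C = 4 columns
as.vector(result[, 2])        # subscripts of linear index 17: 2 3 2
arr[t(result)]                # same elements as arr[index]
```

Each column holds the subscripts of one linear index, so transposing the result gives a matrix that can be used directly as a subscript.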
Now it is possible to implement index.decode so that it does take O(R*C)
time. Here's an outline, which I shan't bother to finish. (I'd do
ndims==4 and the general case if I were going to finish it. I'd also
have a drop= argument to handle the case where length(index)==1.)
index.decode <- function (index, array) {
    jndex <- index - 1          # zero-based linear index
    dimarr <- dim(array)        # use the argument, not the global arr
    ndims <- length(dimarr)
    if (ndims == 1) {
        rbind(index)
    } else
    if (ndims == 2) {
        rbind(jndex %% dimarr[1] + 1, jndex %/% dimarr[1] + 1)
    } else
    if (ndims == 3) {
        rbind(jndex %% dimarr[1] + 1,
              (jndex %/% dimarr[1]) %% dimarr[2] + 1,
              jndex %/% (dimarr[1]*dimarr[2]) + 1)
    } else {
        stop("length(dim(array)) > 3 not yet implemented")
    }
}
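For what it's worth, the general case mentioned above might look something like the sketch below. This is one possible way to finish the outline, not a tested replacement: the name index.decode.general is made up for illustration, there is no drop= argument, and indices are not range-checked (an out-of-range index wraps around in the last dimension instead of raising an error).

```r
# Hypothetical general-case sketch: peel off one subscript per dimension.
index.decode.general <- function(index, array) {
    dimarr <- dim(array)
    if (is.null(dimarr)) dimarr <- length(array)  # plain vector: one dimension
    jndex <- index - 1                            # zero-based linear index
    result <- matrix(0, nrow = length(dimarr), ncol = length(index))
    for (k in seq_along(dimarr)) {
        result[k, ] <- jndex %% dimarr[k] + 1     # subscript in dimension k
        jndex <- jndex %/% dimarr[k]              # remainder goes to later dimensions
    }
    result
}

# Usage on a 4-dimensional array
four <- array(1:120, c(2, 3, 4, 5))
idx <- c(1, 17, 60, 120)
res <- index.decode.general(idx, four)
four[t(res)]                                      # recovers idx
```

The loop does a constant amount of vectorised work per dimension, so the cost is still O(R*C). Reasonably recent versions of R also have arrayInd() in base, which returns the same subscripts with one row per index (the transpose of the layout used here).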
This is clearly O(R*C). What about the
    sapply(index, function(x) which(tmparr==x, T))
approach? (Call the first definition above slow.decode.)
tmparr is of size prod(dimarr); call that P. The expression tmparr==x
has to examine each element of tmparr, so that's O(P). This is done
for each element of index (C times), so the total is O(P*C).
Consider
mega <- array(1:1000000, c(100,100,100))
inxs <- as.integer(runif(10000, min=1, max=1000000))
Here C = length(inxs) = 10000, R = length(dim(mega)) = 3,
P = prod(dim(mega)) = 1000000. O(R*C) is looking *really* good
compared with O(P*C).
> system.time(index.decode(inxs, mega))
[1] 0.03 0.00 0.03 0.00 0.00
> system.time(slow.decode(inxs, mega))
[1] 3.51 0.79 4.33 0.00 0.00
Mind you, on a 500MHz UltraSPARC, I had to use big arrays to get any
measurable time at all...
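Anyone wanting to repeat the comparison could use something like the following sketch. It assumes slow.decode is the sapply/which version and uses a stand-alone three-dimensional version of the arithmetic, here called fast.decode3 (a made-up name). The array and sample sizes are deliberately smaller than the ones above so that the slow version finishes quickly; the timings will of course depend on the machine, so none are quoted.

```r
# O(P*C): scan the whole index table once per requested index
slow.decode <- function(index, array) {
    dimarr <- dim(array)
    tmparr <- array(1:prod(dimarr), dimarr)
    sapply(index, function(x) which(tmparr == x, arr.ind = TRUE))
}

# O(R*C): integer arithmetic only, three-dimensional case
fast.decode3 <- function(index, array) {
    jndex <- index - 1
    dimarr <- dim(array)
    rbind(jndex %% dimarr[1] + 1,
          (jndex %/% dimarr[1]) %% dimarr[2] + 1,
          jndex %/% (dimarr[1] * dimarr[2]) + 1)
}

small <- array(1:8000, c(20, 20, 20))             # P = 8000
inxs <- as.integer(runif(200, min = 1, max = 8000))  # C = 200

slow <- slow.decode(inxs, small)
fast <- fast.decode3(inxs, small)
all(slow == fast)                                 # both give the same subscripts

system.time(slow.decode(inxs, small))
system.time(fast.decode3(inxs, small))
```

Checking that the two results agree before timing them guards against comparing a fast wrong answer with a slow right one.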