[R] Zero Index Origin?
Richard A. O'Keefe
ok at cs.otago.ac.nz
Fri Apr 2 02:13:04 CEST 2004
I asked where index origin 0 would help.
(I am familiar with Dijkstra's article on why counting should start
at 0 and agree whole-heartedly. I am also convinced that "kicking
against the goads" is not helpful.)
Peter Wolf <s-plus at wiwi.uni-bielefeld.de>
has provided two examples.
(1) Converting pseudo-code which assumes index origin 0.
I've done this myself, although not yet in R. I've also done it
the other way, converting index origin 1 code to C.
A good method, I've found, is to to start by converting to
index-free form. This applies whatever the source and target
languages are.
(2) A sorting algorithm.
sort.6 <- function (a) {
n <- length(a)
adapt <- function (i) {i+1}
a <- c(0,a)
for (i in 2:n) {
j <- i-1
a[adapt(0)] <- a[adapt(i)]
while (a[adapt(j)] > a[adapt(0)]) {
a[adapt(j+1)] <- a[adapt(j)]
j <- j-1
}
a[adapt(j+1)] <- a[adapt(0)]
}
a[-1]
}
The really interesting thing here is that this basically is an
index origin 1 algorithm. The original array and the final result
start at 1, not 0, and position 0 is used for a "sentinel".
Let's convert it to 1-origin. I'll skip the details of how I
did it because that's not the point I want to make.
sort.VI <- function (a) {
a <- c(0, a)
for (i in 3:length(a)) {
j <- i-1
a[1] <- a[i]
while (a[j] > a[1]) {
a[j+1] <- a[j]
j <- j-1
}
a[j+1] <- a[1]
}
a[-1]
}
What do you get if you move up to index-free form?
sort.six <- function (a) {
s <- c()
for (x in a) {
f <- s <= x
s <- c(s[f], x, s[!f]) # insert x stably into s
}
s
}
It's clear that sort.six is shorter, clearer, and easier to get
right than sort.VI. But how much do we have to pay for this?
How much efficiency do we lose?
> a <- runif(400)
> system.time(sort.VI(a))
[1] 3.64 0.02 12.56 0.00 0.00
> system.time(sort.six(a))
[1] 0.15 0.01 0.16 0.00 0.00
We don't lose any efficiency at all. We gain, considerably.
(Not as much as we'd gain by using the built-in sort(), of course.)
I expect that this will happen fairly often: rewriting the code to be
index-free will *usually* make it shorter and clearer, and will *always*
make it easier to adapt to a language with a different index origin.
When the target language is R or S, the result is likely to be faster
than a direct conversion.
More information about the R-help
mailing list