[R] Zero Index Origin?

Fri Apr 2 02:13:04 CEST 2004

I asked where index origin 0 would help.
(I am familiar with Dijkstra's article on why counting should start
at 0 and agree whole-heartedly.  I am also convinced that "kicking
against the goads" is not helpful.)

Peter Wolf <s-plus at wiwi.uni-bielefeld.de>
has provided two examples.

(1) Converting pseudo-code which assumes index origin 0.

    I've done this myself, although not yet in R.  I've also done it
    the other way, converting index origin 1 code to C.

    A good method, I've found, is to to start by converting to
    index-free form.  This applies whatever the source and target
    languages are.

(2) A sorting algorithm.

    sort.6 <- function (a) {
       n <- length(a)
       adapt <- function (i) {i+1}
       a <- c(0,a)
       for (i in 2:n) {
	  j <- i-1
	  a[adapt(0)] <- a[adapt(i)]
	  while (a[adapt(j)] > a[adapt(0)]) {
	     a[adapt(j+1)] <- a[adapt(j)]
	     j <- j-1
	  }
	  a[adapt(j+1)] <- a[adapt(0)]
       }
       a[-1]
    }

    The really interesting thing here is that this basically is an
    index origin 1 algorithm.  The original array and the final result
    start at 1, not 0, and position 0 is used for a "sentinel".

    Let's convert it to 1-origin.  I'll skip the details of how I
    did it because that's not the point I want to make.

    sort.VI <- function (a) {
       a <- c(0, a)
       for (i in 3:length(a)) {
	  j <- i-1
	  a[1] <- a[i]
	  while (a[j] > a[1]) {
	     a[j+1] <- a[j]
	     j <- j-1
	  }
	  a[j+1] <- a[1]
       }
       a[-1]
    }

    What do you get if you move up to index-free form?

    sort.six <- function (a) {
	s <- c()
	for (x in a) {
	    f <- s <= x
	    s <- c(s[f], x, s[!f])    # insert x stably into s
	}
	s
    }

    It's clear that sort.six is shorter, clearer, and easier to get
    right than sort.VI.  But how much do we have to pay for this?
    How much efficiency do we lose?

    > a <- runif(400)
    > system.time(sort.VI(a))
    [1]  3.64  0.02 12.56  0.00  0.00
    > system.time(sort.six(a))
    [1] 0.15 0.01 0.16 0.00 0.00

    We don't lose any efficiency at all.  We gain, considerably.
    (Not as much as we'd gain by using the built-in sort(), of course.)

I expect that this will happen fairly often:  rewriting the code to be
index-free will *usually* make it shorter and clearer, and will *always*
make it easier to adapt to a language with a different index origin.
When the target language is R or S, the result is likely to be faster
than a direct conversion.