Running Median and Mean
David Brahm
brahm@alum.mit.edu
Mon, 16 Sep 2002 11:09:30 -0400
R gurus,
On Aug 20, 2002, I asked in R-help about calculating a running 5-day median on
a large matrix. Thanks to Martin Maechler <maechler@stat.math.ethz.ch> and Ray
Brownrigg <Ray.Brownrigg@mcs.vuw.ac.nz> for responding.
I ended up writing C code (and an R interface) to do it, which is about 1000x
faster than the naive method! (72s became .09s on a 223 x 520 matrix). I
added a running mean function too, which is very simple and is about 15x
faster than apply(..., filter, ...). I've packaged the code as
<http://cran.r-project.org/src/contrib/Devel/runStat_1.1.tar.gz>
The key insight was that it *is* in fact a good idea to calculate the ranks of
the 5 data points, because then it is easy to calculate how those ranks
*change* as you step through the column. For example, suppose the last few
points in a column look like:
data= 20 25 18 14 78 55 29
rank= 3 2 1 5 4
where I have ranked the 5 points prior to the last one. The median is 25 (the
point with rank 3). Stepping to the left:
data= 20 25 18 14 78 55 29
rank= 3 4 2 1 5
For the 4 points that overlap, their ranks remain the same, except they are
incremented by one for points >=20, and decremented by one for points >55. The
rank of the new point (20) is just 5 minus the number of increments done.
The code is copied below for your amusement. Suggestions welcome. Note I have
not dealt with NA's, even values of N, row-wise medians, or non-lagged medians
(i.e. I put the median of points 1:5 into position 6, not position 5). It
would be great if somebody (Martin?) wanted to incorporate these functions into
an existing package or even "base"; otherwise I'll post the final product on
CRAN/src/contrib.
-- David Brahm (brahm@alum.mit.edu)
############### C Code: ##################
#include <R.h>
void runMedian(double *m, int *d, int *N) {
int i, j, k, a, R=(*N+1)/2, *r=(int*) R_alloc(*N, sizeof(int));
double old, new, *x;
for (j=1; j <= d[1]; j++) { /* Loop over columns */
x = m + j*d[0] - 1 - *N; /* x[*N] = m[nrow(m),j] */
for (i=0; i < *N; i++) { /* Calculate initial ranks "r" */
r[i] = 1; /* r higher if larger or later */
for (k=0; k < i; k++) if (x[i] >= x[k]) r[i]++;
for (k=i+1; k < *N; k++) if (x[i] > x[k]) r[i]++;
if (r[i]==R) x[*N] = x[i]; /* If rank=R, this is the median */
}
for (i=d[0]-1; i > *N; i--) { /* Now x[*N] = m[i,j] */
x--; old=x[*N]; new=x[0]; a=*N; /* a = rank of new guy */
for (k=*N-1; k > 0; k--) { /* Recalculate each element's rank */
r[k] = r[k-1]; /* Shift previous iteration's ranks */
if (x[k] >= new) {r[k]++; a--;} /* Are we adding a lower number? */
if (x[k] > old) {r[k]--;} /* Are we dropping a higher number? */
if (r[k]==R) x[*N] = x[k]; /* If rank=R, this is the median */
}
r[0] = a;
if (a==R) x[*N] = new; /* Is the new guy the median? */
}
}
}
void runMean(double *m, int *d, int *N) {
int i, j, k;
double new, *x;
for (j=1; j <= d[1]; j++) { /* Loop over columns */
x = m + j*d[0] - 1 - *N;
for (i=d[0]; i > *N; i--) { /* x[*N] = m[i,j] */
for (k=0, new=0; k < *N; k++) new += x[k];
x[*N] = new / *N; x--;
}
}
}
############### R Code: ##################
runMedian <- function(m, N=5) {
if (!(N %% 2)) stop("N should be odd.")
d <- od <- dim(m)
if (!(nd <- length(d))) d <- c(length(m), 1) # Vector -> matrix
if (nd > 2) d <- c(d[1], prod(d[-1])) # Array -> matrix
if (d[1] <= N) {m[] <- NA; return(m)}
m <- array(.C("runMedian", m=as.real(m), as.integer(d), as.integer(N))$m, d)
m[1:N, ] <- NA
if (nd==2) m else if (!nd) m[ ,1] else array(m, od) # Matrix/Vector/Array
}
runMean <- function(m, N=21) {
d <- od <- dim(m)
if (!(nd <- length(d))) d <- c(length(m), 1) # Vector -> matrix
if (nd > 2) d <- c(d[1], prod(d[-1])) # Array -> matrix
if (d[1] <= N) {m[] <- NA; return(m)}
m <- array(.C("runMean", m=as.real(m), as.integer(d), as.integer(N))$m, d)
m[1:N, ] <- NA
if (nd==2) m else if (!nd) m[ ,1] else array(m, od) # Matrix/Vector/Array
}
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._