[R] Moving window regressions - how can I improve this code?

Sun May 16 12:51:39 CEST 2004

Ajay Shah <ajayshah at mayin.org> writes:

> I had written a posting, some weeks ago, where I had written
> fortrannish code for "moving window regressions" (also called 'rolling
> window regression'), and asked how to do the code better. Both of you
> had put much pain on this, and emerged with this great code:
> 
>    data(women)
>    movingWindow2 <- function(formula, data, width, ...) {
>        mCall = match.call()
>        mCall$width = NULL
>        mCall[[1]] = as.name("lm")
>        mCall$x = mCall$y = TRUE
>        bigfit = eval(mCall, parent.frame())
>        ncoef = length(coef(bigfit))
>        nr = nrow(data)
>        width = as.integer(width)[1]
>        stopifnot(width >= ncoef, width <= nr)
>        y = bigfit$y
>        x = bigfit$x
>        terms = bigfit$terms
>        inds = embed(seq(nr), width)[, rev(seq(width))]
>        sumrys <- lapply(seq(nrow(inds)),
>                         function(st) {
>                             ind = inds[st,]
>                             fit = lm.fit(x[ind,], y[ind])
>                             fit$terms = terms
>                             class(fit) = "lm"
>                             summary(fit)
>                         })
>        list(coefficients = sapply(sumrys, function(sm) coef(sm)[,"Estimate"]),
>             Std.Error = sapply(sumrys, function(sm) coef(sm)[,"Std. Error"]),
>             sigma = sapply(sumrys, "[[", "sigma"),
>             r.squared = sapply(sumrys, "[[", "r.squared"))
>    }
> 
> I have one piece of information, and one bugreport:
> 
>   * I timed this, and compared it against my fortrannish code, and
>     this is roughly 2.5x faster :-)
> 
>     > junk = data.frame(x=runif(1000), y=runif(1000))
>     > system.time(movingWindowRegression(women, 1000, 9, weight ~ height, 2))
>     [1] 20.07  0.01 20.80  0.00  0.00
>     > system.time(movingWindow2(y ~ x, junk, 10))
>     [1] 8.27 0.03 8.43 0.00 0.00

You seem to be comparing timings on different data sets and models.
Did you mean to use junk and y ~ x in your first timing call?

>     (My notebook is a Celeron @ 500 Mhz).
> 
>   * I find it breaks when I ask him to do a regression on 1:
> 
>     > r = movingWindowRegression(junk, 1000, 10, y ~ 1, 1)
>     > movingWindow2(y ~ 1, junk, 10) 
>     Error in lm.fit(x[ind, ], y[ind]) : `x' must be a matrix

It looks like I made the common mistake of forgetting to add
drop=FALSE when extracting a subset of a matrix in a context where the
result must be a matrix.  (With the default of drop = TRUE, dimensions
for which the range of the index is of length 1 are dropped.)

Try changing the call of lm.fit to 
   lm.fit(x[ind, , drop = FALSE), y[ind])