[R] Moving window regressions - how can I improve this code?

Sun May 16 08:15:12 CEST 2004

Prof. Bates, Gabor,

I had written a posting, some weeks ago, where I had written
fortrannish code for "moving window regressions" (also called 'rolling
window regression'), and asked how to do the code better. Both of you
had put much pain on this, and emerged with this great code:

   data(women)
   movingWindow2 <- function(formula, data, width, ...) {
       mCall = match.call()
       mCall$width = NULL
       mCall[[1]] = as.name("lm")
       mCall$x = mCall$y = TRUE
       bigfit = eval(mCall, parent.frame())
       ncoef = length(coef(bigfit))
       nr = nrow(data)
       width = as.integer(width)[1]
       stopifnot(width >= ncoef, width <= nr)
       y = bigfit$y
       x = bigfit$x
       terms = bigfit$terms
       inds = embed(seq(nr), width)[, rev(seq(width))]
       sumrys <- lapply(seq(nrow(inds)),
                        function(st) {
                            ind = inds[st,]
                            fit = lm.fit(x[ind,], y[ind])
                            fit$terms = terms
                            class(fit) = "lm"
                            summary(fit)
                        })
       list(coefficients = sapply(sumrys, function(sm) coef(sm)[,"Estimate"]),
            Std.Error = sapply(sumrys, function(sm) coef(sm)[,"Std. Error"]),
            sigma = sapply(sumrys, "[[", "sigma"),
            r.squared = sapply(sumrys, "[[", "r.squared"))
   }

I have one piece of information, and one bugreport:

  * I timed this, and compared it against my fortrannish code, and
    this is roughly 2.5x faster :-)

    > junk = data.frame(x=runif(1000), y=runif(1000))
    > system.time(movingWindowRegression(women, 1000, 9, weight ~ height, 2))
    [1] 20.07  0.01 20.80  0.00  0.00
    > system.time(movingWindow2(y ~ x, junk, 10))
    [1] 8.27 0.03 8.43 0.00 0.00

    (My notebook is a Celeron @ 500 Mhz).

  * I find it breaks when I ask him to do a regression on 1:

    > r = movingWindowRegression(junk, 1000, 10, y ~ 1, 1)
    > movingWindow2(y ~ 1, junk, 10) 
    Error in lm.fit(x[ind, ], y[ind]) : `x' must be a matrix

I am in awe of the movingWindow2 code but don't quite know what to
tamper with! :-) Could you please help?

movingWindow2 should be renamed to movingWindow, and it should really
be in some standard CRAN package. I know a lot of people who do moving
window estimation all the time and this fits. It does moving window
regressions, and using the formula X ~ 1, it also gives you rolling
window means and standard deviations.

For anyone who reads this post, and is interested in moving window
regressions, please do go back into the list archives and follow this
thread. The comments and suggestions by everyone have been incredibly
insightful. For the sake of completeness, my old code is at EOF here -
you will find it useful only in case you want to run
movingWindowRegression in my bugdemo above.

As an aside, I noticed there are roll() and rollOLS() functions in the
(commercial) Finmetrics library that's sold with S+. Their prototypes
are :

  rollOLS(formula, data, subset, na.rm=F, contrasts=NULL, start=NULL,
          end=NULL, width=NULL, incr=1, tau=1e-10, trace=T, ...)

  roll(FUNCTION, data, width, incr=1, start=NULL, end=NULL,
       na.rm=F, save.list=NULL, arg.data="data", trace=T, ...)

here FUNCTION is the name of a S function for which rolling estimation
will be performed. The function must take an option "data". 

Their examples are:

  stack.dat = data.frame(Loss=stack.loss, stack.x)
  roll("OLS", stack.dat, 7, formula=Loss~Air.Flow+Water.Temp, save.list="coef")
  rollOLS(Loss~Water.Temp, data=stack.dat, width=6, incr=2)

In the case of rollOLS, they say that an efficient addition and
deletion algorithm is used for fast exection.

-- 
Ajay Shah                                                   Consultant
ajayshah at mayin.org                      Department of Economic Affairs
http://www.mayin.org/ajayshah           Ministry of Finance, New Delhi

# Using moving windows, we do an OLS regression.
#
# The function returns a matrix with T rows. So there are plenty of
# "NA" values in it. The columns are organised as:
#    beta1 s1 beta2 s2 beta3 s3 sigma R^2
# i.e. we have each regression coefficient followed by it's sigma,
# and then at the end we have the residual sigma and the regression R^2.
movingWindowRegression <- function(data, T, width, model, K) {
  results = matrix(nrow=T, ncol=2*K+2)
  for (i in width:T) {
    details <- summary.lm(lm(model, data[(i-width+1):i,]))
    n=1;
    for (j in 1:K) {
      results[i, n:(n+1)]   = details$coefficients[j, 1:2]
      n = n + 2
    }
    results[i, n] = details$sigma
    results[i, n+1] = details$r.squared
  }
  return(results)
}