[R] Moving window regressions - how can I improve this code?
Douglas Bates
bates at stat.wisc.edu
Sun May 16 12:51:39 CEST 2004
Ajay Shah <ajayshah at mayin.org> writes:
> I had written a posting, some weeks ago, where I had written
> fortrannish code for "moving window regressions" (also called 'rolling
> window regression'), and asked how to do the code better. Both of you
> had put much pain on this, and emerged with this great code:
>
> data(women)
> movingWindow2 <- function(formula, data, width, ...) {
> mCall = match.call()
> mCall$width = NULL
> mCall[[1]] = as.name("lm")
> mCall$x = mCall$y = TRUE
> bigfit = eval(mCall, parent.frame())
> ncoef = length(coef(bigfit))
> nr = nrow(data)
> width = as.integer(width)[1]
> stopifnot(width >= ncoef, width <= nr)
> y = bigfit$y
> x = bigfit$x
> terms = bigfit$terms
> inds = embed(seq(nr), width)[, rev(seq(width))]
> sumrys <- lapply(seq(nrow(inds)),
> function(st) {
> ind = inds[st,]
> fit = lm.fit(x[ind,], y[ind])
> fit$terms = terms
> class(fit) = "lm"
> summary(fit)
> })
> list(coefficients = sapply(sumrys, function(sm) coef(sm)[,"Estimate"]),
> Std.Error = sapply(sumrys, function(sm) coef(sm)[,"Std. Error"]),
> sigma = sapply(sumrys, "[[", "sigma"),
> r.squared = sapply(sumrys, "[[", "r.squared"))
> }
>
> I have one piece of information, and one bugreport:
>
> * I timed this, and compared it against my fortrannish code, and
> this is roughly 2.5x faster :-)
>
> > junk = data.frame(x=runif(1000), y=runif(1000))
> > system.time(movingWindowRegression(women, 1000, 9, weight ~ height, 2))
> [1] 20.07 0.01 20.80 0.00 0.00
> > system.time(movingWindow2(y ~ x, junk, 10))
> [1] 8.27 0.03 8.43 0.00 0.00
You seem to be comparing timings on different data sets and models.
Did you mean to use junk and y ~ x in your first timing call?
> (My notebook is a Celeron @ 500 Mhz).
>
> * I find it breaks when I ask him to do a regression on 1:
>
> > r = movingWindowRegression(junk, 1000, 10, y ~ 1, 1)
> > movingWindow2(y ~ 1, junk, 10)
> Error in lm.fit(x[ind, ], y[ind]) : `x' must be a matrix
It looks like I made the common mistake of forgetting to add
drop=FALSE when extracting a subset of a matrix in a context where the
result must be a matrix. (With the default of drop = TRUE, dimensions
for which the range of the index is of length 1 are dropped.)
Try changing the call of lm.fit to
lm.fit(x[ind, , drop = FALSE), y[ind])
More information about the R-help
mailing list