[R] Operating on windows of data

Gabor Grothendieck ggrothendieck at myway.com
Mon Mar 22 14:33:33 CET 2004


> On Mon, Mar 22, 2004 at 01:39:28AM -0500, Gabor Grothendieck wrote:
> > You can retain the trick of using subset and still get
> > rid of the loop in:
> > 
> > http://www.mayin.org/ajayshah/KB/R/EXAMPLES/rollingreg.R
> > 
> > by using sapply like this (untested):
> > 
> > dat <- sapply( seq(T-width), function(i) {
> >   model <- lm(dlinrchf ~ dlusdchf + dljpychf + dldemchf, A,
> >               i:(i+width-1))
> >   details <- summary(model)
> >   tmp <- coefficients(model)
> >   c( USD = tmp[2], JPY = tmp[3], DEM = tmp[4],
> >      R2 = details$r.squared, RMSE = details$sigma )
> > } )
> > dat <- as.data.frame(t(dat))
> > attach(dat)
> 
> This brings me to a question I've always had about "the R way" of
> avoiding loops. Yes, the sapply() approach above works. My question
> is: Why is this much better than writing it using loops?
> 
> Loops tap into the intuition of millions of people who have grown up
> around procedural languages. At least to a person like me, code
> involving loops reads effortlessly.
> 
> And I don't see how much faster the sapply() will be. Intuitively, we
> may think that the sapply() results in C code getting executed (in the
> R sources), while the for loop results in interpretation overhead, and
> so the sapply() is surely faster. But when the body of the for loop
> involves a weighty thing like a QR decomposition (for the OLS), that
> would seem to dominate the cost - as far as I can tell.

It's true that vectorizing loops can make code faster, but to my
mind the main advantage is conceptual: you work with whole objects
at a time, and the code gets shorter as a result.
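To make the comparison concrete, here is a minimal sketch (not from the
original post; the data, variable names, and window width are all made
up) of the same rolling-regression idea written both ways.  The loop
version has to allocate and fill its result vector by hand; the sapply()
version assembles it for you:

```r
## Synthetic data -- the names y, x, n, width are all assumptions.
set.seed(1)
n <- 100; width <- 20
A <- data.frame(y = rnorm(n), x = rnorm(n))

## Loop version: set up the container yourself, then fill it in.
slope.loop <- numeric(n - width)
for (i in seq(n - width)) {
  fit <- lm(y ~ x, A, subset = i:(i + width - 1))
  slope.loop[i] <- coef(fit)[2]
}

## sapply() version: the result vector is put together for you.
slope.sapply <- sapply(seq(n - width), function(i)
  coef(lm(y ~ x, A, subset = i:(i + width - 1)))[2])

all.equal(slope.loop, unname(slope.sapply))  # TRUE
```

As the question anticipates, the lm() fit dominates the cost in both
versions, so the point of the sapply() form is the shorter, whole-object
code rather than raw speed.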

Admittedly, the example above does not gain you much, although even
here it is slightly shorter than the loop version, since sapply()
assembles the result arrays for you rather than making you set them
up yourself.  Also (not shown), your file contains subsequent
summary() statements, and there is a further opportunity for code
reduction there: since the results now sit in a data frame rather
than in individual vectors, all of those summary() calls can be
combined into one.
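A quick sketch of that last point (the column names here merely echo the
posted example; the data is synthetic):

```r
## Once the results live in one data frame, a single summary() call
## replaces a separate summary() per vector.
dat <- data.frame(USD = rnorm(80), JPY = rnorm(80),
                  R2  = runif(80), RMSE = runif(80))
summary(dat)  # one call instead of summary(USD); summary(JPY); ...
```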

It's really not the best example of the desired approach, since it
falls back on looping over the window indices inside sapply(), but
I don't think one can expect a complete win for the vectorized
approach in every single case.
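For simple window statistics one can avoid index arithmetic entirely
with embed(), which builds a matrix whose rows are the windows.  This is
a sketch, not something from the post, and a per-window regression would
still need apply() over the rows:

```r
## embed() turns a series into a matrix of rolling windows, so the
## whole computation operates on one object with no loop indices.
x <- 1:10
width <- 3
windows <- embed(x, width)  # each row is one window (in reverse order)
rowMeans(windows)           # rolling means: 2 3 4 5 6 7 8 9
```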
