[R] Still help needed on embedded regression

Tue May 2 10:23:42 CEST 2006

Thank you very much.  It rocks.  I also discovered
that what really slows down the program is

"return$vol.cap[[i]]=mean(VOL[(i-12):(i-1)],na.rm=TRUE)/return$cap[[i]]"

If I take this line out, my original code takes about 10
minutes and halts at the place where the all-NA window shows up.
It seems R is extra slow at something related to
"[[]]".  I ended up rewriting this part as a vector
addition, which takes a few seconds; not as great as
what you showed, but far better than my original loop.

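trail.mean <- function(a, n) {
	# Mean of the n values preceding each position; the first n entries are NA.
	# Unlike mean(..., na.rm=TRUE), any NA in a window makes that result NA.
	l <- length(a)
	temp <- a[1:(l - n)]
	# seq_len() keeps the loop empty when n is 1 (2:n would count backwards)
	for (i in seq_len(n - 1)) temp <- temp + a[(i + 1):(l - n + i)]
	c(rep(NA, n), temp / n)
}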

return$vol.cap=trail.mean(VOL,12)/return$cap;
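
If the na.rm=TRUE behaviour of the original line matters, a cumulative-sum variant might look like the sketch below.  The name trail.mean.narm and the simulated VOL are assumptions for illustration only; the check at the end compares it against the slow mean() loop on the same data.

```r
# Hypothetical helper: mean of the n values before each position, skipping NAs,
# i.e. mean(a[(i-n):(i-1)], na.rm=TRUE) for every i > n. Assumes length(a) > n.
trail.mean.narm <- function(a, n) {
	ok <- !is.na(a)
	s0 <- c(0, cumsum(ifelse(ok, a, 0)))  # s0[k + 1] = sum of non-NA a[1:k]
	m0 <- c(0, cumsum(ok))                # m0[k + 1] = count of non-NA a[1:k]
	i <- (n + 1):length(a)
	# window sum and window count are differences of the running totals
	c(rep(NA, n), (s0[i] - s0[i - n]) / (m0[i] - m0[i - n]))
}

# Simulated data standing in for VOL (an assumption, not the real column)
set.seed(1)
VOL <- rnorm(1000)
VOL[c(5, 300:320)] <- NA

fast <- trail.mean.narm(VOL, 12)
slow <- sapply(13:1000, function(i) mean(VOL[(i - 12):(i - 1)], na.rm = TRUE))
print(all.equal(fast[13:1000], slow))
```

A window that is entirely NA gives NaN here, the same as mean() of an empty vector.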

For your first method, adding alg="exact" does make
runmean work with NA.  Thank you very much.  I
never thought it could be so fast.  It is so tricky
though. :)
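
The trick in the first method quoted below is that the regression slope can be written with means only, b = (mean(xy) - mean(x)mean(y)) / (mean(x^2) - mean(x)^2), so running means are enough.  A small sketch checking that identity against lm() on a single 60-point window, using simulated data (an assumption, not the real returns):

```r
# One window of simulated data; the 0.5 slope is arbitrary
set.seed(1)
x <- rnorm(60)
y <- 0.5 * x + rnorm(60)

# Slope and intercept from means only, as in the quoted runmean code
b <- (mean(x * y) - mean(x) * mean(y)) / (mean(x * x) - mean(x)^2)
a <- mean(y) - b * mean(x)

# Same regression through lm(); coef() returns c(intercept, slope)
fit <- coef(lm(y ~ x))
print(all.equal(unname(fit), c(a, b)))
```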

I am wondering if there are any materials about the
efficiency of R: which commands are quick and which are
slow.  I am going to read runmean's source code
when I have time.
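
One rough way to find out is to time small variants with system.time().  The sketch below, on made-up data, compares filling a data.frame column one element at a time with filling a plain vector and building the data.frame once at the end; timings will differ by machine and R version, so none are shown.

```r
n <- 20000
df <- data.frame(v = numeric(n))
v <- numeric(n)

# Element-wise assignment into a data.frame column
t.df <- system.time(for (i in 1:n) df$v[[i]] <- i / 2)

# Same values written into a plain vector, then wrapped once
t.vec <- system.time({
	for (i in 1:n) v[[i]] <- i / 2
	df2 <- data.frame(v = v)
})

# Elapsed seconds for each variant
print(c(data.frame.element = t.df[["elapsed"]], plain.vector = t.vec[["elapsed"]]))
# Both variants should produce the same column
print(identical(df$v, df2$v))
```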

Thank you again.  I was actually thinking of switching to
SAS before you guys saved me.

--- Gabor Grothendieck <ggrothendieck at gmail.com> wrote:

> Using runmean from caTools the first one below does
> it in under 1 second but will not handle NAs.  The
> second one takes under 15 seconds and handles
> them by replacing them with linear approximations.
> Note that k must be odd.
> 
> # 1
> 
> library(caTools)
> set.seed(1)
> system.time({
> 	y <- rnorm(140001)
> 	x <- as.numeric(seq(y))
> 	k <- 61
> 	Mxy <- runmean(x * y, k)
> 	Mxx <- runmean(x * x, k)
> 	Mx <- runmean(x, k)
> 	My <- runmean(y, k)
> 	b <- (Mxy - Mx * My) / (Mxx - Mx * Mx)
> 	a <- My - b * Mx
> })
> 
> # 2
> 
> library(caTools)
> library(zoo)
> set.seed(1)
> system.time({
> 	y <- rnorm(140000)
> 	x <- as.numeric(seq(y))
> 	x[100:200] <- NA
> 	x <- na.approx(zoo(x))
> 	y <- zoo(y)
> 	k <- 60
> 	Mxy <- runmean(x * y, k)
> 	Mxx <- runmean(x * x, k)
> 	Mx <- runmean(x, k)
> 	My <- runmean(y, k)
> 	b <- (Mxy - Mx * My) / (Mxx - Mx * Mx)
> 	a <- My - b * Mx
> })
> 
> 
> On 5/1/06, Guojun Zhu <shmilylemon at yahoo.com> wrote:
> > I basically have a long data.frame a, but I only need
> > three columns x,y.  Let us say the row index is t.
> > I need to produce a new column s_t as the linear
> > regression coefficient of (x_(t-60),...,x_(t-1)) on
> > (y_(t-60),...,y_(t-1)).  The data is about 140,000
> > rows.  I wrote some simple code for this which is super
> > slow; it takes more than 2 hours on a 2.8GHz Intel Duo
> > Core.  My friend uses SAS and his code needs only a
> > couple of minutes.  I know there must be some more
> > efficient way to write it.  Can anyone help me with
> > this?  Here is the code.
> >
> > Also, one row produces an all-NA temp$y and the lm
> > function fails on that.  How can I make it just produce
> > an NA instead and keep running?
> >
> > attach(return)
> > betat=rep(NA,length(RET))
> > for (i in 61:length(RET)){cat(i," ");
> > if (year[[i]]>=1995){
> > temp<-data.frame(y=RET[(i-60):(i-1)]-riskfree[(i-60):(i-1)],x=sprtrn[(i-60):(i-1)]-riskfree[(i-60):(i-1)])
> > betat[[i]]<-lm(y~x+1,na.action=na.exclude,temp)[[1]][[2]]
> >  #if (i%%100==0)
> > cat(i," ");
> > return$vol.cap[[i]]=mean(VOL[(i-12):(i-1)],na.rm=TRUE)/return$cap[[i]]
> > }
> > }
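
On the all-NA window in the loop above: one possible way to get an NA and keep going is to wrap the lm() call in tryCatch().  This is only a sketch; safe.slope is a hypothetical helper name and the two data frames are made up to show both cases.

```r
# Hypothetical helper: slope of y ~ x, or NA if lm() cannot fit the window
safe.slope <- function(temp) {
	tryCatch(
		coef(lm(y ~ x, data = temp, na.action = na.exclude))[[2]],
		error = function(e) NA_real_  # e.g. every y in the window is NA
	)
}

good <- data.frame(x = 1:10, y = 2 * (1:10) + 1)       # exact line, slope 2
bad <- data.frame(x = 1:10, y = rep(NA_real_, 10))     # all-NA response

print(safe.slope(good))
print(safe.slope(bad))
```

Inside the loop this would replace the direct lm() call, e.g. betat[[i]] <- safe.slope(temp).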