[R] Still help needed on embeded regression

Gabor Grothendieck ggrothendieck at gmail.com
Wed May 3 04:19:14 CEST 2006


I was assuming that this would be added to my example
where the data is a zoo object so lag.zoo is being used.
Try this:

> library(zoo)
> z <- zoo(11:15)
> z
 1  2  3  4  5
11 12 13 14 15
> lag(z,-1,na.pad=TRUE)
 1  2  3  4  5
NA 11 12 13 14
> ?lag.zoo


On 5/2/06, Guojun Zhu <shmilylemon at yahoo.com> wrote:
> It does not work though.  How is the lag work? How
> does  the lag work?  I read the help and do not quite
> understand. Here is a test
>
> > y
>  [1]  1  2  3  4  5  6  7  8  9 10
> > coredata(lag(y,-1))
>  [1]  1  2  3  4  5  6  7  8  9 10
> attr(,"tsp")
> [1]  2 11  1
>
> --- Gabor Grothendieck <ggrothendieck at gmail.com>
> wrote:
>
> > Try
> >
> >       runmean2 <- function(x, k) # k must be even
> >               (coredata(runmean(x, k-1)) * (k-1) +
> >                       coredata(lag(x, -k/2, na.pad = TRUE)))/k
> >
> > Also, in your code use matrices or vectors instead
> > of data frames
> > to avoid any overhead in using data frames.
> >
> > On 5/2/06, Guojun Zhu <shmilylemon at yahoo.com> wrote:
> > > Sorry to bother you guys again. This is great.
> > But
> > > this is for 61 number and the second case will
> > change
> > > 60 to 61.  "run*" only accept odd number window.
> > How
> > > to get around it with 60?  Any suggestion? Thanks.
> > >
> > > --- Gabor Grothendieck <ggrothendieck at gmail.com>
> > > wrote:
> > >
> > > > Using runmean from caTools the first one below
> > does
> > > > it in under 1 second but will not handle NAs.
> > The
> > > > second one takes under 15 seconds and handles
> > > > them by replacing them with linear
> > approximations.
> > > > Note that k must be odd.
> > > >
> > > > # 1
> > > >
> > > > library(caTools)
> > > > set.seed(1)
> > > > system.time({
> > > >       y <- rnorm(140001)
> > > >       x <- as.numeric(seq(y))
> > > >       k <- 61
> > > >       Mxy <- runmean(x * y, k)
> > > >       Mxx <- runmean(x * x, k)
> > > >       Mx <- runmean(x, k)
> > > >       My <- runmean(y, k)
> > > >       b <- (Mxy - Mx * My) / (Mxx - Mx * Mx)
> > > >       a <- My - b * Mx
> > > > })
> > > >
> > > > # 2
> > > >
> > > > library(caTools)
> > > > library(zoo)
> > > > set.seed(1)
> > > > system.time({
> > > >       y <- rnorm(140000)
> > > >       x <- as.numeric(seq(y))
> > > >       x[100:200] <- NA
> > > >       x <- na.approx(zoo(x))
> > > >       y <- zoo(y)
> > > >       k <- 60
> > > >       Mxy <- runmean(x * y, k)
> > > >       Mxx <- runmean(x * x, k)
> > > >       Mx <- runmean(x, k)
> > > >       My <- runmean(y, k)
> > > >       b <- (Mxy - Mx * My) / (Mxx - Mx * Mx)
> > > >       a <- My - b * Mx
> > > > })
> > > >
> > > >
> > > > On 5/1/06, Guojun Zhu <shmilylemon at yahoo.com>
> > wrote:
> > > > > I basically has a long data.frame a.  but I
> > only
> > > > need
> > > > > three columns x,y. Let us say the index of row
> > is
> > > > t.
> > > > > I need to produce new column s_t as the linear
> > > > > regression coefficient of
> > (x_(t-60),...x_(t-1)) on
> > > > > (y_(t-60),...,y_(t-1)). The data is about
> > 140,000
> > > > > rows.  I wrote a simple code on this which is
> > > > super
> > > > > slow, it takes more than 2 hours on a 2.8Ghz
> > Intel
> > > > Duo
> > > > > Core.  My friend use SAS and his code needs
> > only
> > > > > couple of minutes.  I know there must be some
> > more
> > > > > efficient way to write it.  Can anyone help me
> > on
> > > > > this?  Here is the code.
> > > > >
> > > > > Also one line produce a complete NA temp$y and
> > lm
> > > > > function failed on that.  How to make it just
> > > > produce
> > > > > a NA instead and keep runing?
> > > > >
> > > > > attach(return)
> > > > > betat=rep(NA,length(RET))
> > > > > for (i in 61:length(RET)){cat(i," ");
> > > > > if (year[[i]]>=1995){
> > > > >
> > > > >
> > > >
> > >
> >
> temp<-data.frame(y=RET[(i-60):(i-1)]-riskfree[(i-60):(i-1)],x=sprtrn[(i-60):(i-1)]-riskfree[(i-60):(i-1)])
> > > > >
> > > > >
> > > >
> > >
> >
> betat[[i]]<-lm(y~x+1,na.action=na.exclude,temp)[[1]][[2]]
> > > > >  #if (i%%100==0)
> > > > > cat(i," ");
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> return$vol.cap[[i]]=mean(VOL[(i-12):(i-1)],na.rm=TRUE)/return$cap[[i]]
> > > > > }
> > > > > }
> > > > >
> > > > > ______________________________________________
> > > > > R-help at stat.math.ethz.ch mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide!
> > > > http://www.R-project.org/posting-guide.html
> > > > >
> > > >
> > >
> > >
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam?  Yahoo! Mail has the best spam
> > protection around
> > > http://mail.yahoo.com
> > >
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>




More information about the R-help mailing list