[R] Efficient way of creating a shifted (lagged) variable?

Joshua Wiley jwiley.psych at gmail.com
Thu Aug 4 22:02:02 CEST 2011



On Aug 4, 2011, at 11:46, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:

> Thanks a lot, guys!
> It's really helpful. But, to be objective, it's still quite a few
> lines longer than in SPSS.

Not once you've sourced the function!  For the simple case of a vector, try:

X <- 1:10
mylag2 <- function(X, lag) {
  # pad the front with `lag` NAs and drop the last `lag` values
  c(rep(NA, lag), X[seq_len(length(X) - lag)])
}

Though this does not work for lead, it is fairly short. Then you could use the *apply family if you needed it on multiple columns or vectors.
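
As a rough sketch of how lag and lead could share one helper: the function name shift and its argument k below are hypothetical (they are not from this thread or from any package), and the small data frame is made up for illustration. A positive k lags, a negative k leads, and lapply() covers the multiple-column case.

```r
# Hypothetical helper (an assumption, not part of the thread):
# shift a vector by k positions.
# k > 0 lags (NAs padded at the front); k < 0 leads (NAs padded at the end).
shift <- function(x, k = 1) {
  n <- length(x)
  if (k == 0 || n == 0) return(x)
  k_abs <- min(abs(k), n)          # never shift further than the vector length
  pad <- rep(NA, k_abs)
  if (k > 0) {
    c(pad, x[seq_len(n - k_abs)])          # drop the last k_abs values
  } else {
    c(x[seq_len(n - k_abs) + k_abs], pad)  # drop the first k_abs values
  }
}

x <- 1:10
shift(x, 1)    # lag by one
shift(x, -2)   # lead by two

# Apply the same lag to every column of a small illustrative data frame
df <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(10, 20, 30, 40))
lagged <- as.data.frame(lapply(df, shift, k = 1))
names(lagged) <- paste0(names(df), ".lag1")
df <- cbind(df, lagged)
```

Using the sign of k to choose the direction keeps it to one function; two separate functions would work just as well if that reads more clearly.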

Cheers,

Josh

> Dimitri
> 
> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund <djnordlund at frontier.com> wrote:
>> 
>> 
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>> On Behalf Of Dimitri Liakhovitski
>>> Sent: Thursday, August 04, 2011 8:24 AM
>>> To: r-help
>>> Subject: [R] Efficient way of creating a shifted (lagged) variable?
>>> 
>>> Hello!
>>> 
>>> I have a data set:
>>> set.seed(123)
>>> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-31"),by="week"))
>>> y$var1<-c(1,2,3,round(rnorm(54),1))
>>> y$var2<-c(10,20,30,round(rnorm(54),1))
>>> 
>>> # All I need is to create lagged variables for var1 and var2. I looked
>>> around a bit and found several ways of doing it. They all seem quite
>>> complicated, while in SPSS it's just a few letters (like LAG()). Here
>>> is what I've written. It works, but I wonder whether there is a
>>> very simple way of doing it in R that I could not find.
>>> I need the same for "lead" (opposite of lag).
>>> Any hint is greatly appreciated!
>>> 
>>> ### The function I created:
>>> mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
>>>    temp<-as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
>>>    for(i in 1:length(temp)){
>>>      names(temp)[i]<-paste(names(x),".lag",i,sep="")
>>>     }
>>>   return(temp)
>>> }
>>> 
>>> ### Running mylag to get my result:
>>> myvars<-c("var1","var2")
>>> for(i in myvars) {
>>>   y<-cbind(y,mylag(y[i],max.lag=2))
>>> }
>>> (y)
>>> 
>>> --
>>> Dimitri Liakhovitski
>>> marketfusionanalytics.com
>>> 
>> 
>> Dimitri,
>> 
>> I would first look into the zoo package as has already been suggested.  However, if you haven't already got your solution then here are a couple of functions that might help you get started.  I won't vouch for efficiency.
>> 
>> 
>> lag.fun <- function(df, x, max.lag=1) {
>>  for(i in x) {
>>    for(j in 1:max.lag){
>>      lagx <- paste(i,'.lag',j,sep='')
>>      df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>>    }
>>  }
>>  df
>> }
>> 
>> lead.fun <- function(df, x, max.lead=1) {
>>  for(i in x) {
>>    for(j in 1:max.lead){
>>      leadx <- paste(i,'.lead',j,sep='')
>>      df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>>    }
>>  }
>>  df
>> }
>> 
>> y <- lag.fun(y,myvars,2)
>> y <- lead.fun(y,myvars,2)
>> 
>> 
>> Hope this is helpful,
>> 
>> Dan
>> 
>> Daniel Nordlund
>> Bothell, WA USA
>> 
>> 
>> 
> 
> 
> 
> -- 
> Dimitri Liakhovitski
> marketfusionanalytics.com
> 


