[R] Efficient way of creating a shifted (lagged) variable?

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Fri Aug 5 15:11:22 CEST 2011


Michael is totally correct. I work with a data frame that happens to
have weeks associated with it. So, it looks like I would not really
benefit from ts functionality...
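
For the archive, here is a minimal base-R sketch of the data-frame approach discussed in this thread. The helper name `shift` and the column suffixes are made up for illustration and are not from anyone's post. It assumes an integer `k` and plain numeric columns (a Date column would lose its class when combined with NA this way).

```r
# Hypothetical helper (not from the thread): shift a vector by k positions.
# k > 0 gives a lag  (values move down, NAs fill the top);
# k < 0 gives a lead (values move up, NAs fill the bottom).
shift <- function(x, k = 1) {
  n <- length(x)
  if (k == 0) return(x)
  if (abs(k) >= n) return(rep(NA, n))   # nothing left after shifting
  if (k > 0) {
    c(rep(NA, k), x[seq_len(n - k)])
  } else {
    c(x[seq(-k + 1, n)], rep(NA, -k))
  }
}

# Sample data from the original question
set.seed(123)
y <- data.frame(week = seq(as.Date("2010-01-03"),
                           as.Date("2011-01-31"), by = "week"))
y$var1 <- c(1, 2, 3, round(rnorm(54), 1))
y$var2 <- c(10, 20, 30, round(rnorm(54), 1))

# Add one lag and one lead for each variable
for (v in c("var1", "var2")) {
  y[[paste0(v, ".lag1")]]  <- shift(y[[v]], 1)
  y[[paste0(v, ".lead1")]] <- shift(y[[v]], -1)
}
head(y)
```

Using one function with a signed `k` covers both lag and lead, at the cost of having to remember the sign convention.
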
Dimitri

On Thu, Aug 4, 2011 at 11:37 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
> Yes, the stats package has a lag function, but it's not really appropriate
> for the sample data Dimitri gave us: to wit, it's not of "ts" (time series)
> class so lag doesn't know what to do with it and gives an error message.
> Perhaps it's just that I never really took the time to get used to it, but
> I'm not a fan of R's "ts" class.
>
> And while I do endorse the use of time series objects over data frames when
> appropriate, I'd suggest going for the xts class, which has greater
> functionality and whose lag seems to have a more logical default:
> lagging the series in xts moves the first data point back, while it moves it
> forward in the ts class.
>
> I guess my point is: if Dimitri is working with the dates directly instead
> of time series (as some of his other posts have suggested), he should stay
> in the data frame type and use the lag I, Daniel, or Josh wrote; if he has
> proper time series, he might as well jump right into xts.
>
> Michael Weylandt
>
> On Thu, Aug 4, 2011 at 7:57 PM, Ken H <vicvoncastle at gmail.com> wrote:
>>
>> Hey all,
>>   Correct me if I'm wrong, but the 'stats' package has a lag() function,
>> like so:
>>    lagged.series=lag(series,number of lags wanted)
>> Furthermore, I am pretty sure that lag() accepts negative lags => leads:
>> lag(x,1) => object with one lag, lag(x,-1) => object with one lead.
>> Hope this answers your question,
>> Ken
>>
>> On Thu, Aug 4, 2011 at 4:19 PM, Dimitri Liakhovitski <
>> dimitri.liakhovitski at gmail.com> wrote:
>>
>> > Thanks a lot for the recommendations - some of them I am implementing
>> > already.
>> >
>> > Just a clarification:
>> > the only reason I try to compare things to SPSS is that I am the only
>> > person in my office using R. Whenever I work on R code my goal is
>> > not just to make it work, but also to "boast" to the SPSS users that
>> > it's much easier/faster/niftier in R. So, you are preaching to the
>> > choir here.
>> >
>> > Dimitri
>> >
>> >
>> > On Thu, Aug 4, 2011 at 4:02 PM, Joshua Wiley <jwiley.psych at gmail.com>
>> > wrote:
>> > >
>> > >
>> > > On Aug 4, 2011, at 11:46, Dimitri Liakhovitski
>> > > <dimitri.liakhovitski at gmail.com> wrote:
>> > >
>> > >> Thanks a lot, guys!
>> > >> It's really helpful. But - to be objective - it's still quite a few
>> > >> lines longer than in SPSS.
>> > >
>> > > Not once you've sourced the function!  For the simple case of a
>> > > vector, try:
>> > >
>> > > X <- 1:10
>> > > mylag2 <- function(X, lag) {
>> > >  c(rep(NA, length(seq(lag))), X[-seq(lag)])
>> > > }
>> > >
>> > > Though this does not work for lead, it is fairly short. Then you could
>> > > use the *apply family if you needed it on multiple columns or vectors.
>> > >
>> > > Cheers,
>> > >
>> > > Josh
>> > >
>> > >> Dimitri
>> > >>
>> > >> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund
>> > >> <djnordlund at frontier.com> wrote:
>> > >>>
>> > >>>
>> > >>>> -----Original Message-----
>> > >>>> From: r-help-bounces at r-project.org
>> > >>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Dimitri Liakhovitski
>> > >>>> Sent: Thursday, August 04, 2011 8:24 AM
>> > >>>> To: r-help
>> > >>>> Subject: [R] Efficient way of creating a shifted (lagged) variable?
>> > >>>>
>> > >>>> Hello!
>> > >>>>
>> > >>>> I have a data set:
>> > >>>> set.seed(123)
>> > >>>> y<-data.frame(week=seq(as.Date("2010-01-03"),
>> > >>>>                        as.Date("2011-01-31"),by="week"))
>> > >>>> y$var1<-c(1,2,3,round(rnorm(54),1))
>> > >>>> y$var2<-c(10,20,30,round(rnorm(54),1))
>> > >>>>
>> > >>>> # All I need is to create lagged variables for var1 and var2. I looked
>> > >>>> around a bit and found several ways of doing it. They all seem quite
>> > >>>> complicated - while in SPSS it's just a few letters (like LAG()).
>> > >>>> Below is what I've written. It works - but I wonder whether there is
>> > >>>> a very simple way of doing it in R that I could not find?
>> > >>>> I need the same for "lead" (opposite of lag).
>> > >>>> Any hint is greatly appreciated!
>> > >>>>
>> > >>>> ### The function I created:
>> > >>>> mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
>> > >>>>    temp<-as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
>> > >>>>    for(i in 1:length(temp)){
>> > >>>>      names(temp)[i]<-paste(names(x),".lag",i,sep="")
>> > >>>>     }
>> > >>>>   return(temp)
>> > >>>> }
>> > >>>>
>> > >>>> ### Running mylag to get my result:
>> > >>>> myvars<-c("var1","var2")
>> > >>>> for(i in myvars) {
>> > >>>>   y<-cbind(y,mylag(y[i],max.lag=2))
>> > >>>> }
>> > >>>> (y)
>> > >>>>
>> > >>>> --
>> > >>>> Dimitri Liakhovitski
>> > >>>> marketfusionanalytics.com
>> > >>>>
>> > >>>
>> > >>> Dimitri,
>> > >>>
>> > >>> I would first look into the zoo package as has already been suggested.
>> > >>> However, if you haven't already got your solution then here are a
>> > >>> couple of functions that might help you get started.  I won't vouch
>> > >>> for efficiency.
>> > >>>
>> > >>>
>> > >>> lag.fun <- function(df, x, max.lag=1) {
>> > >>>  for(i in x) {
>> > >>>    for(j in 1:max.lag){
>> > >>>      lagx <- paste(i,'.lag',j,sep='')
>> > >>>      df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>> > >>>    }
>> > >>>  }
>> > >>>  df
>> > >>> }
>> > >>>
>> > >>> lead.fun <- function(df, x, max.lead=1) {
>> > >>>  for(i in x) {
>> > >>>    for(j in 1:max.lead){
>> > >>>      leadx <- paste(i,'.lead',j,sep='')
>> > >>>      df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>> > >>>    }
>> > >>>  }
>> > >>>  df
>> > >>> }
>> > >>>
>> > >>> y <- lag.fun(y,myvars,2)
>> > >>> y <- lead.fun(y,myvars,2)
>> > >>>
>> > >>>
>> > >>> Hope this is helpful,
>> > >>>
>> > >>> Dan
>> > >>>
>> > >>> Daniel Nordlund
>> > >>> Bothell, WA USA
>> > >>>
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Dimitri Liakhovitski
>> > >> marketfusionanalytics.com
>> > >>
>> > >> ______________________________________________
>> > >> R-help at r-project.org mailing list
>> > >> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >> PLEASE do read the posting guide
>> > >> http://www.R-project.org/posting-guide.html
>> > >> and provide commented, minimal, self-contained, reproducible code.
>> > >
>> >
>> >
>> >
>> > --
>> > Dimitri Liakhovitski
>> > marketfusionanalytics.com
>> >
>> >
>>
>
>
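
To make the "ts" lag direction Michael describes above concrete, here is a small sketch using only base R's stats package. The toy series `x` is made up for illustration. As far as I understand it, `stats::lag()` leaves the values alone and only shifts the time index, so a positive `k` moves the series earlier in time.

```r
# Toy yearly series observed at times 1..5 (illustrative data only)
x <- ts(1:5, start = 1)

# stats::lag() with k = 1 shifts the time base back by one unit;
# the values themselves are unchanged.
lagged <- stats::lag(x, k = 1)
tsp(x)        # start, end, frequency of the original series
tsp(lagged)   # start and end are each one unit earlier

# Binding on the common time axis shows the effect row by row:
# at time t the "lagged" column holds the value observed at t + 1.
cbind(x, lagged)

# To line up each observation with the previous one, use a negative k.
prev <- cbind(x, previous = stats::lag(x, k = -1))
prev
```

So on a "ts" object, what a data-frame user would call a lag (previous week's value in the current row) corresponds to a negative `k`.
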



-- 
Dimitri Liakhovitski
marketfusionanalytics.com


