[R] Efficient way of creating a shifted (lagged) variable?
Joshua Wiley
jwiley.psych at gmail.com
Thu Aug 4 22:02:02 CEST 2011
On Aug 4, 2011, at 11:46, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
> Thanks a lot, guys!
> It's really helpful. But - to be objective- it's still quite a few
> lines longer than in SPSS.
Not once you've sourced the function! For the simple case of a vector, try:
X <- 1:10
mylag2 <- function(X, lag) {
  # pad the front with `lag` NAs and drop the last `lag` values
  c(rep(NA, lag), head(X, -lag))
}
Though this does not work for lead, it is fairly short. Then you could use the *apply family if you needed it on multiple columns or vectors.
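
For what it's worth, here is a minimal sketch of one way to cover both lag and lead with a single function and then apply it across columns. The helper name shift_vec and the small example data frame are made up for illustration (they are not from this thread), and the sign convention (positive k lags, negative k leads) is just an assumption I picked:

```r
# Hypothetical helper: shift a vector by k positions.
# k > 0 lags (NAs at the front); k < 0 leads (NAs at the end).
shift_vec <- function(x, k = 1L) {
  idx <- seq_along(x) - as.integer(k)     # source position for each output slot
  idx[idx < 1L | idx > length(x)] <- NA   # positions outside the vector become NA
  x[idx]                                  # indexing by NA yields NA of x's type
}

# Illustrative data only
df <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(10, 20, 30, 40))
vars <- c("var1", "var2")

# lapply() runs the helper over each selected column
df[paste0(vars, ".lag1")]  <- lapply(df[vars], shift_vec, k = 1)
df[paste0(vars, ".lead1")] <- lapply(df[vars], shift_vec, k = -1)
```

Indexing by position rather than pasting NAs on with c() keeps the result the same length and type as the input, which should also help with dates or factors.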
Cheers,
Josh
> Dimitri
>
> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund <djnordlund at frontier.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>> On Behalf Of Dimitri Liakhovitski
>>> Sent: Thursday, August 04, 2011 8:24 AM
>>> To: r-help
>>> Subject: [R] Efficient way of creating a shifted (lagged) variable?
>>>
>>> Hello!
>>>
>>> I have a data set:
>>> set.seed(123)
>>> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-31"),by="week"))
>>> y$var1<-c(1,2,3,round(rnorm(54),1))
>>> y$var2<-c(10,20,30,round(rnorm(54),1))
>>>
>>> # All I need is to create lagged variables for var1 and var2. I looked
>>> around a bit and found several ways of doing it. They all seem quite
>>> complicated - while in SPSS it's just a few letters (like LAG()). Here
>>> is what I've written but I wonder. It works - but maybe there is a
>>> very simple way of doing it in R that I could not find?
>>> I need the same for "lead" (opposite of lag).
>>> Any hint is greatly appreciated!
>>>
>>> ### The function I created:
>>> mylag <- function(x,max.lag=1){ # x has to be a 1-column data frame
>>> temp<-
>>> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
>>> for(i in 1:length(temp)){
>>> names(temp)[i]<-paste(names(x),".lag",i,sep="")
>>> }
>>> return(temp)
>>> }
>>>
>>> ### Running mylag to get my result:
>>> myvars<-c("var1","var2")
>>> for(i in myvars) {
>>> y<-cbind(y,mylag(y[i],max.lag=2))
>>> }
>>> (y)
>>>
>>> --
>>> Dimitri Liakhovitski
>>> marketfusionanalytics.com
>>>
>>
>> Dimitri,
>>
>> I would first look into the zoo package as has already been suggested. However, if you haven't already got your solution then here are a couple of functions that might help you get started. I won't vouch for efficiency.
>>
>>
>> lag.fun <- function(df, x, max.lag=1) {
>> for(i in x) {
>> for(j in 1:max.lag){
>> lagx <- paste(i,'.lag',j,sep='')
>> df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>> }
>> }
>> df
>> }
>>
>> lead.fun <- function(df, x, max.lead=1) {
>> for(i in x) {
>> for(j in 1:max.lead){
>> leadx <- paste(i,'.lead',j,sep='')
>> df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>> }
>> }
>> df
>> }
>>
>> y <- lag.fun(y,myvars,2)
>> y <- lead.fun(y,myvars,2)
>>
>>
>> Hope this is helpful,
>>
>> Dan
>>
>> Daniel Nordlund
>> Bothell, WA USA
>>
>>
>>
>
>
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.