[R] Lag based on Date objects with non-consecutive values
Sam Albers
tonightsthenight at gmail.com
Tue Mar 20 01:03:20 CET 2012
Hello R-ers,
I just wanted to update this post. I've made some progress on this but
am still not quite where I need to be. I feel like I am close so I
just wanted to share my work so far.
Thanks in advance!
Sam
On Mon, Mar 19, 2012 at 1:10 PM, Sam Albers <tonightsthenight at gmail.com> wrote:
> Hello all,
>
> I need to figure out a way to lag a variable by a number of days
> without using the zoo package. I need to use a remote R connection
> that doesn't have the zoo package installed, and the host is unwilling to install it.
> So that is, I want a function where I can specify the number of days
> to lag a variable against a Date formatted column. That is relatively
> easy to do. The problem arises when I don't have consecutive dates. I
> can't seem to figure out a way to insert an NA when there is a
> non-consecutive date. So for example:
>
>
> ## A dataframe with non-consecutive dates
> set.seed(32)
> df1 <- data.frame(
>   Date = seq(as.Date("1967-06-05", "%Y-%m-%d"), by = "day", length.out = 5),
>   Dis1 = rnorm(5, 1, 10)
> )
> ## Second block starts on 1967-06-15, leaving a gap after 1967-06-09
> df2 <- data.frame(
>   Date = seq(as.Date("1967-06-15", "%Y-%m-%d"), by = "day", length.out = 5),
>   Dis1 = rnorm(5, 1, 10)
> )
>
> df <- rbind(df1,df2); df
>
> ## A function to lag the variable by a specified number of days
> lag.day <- function (lag.by, data) {
>   c(rep(NA, lag.by), head(data$Dis1, -lag.by))
> }
>
> ## Using the function
> df$lag1 <- lag.day(lag.by=1, data=df); df
> ## returns this data frame
>
>          Date      Dis1      lag1
> 1  1967-06-05  1.146405        NA
> 2  1967-06-06  9.732887  1.146405
> 3  1967-06-07 -9.279462  9.732887
> 4  1967-06-08  7.856646 -9.279462
> 5  1967-06-09  5.494370  7.856646
> 6  1967-06-15  5.070176  5.494370
> 7  1967-06-16  3.847314  5.070176
> 8  1967-06-17 -5.243094  3.847314
> 9  1967-06-18  9.396560 -5.243094
> 10 1967-06-19  4.112792  9.396560
>
>
> ## When really what I would like is something like this:
>
>          Date      Dis1      lag1
> 1  1967-06-05  1.146405        NA
> 2  1967-06-06  9.732887  1.146405
> 3  1967-06-07 -9.279462  9.732887
> 4  1967-06-08  7.856646 -9.279462
> 5  1967-06-09  5.494370  7.856646
> 6  1967-06-15  5.070176        NA
> 7  1967-06-16  3.847314  5.070176
> 8  1967-06-17 -5.243094  3.847314
> 9  1967-06-18  9.396560 -5.243094
> 10 1967-06-19  4.112792  9.396560
I've now gotten this far but have realized that my approach is flawed:
if I increase the lag.by value to anything greater than 1, the NA is no
longer entered in the correct position. So here is my updated
effort:
lag.by <- function (data, lag.by) {
  tmp <- data.frame(
    ## Difference in days between dates
    diff = c(diff(data$Date), NA),
    lag.tmp = c(rep(NA, lag.by), head(data$Dis1, -lag.by))
  )
  ## Diff calculates difference to next row so all the difference
  ## values need to be lagged
  ifelse(c(rep(NA, lag.by), head(tmp$diff, -lag.by)) <= 1, tmp$lag.tmp, NA)
}
df$lag <- lag.by(df, lag.by=1)
df$lag2 <- lag.by(df, lag.by=2); df
         Date      Dis1       lag      lag2
1  1967-06-05  1.146405        NA        NA
2  1967-06-06  9.732887  1.146405        NA
3  1967-06-07 -9.279462  9.732887  1.146405
4  1967-06-08  7.856646 -9.279462  9.732887
5  1967-06-09  5.494370  7.856646 -9.279462
6  1967-06-15  5.070176        NA  7.856646  <- Need this to be NA
7  1967-06-16  3.847314  5.070176        NA
8  1967-06-17 -5.243094  3.847314  5.070176
9  1967-06-18  9.396560 -5.243094  3.847314
10 1967-06-19  4.112792  9.396560 -5.243094
So, I should have NAs in the lag2 column at rows 6 and 7. Any help or
thoughts would be much appreciated here.
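
One direction that might sidestep the positional shifting altogether is to look each lagged value up by date rather than by row offset: for every row, compute Date minus lag.by and find the row (if any) that carries that date. The sketch below uses only base R. The name lag_by_date is hypothetical, and the sketch assumes one row per Date and that the example data start the second block on 1967-06-15, as the printed output suggests. It is a rough idea under those assumptions, not a tested solution for the real data.

```r
## Rebuild the example data: two runs of consecutive days with a gap between them
set.seed(32)
df <- data.frame(
  Date = c(seq(as.Date("1967-06-05"), by = "day", length.out = 5),
           seq(as.Date("1967-06-15"), by = "day", length.out = 5)),
  Dis1 = rnorm(10, 1, 10)
)

## Hypothetical helper: lag Dis1 by a number of calendar days.
## match() returns the row whose Date equals (Date - lag.by), or NA when
## no such date exists, so gaps become NA automatically.
lag_by_date <- function(data, lag.by) {
  idx <- match(data$Date - lag.by, data$Date)
  data$Dis1[idx]
}

df$lag1 <- lag_by_date(df, lag.by = 1)
df$lag2 <- lag_by_date(df, lag.by = 2)
df
```

With this lookup, lag1 should be NA at rows 1 and 6, and lag2 at rows 1, 2, 6 and 7, because those are the rows whose target date is absent from the Date column. The rows do not need to be sorted for this to work, but duplicated dates would need handling first, since match() only returns the first hit.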
>
> So can anyone recommend a way (either using my function or any other
> approaches) that I might be able to consistently lag values based on a
> lag.by value and consecutive dates?
>
> Thanks so much in advance!
>
> Sam