[R] lag a data.frame column?
Achim Zeileis
Achim.Zeileis at wu-wien.ac.at
Wed Sep 9 20:09:40 CEST 2009
On Wed, 9 Sep 2009, Mark Knecht wrote:
> Sometimes it's the simple things...
>
> Why doesn't this lag X$x by 3 and place it in X$x1?
It does.
> (i.e. - Na's in the first 3 rows and then values showing up...)
Because this is not how the "ts" class handles lags.
What happens is that X$x is transformed to "ts":
  as.ts(X$x)
which is now a regular series with frequency 1 starting at 1 and ending at
10. If you apply lag(), the data is not modified at all, just the time
index is shifted:
  lag(as.ts(X$x), 3)
Thus it does not create any NAs or - even worse - throw away observations
(which is not necessary because the frequency of the time series is known
and the time index can be extended).
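
To see this concretely, here is a small sketch. The original X is not shown
in this thread, so the data frame below (x = 1:10) is only an assumed
stand-in. stats::lag is spelled out in case another attached package masks
lag().

```r
# Hypothetical stand-in for the poster's data (the real X is not shown)
X <- data.frame(x = 1:10)

x_ts  <- as.ts(X$x)            # regular series: start 1, end 10, frequency 1
x_lag <- stats::lag(x_ts, 3)   # shifts the time index only

tsp(x_ts)    # start, end, frequency of the original series
tsp(x_lag)   # start and end are both moved back by 3

# The observations themselves are untouched
identical(as.vector(x_lag), as.vector(x_ts))
```

Because only the time attributes differ, storing the result in a data.frame
column gives a copy of the original values, which is presumably what was
observed.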
BTW: You almost surely wanted lag(..., -3). Personally, I also don't find
this intuitive but it's how things are (as documented on the man page).
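
A rough illustration of the sign convention, again on assumed data (1:10,
not the poster's actual X): binding the series to its lag(..., -3) version
aligns both on a common time index, and that alignment is where the NAs
come from.

```r
x_ts <- as.ts(1:10)                          # assumed example data

# cbind() on "ts" objects aligns them on the union of their time indexes
both <- cbind(x_ts, stats::lag(x_ts, -3))
both

# Restrict to the original time range: the second column now starts
# with three NAs, followed by the values shifted down by three rows
w <- window(both, start = 1, end = 10)
w
```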
> The help page does talk about time series. If lag doesn't work on
> data.frame columns then what would be the right function to use to lag
> by a variable amount?
That depends on what you want to do. If your data really is a time series,
then using a time series class (such as "ts" or "zoo") would probably be
preferable. This would probably also get you further benefits for data
processing.
If for some reason you can't do that, it shouldn't be too difficult to
write a function that does what you want for your personal use:
  mylag <- function(x, k) c(rep(NA, k), x[1:(length(x)-k)])
which assumes that k is a positive integer and length(x) > k.
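
For illustration, this is how the function could be applied to a data.frame
column. The data frame is an assumed example, and mylag_checked is a
hypothetical variant (not part of the suggestion above) that adds argument
checks and also copes with k >= length(x).

```r
mylag <- function(x, k) c(rep(NA, k), x[1:(length(x)-k)])

# Assumed example data; the poster's real X is not shown
X <- data.frame(x = 1:10)
X$x1 <- mylag(X$x, 3)   # three leading NAs, then x[1], ..., x[7]
X

# Hypothetical guarded variant: validates k and returns all NAs
# when k is at least as large as length(x)
mylag_checked <- function(x, k) {
  stopifnot(length(k) == 1, k >= 0, k == round(k))
  n <- length(x)
  k <- min(k, n)
  c(rep(NA, k), x[seq_len(n - k)])
}

mylag_checked(X$x, 3)
mylag_checked(X$x, 10)
```

seq_len() is used in the variant because 1:0 counts down in R, which is why
the one-liner needs length(x) > k.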
Best,
Z