[Rd] creating lagged variables

Antonio, Fabio Di Narzo antonio.fabio at gmail.com
Thu Dec 13 19:21:15 CET 2007


Hi all.
I'm looking for robust ways of building lagged variables in a dataset
with multiple individuals.

Consider a dataset with variables like the following:
##
set.seed(123)
d <- data.frame(id = rep(1:2, each=3), time=rep(1:3, 2), value=rnorm(6))
##
> d
  id time       value
1  1    1 -0.56047565
2  1    2 -0.23017749
3  1    3  1.55870831
4  2    1  0.07050839
5  2    2  0.12928774
6  2    3  1.71506499

I want to compute the lagged variable 'value(t-1)', taking subject id
into account.
My current effort produced the following:
##
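# Add a lagged copy of 'varname' to 'dt', matching rows on the time variable.
my_lag <- function(dt, varname, timevarname='time', lag=1) {
  # New column name, e.g. 'value.1' for lag=1
  vname <- paste(varname, if(lag>0) '.' else '', lag, sep='')
  timevar <- dt[[timevarname]]
  # For each time t, pick the value observed at time t - lag (NA if absent)
  dt[[vname]] <- dt[[varname]][match(timevar, timevar + lag)]
  dt
}
# Apply my_lag within each level of the id variable and stack the results.
lag_by <- function(dt, idvarname='id', ...)
  do.call(rbind, by(dt, dt[[idvarname]], my_lag, ...))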
##
With the previous data I get:

> lag_by(d, varname='value')
    id time       value     value.1
1.1  1    1 -0.56047565          NA
1.2  1    2 -0.23017749 -0.56047565
1.3  1    3  1.55870831 -0.23017749
2.4  2    1  0.07050839          NA
2.5  2    2  0.12928774  0.07050839
2.6  2    3  1.71506499  0.12928774
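
As a small illustration of what the match() call inside my_lag does, here is a sketch on made-up vectors for a single hypothetical subject whose observation at time 3 is missing. The names timevar, value, idx, lagged and naive exist only for this example. It suggests that matching on time returns NA where the previous time point is absent, whereas simply shifting the vector by one position would not.

```r
# Made-up data for one hypothetical subject; time 3 is not observed
timevar <- c(1, 2, 4, 5)
value   <- c(10, 20, 40, 50)
lag     <- 1

# For each time t, find the position of the row observed at time t - lag
idx <- match(timevar, timevar + lag)

# Lag by matching on time: NA wherever time t - lag was not observed
lagged <- value[idx]

# Naive positional shift, for comparison: it pairs time 4 with the value
# from time 2, which is not the value at t - 1
naive <- c(NA, head(value, -1))

print(data.frame(timevar, value, lagged, naive))
```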

So that seems to work. However, I was wondering whether there is a
smarter/cleaner/more robust way to do the job. For instance, with the
above function the data frame rows get re-ordered as a side effect
(though this is of no concern in my current analysis)...
Any suggestions?
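
One direction that might avoid the re-ordering, given only as a rough sketch: compute the lag group by group, but write the results back by row index instead of using rbind. The function name lag_in_place is hypothetical (it does not come from any package), and the sketch assumes at most one row per id and time point.

```r
# Hypothetical helper, not from any package: lag 'varname' within each id,
# keeping the rows (and row names) of 'dt' in their original order.
lag_in_place <- function(dt, varname, idvarname = 'id',
                         timevarname = 'time', lag = 1) {
  vname <- paste(varname, lag, sep = '.')
  out <- dt[[varname]]
  out[] <- NA                       # same type as the source column, all NA
  # Row indices of 'dt', grouped by subject id
  groups <- split(seq_len(nrow(dt)), dt[[idvarname]])
  for (rows in groups) {
    tv <- dt[[timevarname]][rows]
    # Within the group, take the value observed at time t - lag
    out[rows] <- dt[[varname]][rows][match(tv, tv + lag)]
  }
  dt[[vname]] <- out
  dt
}

set.seed(123)
d <- data.frame(id = rep(1:2, each=3), time=rep(1:3, 2), value=rnorm(6))
shuffled <- d[c(4, 1, 6, 2, 5, 3), ]  # deliberately unsorted rows
res <- lag_in_place(shuffled, 'value')
print(res)
```

With this approach the row names of the result are those of the input, so the output can be compared row by row with the original data frame.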

All the best,
Fabio.
-- 
Antonio, Fabio Di Narzo
Ph.D. student at
Department of Statistical Sciences
University of Bologna, Italy


