[R] How to make a lagged variable in panel data?

Gabor Grothendieck ggrothendieck at gmail.com
Sun Aug 14 03:26:50 CEST 2005


On 8/13/05, Ila Patnaik <ila at mayin.org> wrote:
> Suppose we observe N individuals, for each of which we have a
> time-series. How do we correctly create a lagged value of the
> time-series variable?
> 
> As an example, suppose I create:
> 
>  A <- data.frame(year=rep(c(1980:1984),3),
>                  person= factor(sort(rep(1:3,5))),
>                  wage=c(rnorm(15)))
> 
>  > A
>     year person        wage
>  1  1980      1  0.17923212
>  2  1981      1  0.25610292
>  3  1982      1  0.50833655
>  4  1983      1 -0.42448395
>  5  1984      1  0.49233532
>  6  1980      2 -0.49928025
>  7  1981      2  0.06842660
>  8  1982      2  0.65677575
>  9  1983      2  0.15947390
>  10 1984      2 -0.46585116
>  11 1980      3 -0.29052635
>  12 1981      3 -0.27109203
>  13 1982      3 -0.76168164
>  14 1983      3  0.02294361
>  15 1984      3  2.22828032
> 
> What I'd like to do is to make a lagged wage for each person, i.e., I
> should get an additional variable A$wage.lag1:
> 
>  > A
>     year person        wage       wage.lag1
>  1  1980      1  0.17923212             NA
>  2  1981      1  0.25610292     0.17923212
>  3  1982      1  0.50833655     0.25610292
>  4  1983      1 -0.42448395     0.50833655
>  5  1984      1  0.49233532    -0.42448395
>  6  1980      2 -0.49928025             NA
>  7  1981      2  0.06842660    -0.49928025
>  8  1982      2  0.65677575     0.06842660
>  9  1983      2  0.15947390     0.65677575
>  10 1984      2 -0.46585116     0.15947390
>  11 1980      3 -0.29052635             NA
>  12 1981      3 -0.27109203    -0.29052635
>  13 1982      3 -0.76168164    -0.27109203
>  14 1983      3  0.02294361    -0.76168164
>  15 1984      3  2.22828032     0.02294361
> 


We can use 'by' to split data frame A by person and apply
the function f to each subset of rows. Function f converts
the wage values for a single person into a ts class time
series so that lag can be used on it, and then cbinds wage
together with its lag. The cbind'ed series runs one year
past the original, so we keep only the rows that correspond
to the original years, since the example output includes
only those. Note that this extraction has the side effect of
turning wages into a matrix rather than a time series. We
then put everything together using cbind(...) once again,
and once the 'by' is complete we rbind all the pieces
together. This assumes that each person's rows are sorted by
year and that the years have no gaps.

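	f <- function(x) {
		# annual series for one person, starting at that person's first year
		wage <- ts(x$wage, start = x$year[1])
		idx <- seq(length = length(wage))
		# lag(wage, -1) moves each value one year later; keep the original rows
		wages <- cbind(wage, wage.lag1 = lag(wage, -1))[idx,]
		cbind(x[,1:2], wages)
	}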

	result <- do.call("rbind", by(A, A$person, f))
	result
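
As a cross-check, the same per-person lag can be sketched
without ts objects by shifting each person's wages down one
row with 'ave'. This is only a sketch under the same
assumptions (rows sorted by year within person, no missing
years). The helper name lag1 and the made-up wage values are
not from the original post.

```r
# Small deterministic panel in the same layout as A
# (wage values are made up for illustration)
A <- data.frame(year = rep(1980:1984, 3),
                person = factor(rep(1:3, each = 5)),
                wage = (1:15) / 10)

# lag1 is a hypothetical helper: shift a vector down by one, padding with NA
lag1 <- function(v) c(NA, v[-length(v)])

# ave applies lag1 within each person and keeps the original row order
A$wage.lag1 <- ave(A$wage, A$person, FUN = lag1)
A
```

Unlike the ts approach, this relies purely on row position,
so a missing year would silently pair a wage with the wrong
previous year.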



