[R] reshape panel data

Richard Saba sabaric at auburn.edu
Thu Apr 8 18:55:19 CEST 2010


I have a data set with observations on 549 cities spanning an 18-year
period. However, some of the cities did not report in one or more of the 18
years. I would like to implement the procedure suggested by Wooldridge in
section 17.1.3 of his "Econometric Analysis of Cross Section and Panel Data"
to correct for attrition. For example, the table below indicates that the 3rd
and the 7th cities in the data set do not have observations for several
years. The Wooldridge procedure requires generating a selection variable
that takes the value 1 if the city reports in that year and 0 otherwise.
How do I assign a zero to a city when it does not have an observation for
that year?

For example, suppose I have the following data set. The observations range
over three years, 1990-1992, but some cities did not report in some years.


The original data looks like this:

Cicoid    year       other_variables         selection_variable

1             1990      x x x x x x x                      1
1             1991      xxxxxxxxxx                         1
2             1991      xxxxxxxxxx                         1
3             1990      xxxxxxxxxx                         1
3             1991      xxxxxxxxxx                         1
3             1992      xxxxxxxxxx                         1

I would like to get a data set that looks like this:

Cicoid    year       other_variables          selection_variable

1             1990      x x x x x x x            1
1             1991      xxxxxxxxxx               1
1             1992      ........                 0
2             1990      ........                 0
2             1991      xxxxxxxxxx               1
2             1992      ........                 0
3             1990      xxxxxxxxxx               1
3             1991      xxxxxxxxxx               1
3             1992      xxxxxxxxxx               1


I can reshape the data in Stata with the following three simple commands:
     xtset Cicoid year
     tsfill ,full
     replace selection_variable=0 if selection_variable==.

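That is, I declare the data to be a panel by identifying the ID and time
index variables, and then use the time-series fill command to add a row for
every missing city-year. The added rows have a missing selection variable,
which the last command sets to 0.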
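
For comparison, the same fill step could perhaps be sketched in base R by
building the full city-by-year grid with expand.grid() and left-joining the
observed rows onto it with merge(). This is only a sketch under assumptions,
not a tested solution for the full data: the column x is a hypothetical
stand-in for other_variables, and the object names panel, full_grid, and
balanced are made up for illustration.

```r
# Example data from the tables above. The column x is a hypothetical
# stand-in for other_variables.
panel <- data.frame(
  Cicoid = c(1, 1, 2, 3, 3, 3),
  year   = c(1990, 1991, 1991, 1990, 1991, 1992),
  x      = c(10, 11, 20, 30, 31, 32),
  selection_variable = 1
)

# Every city-year combination over the observed span of years
# (the rough analogue of "tsfill, full").
full_grid <- expand.grid(
  Cicoid = sort(unique(panel$Cicoid)),
  year   = seq(min(panel$year), max(panel$year))
)

# Left join: city-years with no report get NA in all other columns.
balanced <- merge(full_grid, panel, by = c("Cicoid", "year"), all.x = TRUE)

# Analogue of "replace selection_variable=0 if selection_variable==."
balanced$selection_variable[is.na(balanced$selection_variable)] <- 0

# Sort by city, then year, and reset the row names.
balanced <- balanced[order(balanced$Cicoid, balanced$year), ]
rownames(balanced) <- NULL
print(balanced)
```

The rows that did not exist in the original data keep NA in x, so they can
still be told apart from reported observations after the fill.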

I have searched the help pages and vignettes of both the "zoo" and "plm"
packages but cannot find a solution.
Can anyone help? Thanks,

Richard Saba


