[R] Drop firms in unbalanced panel if not more than 5 observations in consecutive years for all variables
Dimitris Rizopoulos
d.rizopoulos at erasmusmc.nl
Thu Jul 22 13:34:38 CEST 2010
try this:
Dat <- read.table(textConnection(
"id year y z
1 a 2000 1 1
2 b 2000 NA 2
3 b 2001 3 3
4 c 1999 1 1
5 c 2000 2 2
6 c 2001 4 NA
7 c 2002 5 4
8 d 1998 6 5
9 d 1999 5 NA
10 d 2000 6 6
11 d 2001 7 7
12 d 2002 3 6"
), header = TRUE)
closeAllConnections()
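n.years <- 3 # minimum number of consecutive complete years
ok <- complete.cases(Dat[-(1:2)]) # TRUE where every variable of interest is observed
# keep a firm if its longest run of complete rows is at least n.years
# (assumes rows are sorted by year within id and the years have no gaps)
ind <- ave(ok, Dat$id, FUN = function (x) {
    r <- rle(x)
    any(r$lengths[r$values] >= n.years)
})
Dat[ind, ]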
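Note that the code above looks only at row order within each firm, not at the year values themselves. If a firm can have a year missing altogether (no row at all, rather than a row with NA), the year column has to be checked as well. Below is a minimal sketch of one way to do that; the helper name has.run and the small data set Dat2 are made up for illustration and are not part of plm or base R.

```r
# Hypothetical helper (illustrative name, not from plm or base R).
# Returns TRUE if a firm has at least n consecutive calendar years
# in which every variable of interest is observed.
has.run <- function(year, ok, n) {
    yr <- sort(year[ok])                      # years with complete data
    if (length(yr) < n) return(FALSE)
    # a new run starts whenever the step from the previous year is not 1
    run.id <- cumsum(c(TRUE, diff(yr) != 1))
    max(table(run.id)) >= n
}

# Made-up example: firm c has no row for 2001, firm d has no gaps
Dat2 <- data.frame(
    id   = c("c", "c", "c", "d", "d", "d"),
    year = c(1999, 2000, 2002, 2000, 2001, 2002),
    y    = c(1, 2, 5, 6, 7, 3),
    z    = c(1, 2, 4, 6, 7, 6)
)

ok <- complete.cases(Dat2[c("y", "z")])       # rows with no NA in y or z
keep <- sapply(split(seq_len(nrow(Dat2)), Dat2$id), function (i)
    has.run(Dat2$year[i], ok[i], n = 3))
Dat2.reduced <- Dat2[Dat2$id %in% names(keep)[keep], ]
```

Here firm c is dropped because its complete years (1999, 2000, 2002) never form a run of three, while firm d is kept.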
I hope it helps.
Best,
Dimitris
On 7/22/2010 11:18 AM, Christian Schoder wrote:
> Dear R-user,
>
> A few weeks ago I asked the list a similar question. However, my task
> has changed slightly, but enough for me to get lost again, so
> I would appreciate any help on the following issue.
>
> I use the plm package and work with firm-level data in a panel. I would
> like to eliminate all firms that do not fulfill the requirement of
> having an observation in every variable used for at least x consecutive
> years.
>
> For illustration of the problem assume the following data set
>> data
> id year y z
> 1 a 2000 1 1
> 2 b 2000 NA 2
> 3 b 2001 3 3
> 4 c 1999 1 1
> 5 c 2000 2 2
> 6 c 2001 4 NA
> 7 c 2002 5 4
> 8 d 1998 6 5
> 9 d 1999 5 NA
> 10 d 2000 6 6
> 11 d 2001 7 7
> 12 d 2002 3 6
> where id is the index of the firm, year the index for the year, and y
> and z are variables. Now, I would like to get rid of all firms with,
> let's say, fewer than 3 consecutive years in which there are observations
> for every variable. Hence, the procedure should yield
>> data.reduced
> id year y z
> 1 d 1998 6 5
> 2 d 1999 5 NA
> 3 d 2000 6 6
> 4 d 2001 7 7
> 5 d 2002 3 6
>
> Thank you very much for any help!
>
> Cheers, Christian
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014