[R] Drop observations in unbalanced panel data set according to missing values
David Winsemius
dwinsemius at comcast.net
Sat May 29 00:28:13 CEST 2010
On May 28, 2010, at 5:58 PM, Christian Schoder wrote:
> Dear R-users,
>
> I use firm-level data in a panel structure. I would like to drop all
> firms that have fewer than x observations over the time span in any
> of the variables considered. I would appreciate any help that (a)
> points to relevant literature or websites or (b) provides code
> that could solve the problem.
>
> Here is a detailed illustration of my problem. My data set has the
> form
>> df
> id y z
> 1 a 1 1
> 2 b NA 2
> 3 b 3 3
> 4 c 2 2
> 5 c 4 4
> 6 c 5 NA
> 7 d 6 NA
> 8 d 5 5
> 9 d 6 6
> 10 d 7 7
> 11 e NA NA
> 12 e NA 4
> 13 e 3 3
> where id is the firm identifier, and y and z are observations such
> as assets and sales. Now I would like to apply a procedure that
> drops all firms which have fewer than 2 observed (non-missing) values
> in y or in z.
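To make the example reproducible, as the posting guide asks, the printed data can be rebuilt with data.frame(). This is a minimal sketch: the values are copied from the printout above, the object is called dfrm rather than df for the reason given next, and the id column is assumed to be character.

```r
# Rebuild the example panel from the printout in the question.
# Missing observations are coded as NA.
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)
str(dfrm)
```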
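I try to avoid naming objects after common functions such as df (the
F-distribution density in the stats package), so I use dfrm instead:
> # number of non-missing y (and z) values per firm, repeated on each row
> dfrm$nrecy <- ave(dfrm$y, dfrm$id, FUN = function(x) sum(!is.na(x)))
> dfrm$nrecz <- ave(dfrm$z, dfrm$id, FUN = function(x) sum(!is.na(x)))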
> dfrm
id y z nrecy nrecz
1 a 1 1 1 1
2 b NA 2 1 2
3 b 3 3 1 2
4 c 2 2 3 2
5 c 4 4 3 2
6 c 5 NA 3 2
7 d 6 NA 4 3
8 d 5 5 4 3
9 d 6 6 4 3
10 d 7 7 4 3
11 e NA NA 1 2
12 e NA 4 1 2
13 e 3 3 1 2
> # keep firms with at least 2 non-missing values in both y and z
> dfrm[with(dfrm, pmin(nrecy, nrecz) > 1), ]
id y z nrecy nrecz
4 c 2 2 3 2
5 c 4 4 3 2
6 c 5 NA 3 2
7 d 6 NA 4 3
8 d 5 5 4 3
9 d 6 6 4 3
10 d 7 7 4 3
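The question also asks more generally about dropping firms with fewer than x observations in any of several variables. The same ave() idea can be wrapped in a small helper; drop_sparse_ids() and its arguments (idvar, vars, min_obs) are hypothetical names chosen for this sketch, not an existing function.

```r
# Example data from the question.
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only ids with at least min_obs non-missing
# values in every one of the variables named in vars.
drop_sparse_ids <- function(data, idvar, vars, min_obs) {
  # One vector per variable: the id's count of non-NA values, on each row.
  counts <- lapply(vars, function(v)
    ave(as.numeric(!is.na(data[[v]])), data[[idvar]], FUN = sum))
  # Row-wise minimum across variables; keep rows where it reaches min_obs.
  keep <- do.call(pmin, counts) >= min_obs
  data[keep, , drop = FALSE]
}

drop_sparse_ids(dfrm, "id", c("y", "z"), min_obs = 2)
```

With min_obs = 2 this keeps the same seven rows (firms c and d) as the pmin() subset above; with min_obs = 3 only firm d survives. Using do.call(pmin, ...) keeps the helper working for a single variable as well as several.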
Note that this does not guarantee that each remaining id has at least 2
complete observations (rows in which both y and z are present). If you
wanted a solution to that problem, you would need a better test data.frame.
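If the requirement is instead at least 2 complete observations per firm (rows where y and z are both present), complete.cases() can supply the per-row indicator that ave() then sums within each firm. A sketch under that assumed definition of "complete"; the column name ncomplete is illustrative.

```r
# Example data from the question.
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

# complete.cases() is TRUE for rows where neither y nor z is NA;
# ave() sums those indicators within each firm.
dfrm$ncomplete <- ave(as.numeric(complete.cases(dfrm[c("y", "z")])),
                      dfrm$id, FUN = sum)

# Keep firms with at least 2 complete rows.
res <- dfrm[dfrm$ncomplete >= 2, ]
res
```

On this small example it returns the same rows as before (firms c and d have 2 and 3 complete rows, every other firm has 1), which is why a richer test data frame would be needed to tell the two rules apart.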
> Thus, it should give me a data.frame which looks like
>> df1
> id y z
> 1 c 2 2
> 2 c 4 4
> 3 c 5 NA
> 4 d 6 NA
> 5 d 5 5
> 6 d 6 6
> 7 d 7 7
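The pmin() subset above keeps the original row names (4 to 10) and the two helper columns, whereas the desired df1 has only id, y and z with rows numbered from 1. A sketch that keeps the counts as separate vectors and then resets the row names:

```r
# Example data from the question.
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

# Per-row counts of non-missing values within each firm,
# kept outside the data frame so no columns need dropping later.
nrecy <- ave(dfrm$y, dfrm$id, FUN = function(x) sum(!is.na(x)))
nrecz <- ave(dfrm$z, dfrm$id, FUN = function(x) sum(!is.na(x)))

# Keep firms with at least 2 observed values in both y and z,
# and only the original columns.
df1 <- dfrm[pmin(nrecy, nrecz) > 1, c("id", "y", "z")]
rownames(df1) <- NULL  # renumber rows 1, 2, ...
df1
```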
>
> Thank you very much!
> Christian Schoder
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT