[R] Drop observations in unbalanced panel data set according to missing values
Christian Schoder
schoc152 at newschool.edu
Fri May 28 23:58:44 CEST 2010
Dear R-users,
I use firm-level data in panel structure. I would like to drop all firms that have fewer than x non-missing observations over the sample period in any of the variables considered. I would appreciate any help that (a) points to relevant literature or websites or (b) shows code that could solve the problem.
Here is a detailed illustration of my problem. My data set is of the form
> df
   id  y  z
1   a  1  1
2   b NA  2
3   b  3  3
4   c  2  2
5   c  4  4
6   c  5 NA
7   d  6 NA
8   d  5  5
9   d  6  6
10  d  7  7
11  e NA NA
12  e NA  4
13  e  3  3
where id is the index of the firm, and y and z are observed variables such as assets and sales. Now I would like to apply a procedure that drops all firms that have fewer than 2 non-missing observations in y or in z. Firms a, b, and e each have only one non-missing value in y, so only c and d should remain, giving a data.frame that looks like
> df1
  id y  z
1  c 2  2
2  c 4  4
3  c 5 NA
4  d 6 NA
5  d 5  5
6  d 6  6
7  d 7  7
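To make the rule concrete, here is a minimal base-R sketch of one possible way to do this; it is not necessarily the best or only approach. It rebuilds the example data, counts the non-missing values per firm and variable, and keeps only firms that reach the threshold in every variable. The names min_obs, vars, counts, and keep_ids are my own illustrative choices, and min_obs stands in for the x mentioned above.

```r
# Example data from the illustration above
df <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

min_obs <- 2            # assumed threshold (the "x" in the question)
vars    <- c("y", "z")  # variables that must each reach the threshold

# One row per firm, one column per variable: number of non-missing values
counts <- rowsum(+!is.na(df[vars]), df$id)

# Firms that have at least min_obs non-missing values in every variable
keep_ids <- rownames(counts)[apply(counts >= min_obs, 1, all)]

# Keep only the rows of those firms and renumber the rows
df1 <- df[df$id %in% keep_ids, ]
rownames(df1) <- NULL
df1
```

With min_obs set to 2, this should leave only firms c and d, as in the df1 shown above. Adding further column names to vars would extend the same check to more variables.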
Thank you very much!
Christian Schoder