[R] Drop observations in unbalanced panel data set according to missing values

Christian Schoder schoc152 at newschool.edu
Fri May 28 23:58:44 CEST 2010


Dear R-users,

I use firm-level data in a panel structure. I would like to drop all firms that have fewer than x observations over the time dimension in any of the variables considered. I would appreciate any help that either (a) points to relevant literature or websites or (b) provides code that could solve the problem.

Here is a detailed illustration of my problem. My data set is of the form
> df
   id  y  z
1   a  1  1
2   b NA  2
3   b  3  3
4   c  2  2
5   c  4  4
6   c  5 NA
7   d  6 NA
8   d  5  5
9   d  6  6
10  d  7  7
11  e NA NA
12  e NA  4
13  e  3  3
where id is the index of the firm, and y and z are observed variables such as assets and sales. Now I would like to apply a procedure that drops all firms that have fewer than 2 non-missing observations in y or in z (here, firms a, b, and e). Thus, it should give me a data.frame which looks like
> df1
   id  y  z
1   c  2  2
2   c  4  4
3   c  5 NA
4   d  6 NA
5   d  5  5
6   d  6  6
7   d  7  7
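
For reference, the procedure described above could be sketched in base R roughly as follows. This is only one possible approach, not a definitive solution: it assumes the data are in a data.frame called df as shown, and the names min_obs, vars, counts, and keep are hypothetical names introduced here for illustration.

```r
# Example data as shown in the post
df <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

min_obs <- 2           # hypothetical name for the threshold x
vars    <- c("y", "z") # variables that must each meet the threshold

# For every row, count the non-missing values its firm has in each variable.
# ave() returns the per-firm count repeated on each row of that firm.
counts <- sapply(vars, function(v) {
  ave(as.numeric(!is.na(df[[v]])), df$id, FUN = sum)
})

# Keep a row only if its firm reaches the threshold in all variables
keep <- rowSums(counts < min_obs) == 0

df1 <- df[keep, ]
rownames(df1) <- NULL  # renumber rows 1..n as in the desired output
df1
```

Rows with a single NA are kept as long as the firm as a whole still has at least min_obs non-missing values in every variable, which is why the rows for c and d containing NA remain in df1.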

Thank you very much!
Christian Schoder


