[R] Drop observations in unbalanced panel data set according to missing values

David Winsemius dwinsemius at comcast.net
Sat May 29 00:28:13 CEST 2010


On May 28, 2010, at 5:58 PM, Christian Schoder wrote:

> Dear R-users,
>
> I use firm-level data in panel structure. I would like to drop all  
> firms that have less than x observations over the time scale in any  
> of the variables considered. I would appreciate any help that (a)  
> indicates relevant literature or websites or (b) indicates the code  
> that could solve the problem.
>
> Here, a detailed illustration of my problem: My data set is of the  
> form
>> df
>   id  y  z
> 1   a  1  1
> 2   b NA  2
> 3   b  3  3
> 4   c  2  2
> 5   c  4  4
> 6   c  5 NA
> 7   d  6 NA
> 8   d  5  5
> 9   d  6  6
> 10  d  7  7
> 11  e NA NA
> 12  e NA  4
> 13  e  3  3
> where id is the index of the firm, and y and z are observations such  
> as assets and sales. Now I would like to apply a procedure that  
> drops all firms which have less than 2 observed realizations in y or  
> z.


I try to avoid naming objects with common function names like df, so the data frame is called dfrm below:

 > dfrm$nrecy <- ave(dfrm$y , dfrm$id, FUN=function(x) sum(!is.na(x)) )
 > dfrm$nrecz <- ave(dfrm$z , dfrm$id, FUN=function(x) sum(!is.na(x)) )
 > dfrm
    id  y  z nrecy nrecz
1   a  1  1     1     1
2   b NA  2     1     2
3   b  3  3     1     2
4   c  2  2     3     2
5   c  4  4     3     2
6   c  5 NA     3     2
7   d  6 NA     4     3
8   d  5  5     4     3
9   d  6  6     4     3
10  d  7  7     4     3
11  e NA NA     1     2
12  e NA  4     1     2
13  e  3  3     1     2
 > dfrm[with(dfrm, pmin(nrecy, nrecz)>1), ]
    id y  z nrecy nrecz
4   c 2  2     3     2
5   c 4  4     3     2
6   c 5 NA     3     2
7   d 6 NA     4     3
8   d 5  5     4     3
9   d 6  6     4     3
10  d 7  7     4     3
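
For anyone who wants to run this without retyping the data, here is a minimal self-contained sketch of the same ave()/pmin() approach. The name min_obs (standing in for the "x" in the question) and the final row renumbering are my additions for illustration, not something the original post specified:

```r
# Rebuild the example data from the original post
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3)
)

min_obs <- 2  # hypothetical name for the threshold "x" in the question

# Per-firm counts of non-missing values, repeated on every row of the firm
n_y <- ave(dfrm$y, dfrm$id, FUN = function(x) sum(!is.na(x)))
n_z <- ave(dfrm$z, dfrm$id, FUN = function(x) sum(!is.na(x)))

# Keep firms with at least min_obs observed values in both y and z
df1 <- dfrm[pmin(n_y, n_z) >= min_obs, ]
rownames(df1) <- NULL  # renumber rows 1..n, as in the requested output
df1
```

Keeping the counts in separate vectors rather than as columns of dfrm means the result has only id, y and z, which matches the df1 layout that was asked for.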

Note that this does not ensure that each id has at least 2 complete  
observations, i.e. rows where both y and z are non-missing. If you  
wanted a solution to that problem you would need a better test data.frame.
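
If the stricter rule is what is wanted, one possible sketch (my assumption about how to read "complete observations", not something tested beyond this example) is to count complete rows per firm with complete.cases() and ave(). The names ok and n_complete are just illustrative:

```r
# Rebuild the example data from the original post
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3)
)

# TRUE for rows where both y and z are observed
ok <- complete.cases(dfrm[, c("y", "z")])

# Number of complete rows per firm, repeated on every row of that firm
n_complete <- ave(as.numeric(ok), dfrm$id, FUN = sum)

# Keep firms with at least 2 complete rows
dfrm[n_complete >= 2, ]
```

On the example data this rule also keeps only firms c and d, so these data cannot distinguish the two rules.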


> Thus, it should give me a data.frame which looks like
>> df1
>   id  y  z
> 1   c  2  2
> 2   c  4  4
> 3   c  5 NA
> 4   d  6 NA
> 5   d  5  5
> 6   d  6  6
> 7   d  7  7
>
> Thank you very much!
> Christian Schoder

David Winsemius, MD
West Hartford, CT
