[R] Finding Missing Data Patterns

james.holtman at convergys.com
Sun Feb 2 15:44:06 CET 2003


Use 'rle' to test for the sequence of data and NAs.

Depending on what you want to test for, more than two runs in the 'data/NA'
sequence (length > 2) would say that you have a mix.  If you want 'data'
first, then check that the first value is FALSE.  Here is an example:

> x.1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1   NA   NA    1   NA   NA
[2,]    1    1    1    1    1   NA   NA
[3,]   NA   NA   NA    1    1    1    1
[4,]    1   NA    1   NA    1   NA    1
> x.2 <- apply(x.1,1,function(x)rle(is.na(x)))
> x.2
[[1]]
Run Length Encoding
  lengths: int [1:4] 2 2 1 2
  values : logi [1:4] FALSE  TRUE FALSE  TRUE

[[2]]
Run Length Encoding
  lengths: int [1:2] 5 2
  values : logi [1:2] FALSE  TRUE

[[3]]
Run Length Encoding
  lengths: int [1:2] 3 4
  values : logi [1:2]  TRUE FALSE

[[4]]
Run Length Encoding
  lengths: int [1:7] 1 1 1 1 1 1 1
  values : logi [1:7] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

> sapply(x.2,function(x)length(x$lengths)>2)
[1]  TRUE FALSE FALSE  TRUE
> # first and fourth cases have the sequence you want and both
> # start with 'data' because the first element of 'values' is FALSE
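
The two checks above (more than two runs, first value FALSE) can be folded
into one test by counting the runs of non-missing values: two or more such
runs mean that some NAs sit between data. What follows is only a sketch
under that assumption. The helper name 'has_gap' and the objects 'm', 'gap'
and 'm.clean' are made up for illustration, and the matrix is the five rows
from the question with 1 standing in for X.

```r
# has_gap() is a hypothetical helper, not part of the original post.
# It returns TRUE when a row contains a "data, NA(s), data" pattern.
has_gap <- function(row) {
  r <- rle(is.na(row))
  # Each FALSE in r$values is one run of observed data; two or more
  # such runs must be separated by at least one run of NAs.
  sum(!r$values) >= 2
}

# The five rows from the question, with 1 standing in for observed data (X).
m <- rbind(
  c( 1,  1,  1,  1, 1,  1, NA, NA, NA),
  c(NA, NA, NA, NA, 1,  1,  1,  1,  1),
  c(NA, NA,  1,  1, 1,  1, NA, NA, NA),
  c( 1,  1,  1,  1, 1,  1,  1,  1,  1),
  c( 1,  1, NA, NA, 1, NA, NA, NA, NA)
)

gap <- apply(m, 1, has_gap)
print(gap)  # only row 5 is flagged

# Keep only the rows whose data are not interrupted by NAs.
m.clean <- m[!gap, , drop = FALSE]
```

Counting data runs this way also catches a row such as NA, data, NA, data,
which starts with NA and so would be missed by the first-value check.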


From:     Wolfgang Viechtbauer <wviechtb at s.psych.uiuc.edu>
Sent by:  r-help-admin at stat.math.ethz.ch
Date:     02/02/2003 00:09
To:       <r-help at stat.math.ethz.ch>
cc:
Subject:  [R] Finding Missing Data Patterns




Dear R-Helpers,

I have a large data matrix, which contains missing data. The matrix
looks something like this:

1) X  X  X  X  X  X NA NA NA
2) NA NA NA NA X  X  X  X  X
3) NA NA X  X  X  X NA NA NA
4) X  X  X  X  X  X  X  X  X
5) X  X  NA NA X NA NA NA NA

and so on. Notice that the first row starts with complete data but ends
with missing. The second row starts with missing, but the rest is
complete. The third starts and ends with missing, but the middle part is
complete. The fourth is complete. What I want to do is filter out
patterns like in row 5, where the data are interrupted by missing data.
Basically, I need to test each row for a "data, at least one NA, data"
pattern.

Is there some way of doing this? I am at a loss for an easy way
to accomplish this. Any suggestions would be most appreciated!

--
Wolfgang Viechtbauer
