[R] Quick question: Omitting rows and cols with certain percents of missing values

David Winsemius dwinsemius at comcast.net
Fri May 13 16:12:12 CEST 2011


On May 13, 2011, at 9:42 AM, Vickie S wrote:

>
> Hi,
> a naive question.
> It is possible to find an R command for omitting rows or columns
> that contain missing values.
>
> But if I want to omit rows or columns with, e.g., >20% missing
> values, I couldn't find any package-based command, probably because
> it is too simple for anyone to do manually, though not for me. Can
> anyone please help me?

?is.na

 > str(fil)
'data.frame':	8 obs. of  5 variables:
  $ X1  : int  2 3 4 5 6 NA NA 6
  $ X5  : int  6 7 NA NA NA NA NA NA
  $ X8  : int  9 NA NA NA NA NA NA NA
  $ X   : logi  NA NA NA NA NA NA ...
  $ X1.1: Factor w/ 6 levels "","2","3","5",..: 2 3 1 4 5 6 1 1
 > is.na(fil)
         X1    X5    X8    X  X1.1
[1,] FALSE FALSE FALSE TRUE FALSE
[2,] FALSE FALSE  TRUE TRUE FALSE
[3,] FALSE  TRUE  TRUE TRUE FALSE
[4,] FALSE  TRUE  TRUE TRUE FALSE
[5,] FALSE  TRUE  TRUE TRUE FALSE
[6,]  TRUE  TRUE  TRUE TRUE FALSE
[7,]  TRUE  TRUE  TRUE TRUE FALSE
[8,] FALSE  TRUE  TRUE TRUE FALSE
 > str(is.na(fil))
  logi [1:8, 1:5] FALSE FALSE FALSE FALSE FALSE TRUE ...
  - attr(*, "dimnames")=List of 2
   ..$ : NULL
   ..$ : chr [1:5] "X1" "X5" "X8" "X" ...

So is.na() applied to a data frame returns a logical matrix with the
same dimensions. You can compute the proportion of missing values with
apply() on that matrix, using MARGIN = 1 for rows or MARGIN = 2 for
columns, and compare the result with your cutoff to get a logical
index for selecting rows or columns.
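
A minimal sketch of that approach, using a small made-up data frame
(`dat` and `cutoff` are hypothetical names, and this is not the `fil`
object shown above). The mean of a logical vector is the fraction of
TRUE values, so it gives the proportion of NAs directly:

```r
# Hypothetical example data, chosen only for illustration
dat <- data.frame(a = c(1, NA, 3, 4, 5),
                  b = c(NA, NA, NA, 4, 5),
                  c = c(1, 2, 3, 4, NA))

na_mat <- is.na(dat)                 # logical matrix, same shape as dat

# Fraction of missing values per row (margin 1) and per column (margin 2)
row_frac <- apply(na_mat, 1, mean)
col_frac <- apply(na_mat, 2, mean)

cutoff <- 0.20                       # assumed threshold: drop if > 20% missing

# Keep only the rows / columns whose NA fraction does not exceed the cutoff
dat_rows <- dat[row_frac <= cutoff, , drop = FALSE]
dat_cols <- dat[, col_frac <= cutoff, drop = FALSE]
```

rowMeans(na_mat) and colMeans(na_mat) should give the same fractions
as the two apply() calls. Note that the two filters here are computed
independently from the original data; if you drop columns first and
then filter rows, the row percentages change, so the order matters.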

-- 
David Winsemius, MD
West Hartford, CT


