[R] Quick question: Omitting rows and cols with certain percents of missing values
David Winsemius
dwinsemius at comcast.net
Fri May 13 16:12:12 CEST 2011
On May 13, 2011, at 9:42 AM, Vickie S wrote:
>
> Hi
> naive question.
> It is possible to get R command for omitting rows or cols with
> missing values present.
>
> But
> if I want to omit rows or cols with, e.g., >20% missing values, I
> couldn't find any package-based command, probably because it is too
> simple for anyone to do that manually, though not for me. Can anyone
> please help me?
?is.na
> str(fil)
'data.frame': 8 obs. of 5 variables:
$ X1 : int 2 3 4 5 6 NA NA 6
$ X5 : int 6 7 NA NA NA NA NA NA
$ X8 : int 9 NA NA NA NA NA NA NA
$ X : logi NA NA NA NA NA NA ...
$ X1.1: Factor w/ 6 levels "","2","3","5",..: 2 3 1 4 5 6 1 1
> is.na(fil)
X1 X5 X8 X X1.1
[1,] FALSE FALSE FALSE TRUE FALSE
[2,] FALSE FALSE TRUE TRUE FALSE
[3,] FALSE TRUE TRUE TRUE FALSE
[4,] FALSE TRUE TRUE TRUE FALSE
[5,] FALSE TRUE TRUE TRUE FALSE
[6,] TRUE TRUE TRUE TRUE FALSE
[7,] TRUE TRUE TRUE TRUE FALSE
[8,] FALSE TRUE TRUE TRUE FALSE
> str(is.na(fil))
logi [1:8, 1:5] FALSE FALSE FALSE FALSE FALSE TRUE ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "X1" "X5" "X8" "X" ...
So is.na() applied to a data frame returns a logical matrix of the
same dimensions. You can compute the proportion of missing values with
apply(), using MARGIN = 1 for rows or MARGIN = 2 for columns, and
compare the result with your cutoff to get logical indices for
selecting rows or columns.
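
A minimal sketch of that approach follows. The data frame rebuilds only
the first four columns shown in the str() output above (the factor
column is left out because its levels are truncated there), and the
names max_frac, drop_sparse_cols and drop_sparse_rows are made up for
illustration. Every row and column of this small example has more than
20% missing, so the demonstration calls use a 50% cutoff instead.

```r
# Assumed reconstruction of the first four columns of `fil`
fil <- data.frame(
  X1 = c(2L, 3L, 4L, 5L, 6L, NA, NA, 6L),
  X5 = c(6L, 7L, NA, NA, NA, NA, NA, NA),
  X8 = c(9L, rep(NA_integer_, 7)),
  X  = rep(NA, 8)
)

# Hypothetical helper: keep columns whose NA fraction is <= max_frac
drop_sparse_cols <- function(df, max_frac = 0.20) {
  na_frac <- apply(is.na(df), 2, mean)   # MARGIN = 2: one value per column
  df[, na_frac <= max_frac, drop = FALSE]
}

# Hypothetical helper: keep rows whose NA fraction is <= max_frac
drop_sparse_rows <- function(df, max_frac = 0.20) {
  na_frac <- apply(is.na(df), 1, mean)   # MARGIN = 1: one value per row
  df[na_frac <= max_frac, , drop = FALSE]
}

col_frac <- apply(is.na(fil), 2, mean)   # NA fraction per column
row_frac <- apply(is.na(fil), 1, mean)   # NA fraction per row

cols_kept <- drop_sparse_cols(fil, max_frac = 0.50)
rows_kept <- drop_sparse_rows(fil, max_frac = 0.50)
```

The mean of a logical vector is the proportion of TRUE values, which is
why mean() works as the function passed to apply(). colMeans(is.na(fil))
and rowMeans(is.na(fil)) should give the same fractions. Note that the
two helpers are applied independently here; filtering columns first and
then rows (or the reverse) can give a different result.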
--
David Winsemius, MD
West Hartford, CT