[R] A query about na.omit
Bernardo Rangel Tura
tura at centroin.com.br
Wed Apr 1 21:19:04 CEST 2009
On Wed, 2009-04-01 at 16:49 +0100, Jose Iparraguirre D'Elia wrote:
> Dear all,
>
> Say I have the following dataset:
>
> > DF
> x y z
> [1] 1 1 1
> [2] 2 2 2
> [3] 3 3 NA
> [4] 4 NA 4
> [5] NA 5 5
>
> And I want to omit all the rows which have NA, but only in columns x and y, so that I get:
>
> x y z
> 1 1 1
> 2 2 2
> 3 3 NA
>
> If I use na.omit(DF), I would delete the row for which z=NA, obtaining thus
>
> x y z
> 1 1 1
> 2 2 2
>
> But this is not what I want, of course.
> If I use na.omit(DF[,1:2]), then I obtain
>
> x y
> 1 1
> 2 2
> 3 3
>
> which is OK for the x and y columns, but I wouldn't get the corresponding values for z (i.e. 1, 2, NA)
>
> Any suggestions about how to obtain the desired results efficiently (the actual dataset has millions of records and almost 50 columns, and I would apply the procedure on 12 of these columns)?
>
> Sincerely,
>
> Jose Luis
>
> Jose Luis Iparraguirre
> Senior Research Economist
> Economic Research Institute of Northern Ireland
>
Hi Jose Luis,
I think the following script solves your problem:
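# Rebuild the example data as a matrix (columns: x, y, z)
tab <- matrix(c(1,1,1,2,2,2,3,3,NA,4,NA,4,NA,5,5), ncol=3, byrow=TRUE)
# Keep only the rows where neither column 1 (x) nor column 2 (y) is NA
tab[!is.na(tab[,1]) & !is.na(tab[,2]), ]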
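
With 12 columns to check, chaining one !is.na() per column gets long. A minimal sketch of the same idea using complete.cases() on a subset of columns is given below. It assumes the data are in a data frame called DF, as in the question. The vector name cols and the result name res are only illustrative.

```r
# Example data from the question, as a data frame
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# Columns whose NAs should cause a row to be dropped (illustrative name)
cols <- c("x", "y")

# complete.cases() is TRUE for rows with no NA in the selected columns;
# NAs in the other columns (here z) are ignored
res <- DF[complete.cases(DF[cols]), ]
res
```

For the real dataset, cols would hold the names (or positions) of the 12 columns of interest. All other columns are carried along unchanged.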
--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil