[R] A query about na.omit
(Ted Harding)
Ted.Harding at manchester.ac.uk
Wed Apr 1 19:00:55 CEST 2009
On 01-Apr-09 15:49:40, Jose Iparraguirre D'Elia wrote:
> Dear all,
> Say I have the following dataset:
>
>> DF
>      x  y  z
> [1]  1  1  1
> [2]  2  2  2
> [3]  3  3 NA
> [4]  4 NA  4
> [5] NA  5  5
>
> And I want to omit all the rows which have NA, but only in columns x
> and y, so that I get:
>
>  x  y  z
>  1  1  1
>  2  2  2
>  3  3 NA
Roll up your sleeves, and spell out in detail the condition you need:
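# Build the example data frame
DF <- data.frame(x = c(1, 2, 3, 4, NA), y = c(1, 2, 3, NA, 5), z = c(1, 2, NA, 4, 5))
DF
#    x  y  z
# 1  1  1  1
# 2  2  2  2
# 3  3  3 NA
# 4  4 NA  4
# 5 NA  5  5
# Keep the rows where x and y (columns 1:2) are both non-NA;
# rowSums() is NA for a row whenever either value is NA
DF[!is.na(rowSums(DF[, 1:2])), ]
#   x y  z
# 1 1 1  1
# 2 2 2  2
# 3 3 3 NA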
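The same condition can also be spelled out with complete.cases() from base R, which may be easier to extend to a dozen columns. What follows is only a sketch on the toy data: the name check_cols is made up for illustration, and in the real dataset it would hold the names of the 12 columns to be checked.

```r
# Same toy data as above
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# Hypothetical vector of the columns that must not contain NA;
# for the real data this would list the 12 columns of interest
check_cols <- c("x", "y")

# complete.cases() gives TRUE for each row with no NA in the columns passed in;
# drop = FALSE keeps a data frame even if only one column is named
keep <- complete.cases(DF[, check_cols, drop = FALSE])

# Subset the full data frame, so z comes along whether or not it is NA
result <- DF[keep, ]
result
```

Unlike the rowSums() version, this does not need the checked columns to be numeric, and a row holding both Inf and -Inf (whose sum is NaN) would not be dropped by mistake.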
Hoping this helps,
Ted.
> If I use na.omit(DF), I would delete the row for which z=NA, obtaining
> thus
>
>  x  y  z
>  1  1  1
>  2  2  2
>
> But this is not what I want, of course.
> If I use na.omit(DF[,1:2]), then I obtain
>
>  x  y
>  1  1
>  2  2
>  3  3
>
> which is OK for x and y columns, but I wouldn't get the corresponding
> values for z (ie 1 2 NA)
>
> Any suggestions about how to obtain the desired results efficiently
> (the actual dataset has millions of records and almost 50 columns, and
> I would apply the procedure on 12 of these columns)?
>
> Sincerely,
>
> Jose Luis
>
> Jose Luis Iparraguirre
> Senior Research Economist
> Economic Research Institute of Northern Ireland
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Apr-09 Time: 18:00:53
------------------------------ XFMail ------------------------------