[R] sometimes removing NAs from code

Wed Oct 26 17:50:06 CEST 2011

Hi,

On Wed, Oct 26, 2011 at 11:25 AM, Schatzi <adele_thompson at cargill.com> wrote:
> Sometimes I have NA values within specific columns of a dataframe (in this
> example, the first two columns can have NAs). If there are NA values, I
> would like them to be removed.
>
> I have been using the code:
>
> y<-c(NA,5,4,2,5,6,NA)
> z<-c(NA,3,4,NA,1,3,7)
> x<-1:7
> adata<-data.frame(y,z,x)
> adata<-adata[-which(apply(adata[,1:2],1,function(x)any(is.na(x)))),]
>
> This works well if there are NA values, but when a dataset doesn't have NA
> values, this code messes up the dataframe. I was trying to pick apart this
> code and could not understand why it didn't work when there were no NA
> values.

Thanks for the example. The problem comes from the which() call.

If there are NA values, which() returns the row numbers where the NAs are:

> which(apply(adata[,1:2],1,function(x)any(is.na(x))))
[1] 1 4 7

But if there aren't any, which() returns integer(0), a zero-length vector:

> bdata <- data.frame(1:7, 1:7, 1:7)
> which(apply(bdata[,1:2],1,function(x)any(is.na(x))))
integer(0)

How does R subset on an empty row index? Unhelpfully: -integer(0) is
still integer(0), so no rows are selected and you get back a data
frame with zero rows.
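
Here is a minimal sketch of that failure, in case it helps to see it step by step. The column names a, b, c and the variable idx are only illustrative; they are not from your code.

```r
# A data frame with no NAs (column names are illustrative)
bdata <- data.frame(a = 1:7, b = 1:7, c = 1:7)

# Same test as in the original code: which rows have an NA in columns 1:2?
idx <- which(apply(bdata[, 1:2], 1, function(x) any(is.na(x))))

length(idx)           # 0: no rows matched
length(-idx)          # 0: negating an empty vector leaves it empty
nrow(bdata[-idx, ])   # 0: an empty row index selects no rows
```

So the negative index only means "drop these rows" when there is at least one row to drop.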

Fortunately you don't need the which() at all: the logical vector
returned by your apply statement is entirely sufficient (with added
negation):

> adata[apply(adata[,1:2],1,function(x)!any(is.na(x))), ]
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6
> bdata[apply(bdata[,1:2],1,function(x)!any(is.na(x))), ]
  X1.7 X1.7.1 X1.7.2
1    1      1      1
2    2      2      2
3    3      3      3
4    4      4      4
5    5      5      5
6    6      6      6
7    7      7      7
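
If you want something shorter, one possible alternative is complete.cases() from the stats package, which returns TRUE for rows with no missing values. The sketch below assumes the NAs you care about are in the first two columns; the helper name keep_complete and the bdata column names are made up for illustration.

```r
y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)
bdata <- data.frame(a = 1:7, b = 1:7, c = 1:7)  # no NAs anywhere

# Hypothetical helper: keep rows with no NA in the chosen columns.
# complete.cases() returns a logical vector, so there is no empty-index trap.
keep_complete <- function(d, cols = 1:2) {
  d[complete.cases(d[, cols, drop = FALSE]), , drop = FALSE]
}

keep_complete(adata)  # keeps rows 2, 3, 5, 6
keep_complete(bdata)  # keeps all 7 rows
```

Because the index is logical in both cases, the same line works whether or not the data contain NAs.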

Sarah

>
> If there are no NA values and I run just the part:
> apply(adata[,1:2],1,function(x)any(is.na(x)))
> it results in:
>    2     3     5     6
> FALSE FALSE FALSE FALSE
>
> I was thinking that I can put in an if statement, but I think there has to
> be a better way.
>
> Any ideas/help? Thank you.
>

-- 
Sarah Goslee
http://www.functionaldiversity.org