[R] Why does R replace all row values with NAs
Duncan Murdoch
murdoch.duncan at gmail.com
Fri Feb 27 15:13:45 CET 2015
On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
> I know how to get the output I need, but I would benefit from an
> explanation why R behaves the way it does.
>
> # I have a data frame x:
> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
> x
> # I want to toss rows in x that contain values >=6. But I don't want
> # to toss my NAs there.
>
> subset(x,c<6) # Works correctly, but removes NAs in c, understand why
> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
> x[-which(x$c>=6),] # output I need
>
> # Here is my question: why does the following line replace the values
> # of all rows that contain an NA in x$c with NAs?
>
> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. Why???
> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
>
> Thank you very much!
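Most of your examples (except the ones using which()) are doing logical
indexing. In logical indexing, TRUE keeps a row, FALSE drops the row,
and NA returns a row of NAs. Since "x$c < 6" is NA wherever x$c is NA,
those rows fall into the third case and come back as all NA. (subset()
is a special case: it treats NA in the condition as FALSE, so it drops
those rows instead.)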
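As a rough illustration of the three cases, here is a small sketch that reuses the data frame from the question. The names idx and res are only illustrative and are not from the original post.

```r
# Data frame from the question
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# The comparison propagates NA: the result is TRUE, FALSE, or NA per row
idx <- x$c < 6
table(idx, useNA = "ifany")

# Logical indexing: TRUE rows are kept, FALSE rows are dropped,
# and each NA in the index yields a row filled with NA
res <- x[idx, ]
nrow(res)            # TRUE rows plus NA rows
sum(is.na(res$a))    # number of all-NA rows

# which() returns only the positions that are TRUE, so the NAs disappear
which(idx)
```

The same thing happens with a plain vector: c(10, 20, 30)[c(TRUE, FALSE, NA)] returns 10 followed by NA.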
Your last example works because in the cases where x$c is NA, it
evaluates NA | TRUE, and that evaluates to TRUE. In the cases where x$c
is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
which will be either TRUE or FALSE.
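The NA handling of | can be checked directly. The following is a minimal sketch, again using the data frame from the question; keep and res2 are illustrative names, not from the original post.

```r
# NA in R's logic means "unknown": the result is NA only when the
# unknown value could change the answer
NA | TRUE     # TRUE: the result is TRUE whatever the NA stands for
NA | FALSE    # NA: the result depends on the unknown value
NA & FALSE    # FALSE, for the same reason

# Data frame from the question
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Combining the comparison with is.na() leaves no NA in the index
keep <- (x$c < 6) | is.na(x$c)
anyNA(keep)

# Rows with c < 6 and rows with c missing are kept intact
res2 <- x[keep, ]
res2
```

Because keep contains only TRUE and FALSE, no row is turned into a row of NAs.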
Duncan Murdoch