[R] Why does R replace all row values with NAs

Duncan Murdoch murdoch.duncan at gmail.com
Fri Feb 27 17:00:46 CET 2015


On 27/02/2015 10:27 AM, Dimitri Liakhovitski wrote:
> Thank you very much, Duncan.
> All this being said:
> 
> What would you say is the most elegant and safest way to solve such
> a seemingly simple task?

If you have NA values, test for them explicitly, e.g. your original

x[(x$c<6) | is.na(x$c),]

I would write it as

x[is.na(x$c) | x$c < 6,]

but that's purely a style difference; I don't think it would affect
execution time (or results).  I like to put the weird case first because
it will remind me that things are more complicated than you might guess.
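
As a minimal sketch of the above, using the data frame from the original
question (the names keep, res_explicit, res_original and res_naive are
only illustrative), the two orderings should select the same rows, while
the version without the explicit test yields all-NA rows:

```r
# Data frame from the original question
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Three-valued logic: NA | TRUE is TRUE, while NA | FALSE stays NA
na_or_true  <- NA | TRUE
na_or_false <- NA | FALSE

# Explicit NA test first, then the comparison
keep <- is.na(x$c) | x$c < 6
res_explicit <- x[keep, ]

# Same condition with the operands in the original order
res_original <- x[(x$c < 6) | is.na(x$c), ]

# Without the explicit test, every NA in the index produces an all-NA row
res_naive <- x[x$c < 6, ]

print(res_explicit)
print(res_naive)
```

Here res_explicit keeps the rows where c is NA with their a and b values
intact, whereas res_naive has the same number of rows but the NA cases
come back as rows of NAs.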

Duncan Murdoch

> 
> Thank you!
> 
> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>> On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
>>> So, Duncan, do I understand you correctly:
>>>
>>> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
>>> a logical value of NA.
>>
>> Yes, when x$x is NA.  (Though I think you meant x$c.)
>>
>>> When this logical value is applied to a row, R says: hell, I don't
>>> know if I should keep it or not, so, just in case, I am going to keep
>>> it, but I'll replace all the values in this row with NAs?
>>
>> Yes.  Indexing with a logical NA is probably a mistake, and this is one
>> way to signal it without actually triggering a warning or error.
>>
>> BTW, I should have mentioned that the example where you indexed using
>> -which(x$c>=6) is a bad idea:  if none of the entries were 6 or more,
>> this would be indexing with an empty vector, and you'd get nothing, not
>> everything.
>>
>> Duncan Murdoch
>>
>>
>>>
>>> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
>>> <murdoch.duncan at gmail.com> wrote:
>>>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
>>>>> I know how to get the output I need, but I would benefit from an
>>>>> explanation why R behaves the way it does.
>>>>>
>>>>> # I have a data frame x:
>>>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>>>>> x
>>>>> # I want to toss rows in x that contain values >=6. But I don't want
>>>>> # to toss my NAs there.
>>>>>
>>>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why
>>>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
>>>>> x[-which(x$c>=6),] # output I need
>>>>>
>>>>> # Here is my question: why does the following line replace the values
>>>>> # of all rows that contain an NA in x$c with NAs?
>>>>>
>>>>> x[x$c<6,]  # Leaves rows with c=NA, but makes the whole row an NA. Why???
>>>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
>>>>>
>>>>> Thank you very much!
>>>>
>>>> Most of your examples (except the ones using which()) are doing logical
>>>> indexing.  In logical indexing, TRUE keeps a row, FALSE drops the row,
>>>> and NA returns a row of NAs.  Since "x$c < 6" is NA if x$c is NA, you
>>>> get the third kind of indexing.
>>>>
>>>> Your last example works because in the cases where x$c is NA, it
>>>> evaluates NA | TRUE, and that evaluates to TRUE.  In the cases where x$c
>>>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
>>>> which will be either TRUE or FALSE.
>>>>
>>>> Duncan Murdoch
>>>>
>>>
>>>
>>>
>>
> 
> 
>
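
The -which() pitfall mentioned in the quoted text can be reproduced with
a short sketch.  The threshold of 100 is a hypothetical value chosen only
so that no entry of x$c matches, and idx, res_negative and res_logical
are illustrative names:

```r
# Data frame from the original question
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Hypothetical threshold that no value of c reaches
idx <- which(x$c >= 100)      # integer(0): nothing matches

# Negating an empty vector is still an empty vector, so this selects
# zero rows instead of keeping all of them
res_negative <- x[-idx, ]

# Logical indexing with an explicit NA test keeps every row here
res_logical <- x[is.na(x$c) | x$c < 100, ]

print(nrow(res_negative))
print(nrow(res_logical))
```

The logical form does not depend on whether anything matched, which is
one reason to prefer it over negative indexing with which().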


