[R] Why does R replace all row values with NAs

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Fri Feb 27 17:22:59 CET 2015


Thank you very much, guys!

On Fri, Feb 27, 2015 at 11:04 AM, William Dunlap <wdunlap at tibco.com> wrote:
> You could define functions like
>    is.true <- function(x) !is.na(x) & x
>    is.false <- function(x) !is.na(x) & !x
> and use them in your selections.  E.g.,
>   > x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>   > x[is.true(x$c >= 6), ]
>       a  b  c
>   7   7  8  7
>   10 10 11 10
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>>
>> Thank you very much, Duncan.
>> All this being said:
>>
>> What would you say is the most elegant and safest way to solve such
>> a seemingly simple task?
>>
>> Thank you!
>>
>> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
>> <murdoch.duncan at gmail.com> wrote:
>> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
>> >> So, Duncan, do I understand you correctly:
>> >>
>> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
>> >> a logical value of NA.
>> >
>> > Yes, when x$x is NA.  (Though I think you meant x$c.)
>> >
>> >> When this logical value is applied to a row, R says: hell, I don't
>> >> know if I should keep it or not, so, just in case, I am going to keep
>> >> it, but I'll replace all the values in this row with NAs?
>> >
>> > Yes.  Indexing with a logical NA is probably a mistake, and this is one
>> > way to signal it without actually triggering a warning or error.
>> >
>> > BTW, I should have mentioned that the example where you indexed using
>> > -which(x$c>=6) is a bad idea:  if none of the entries were 6 or more,
>> > this would be indexing with an empty vector, and you'd get nothing, not
>> > everything.
>> >
>> > Duncan Murdoch
>> >
>> >
>> >>
>> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
>> >> <murdoch.duncan at gmail.com> wrote:
>> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
>> >>>> I know how to get the output I need, but I would benefit from an
>> >>>> explanation why R behaves the way it does.
>> >>>>
>> >>>> # I have a data frame x:
>> >>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>> >>>> x
>> >>>> # I want to toss rows in x that contain values >=6. But I don't want
>> >>>> # to toss my NAs there.
>> >>>>
>> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why
>> >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
>> >>>> x[-which(x$c>=6),] # output I need
>> >>>>
>> >>>> # Here is my question: why does the following line replace the values
>> >>>> # of all rows that contain an NA in x$c with NAs?
>> >>>>
>> >>>> x[x$c<6,]  # Leaves rows with c=NA, but makes the whole row an NA. Why???
>> >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
>> >>>>
>> >>>> Thank you very much!
>> >>>
>> >>> Most of your examples (except the ones using which()) are doing logical
>> >>> indexing.  In logical indexing, TRUE keeps a row, FALSE drops the row,
>> >>> and NA returns a row of NAs.  Since "x$c < 6" is NA if x$c is NA, you
>> >>> get the third kind of indexing.
>> >>>
>> >>> Your last example works because in the cases where x$c is NA, it
>> >>> evaluates NA | TRUE, and that evaluates to TRUE.  In the cases where x$c
>> >>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
>> >>> which will be either TRUE or FALSE.
>> >>>
>> >>> Duncan Murdoch
>> >>>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
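
The following R sketch reproduces Duncan's explanation using the data frame from the original post. The names idx, res and keep are illustrative only. It shows that each NA in x$c < 6 produces an all-NA row, and that OR-ing with is.na(x$c) keeps those rows intact because NA | TRUE is TRUE.

```r
x <- data.frame(a = 1:10, b = 2:11,
                c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# The comparison is NA wherever c is NA.
idx <- x$c < 6
idx   # values: TRUE NA TRUE NA TRUE NA FALSE NA NA FALSE

# Logical indexing: TRUE keeps a row, FALSE drops it, and each NA
# contributes a row of NAs, even in columns a and b, which have no NAs.
res <- x[idx, ]
nrow(res)           # 8 (three TRUE rows plus five NA rows)
sum(is.na(res$a))   # 5

# NA | TRUE is TRUE, while NA | FALSE is still NA, so OR-ing with
# is.na(x$c) turns exactly the NA entries of idx into TRUE.
NA | TRUE    # TRUE
NA | FALSE   # NA
keep <- idx | is.na(x$c)
x[keep, ]$a         # 1 2 3 4 5 6 8 9
```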

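Duncan's warning about x[-which(x$c>=6),] can be checked directly. In this sketch, drop_rows() is a hypothetical helper (not part of base R, and not proposed in the thread) that returns the whole data frame when which() finds no matching rows, instead of negating an empty index.

```r
x <- data.frame(a = 1:10, b = 2:11,
                c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# With at least one match, negative which() drops rows 7 and 10
# and keeps the rows where c is NA.
nrow(x[-which(x$c >= 6), ])     # 8

# With no match, which() returns integer(0); -integer(0) is still
# integer(0), so the result has zero rows rather than all ten.
nrow(x[-which(x$c >= 100), ])   # 0

# Hypothetical helper: drop the rows where cond is TRUE (NA counts as
# not TRUE) and return df unchanged when there are no such rows.
drop_rows <- function(df, cond) {
  hits <- which(cond)
  if (length(hits) == 0) df else df[-hits, , drop = FALSE]
}
nrow(drop_rows(x, x$c >= 6))    # 8
nrow(drop_rows(x, x$c >= 100))  # 10
```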

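Bill Dunlap's is.true() and is.false() helpers offer one answer to the question of a safe, concise idiom. This sketch, again using the data frame from the original post, negates is.true() to drop only the rows where c >= 6 is definitely TRUE. The comparison with the explicit is.na() version and the no-match case are illustrative checks added here, not something posted in the thread.

```r
# Bill Dunlap's helpers from the thread: NA counts as neither TRUE nor FALSE.
is.true  <- function(x) !is.na(x) & x
is.false <- function(x) !is.na(x) & !x

x <- data.frame(a = 1:10, b = 2:11,
                c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Rows where c >= 6 is definitely TRUE (rows 7 and 10) ...
x[is.true(x$c >= 6), ]$a                # 7 10
# ... and rows where it is definitely FALSE (rows 1, 3 and 5).
x[is.false(x$c >= 6), ]$a               # 1 3 5

# Dimitri's goal: drop the rows with c >= 6 but keep rows with c = NA.
kept <- x[!is.true(x$c >= 6), ]
kept$a                                  # 1 2 3 4 5 6 8 9

# Same rows as the explicit OR version from the thread ...
identical(kept, x[(x$c < 6) | is.na(x$c), ])   # TRUE

# ... and, unlike x[-which(...), ], it still works when nothing matches.
nrow(x[!is.true(x$c >= 100), ])         # 10
```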

-- 
Dimitri Liakhovitski


