[R] Inexplicably different results using subset vs bracket notation on logical variable

Tue Aug 28 05:11:51 CEST 2012

On Aug 27, 2012, at 5:08 PM, Mauricio Cornejo wrote:

> Hi,
>
> Would anyone have any idea as to why I would obtain completely  
> different results when subsetting using the subset function vs  
> bracket notation?
>
> I have a data frame with 65 variables and 4382 rows. When I  
> execute the following subset command, I get the correct results (125  
> rows):
>> subset(df, Renewal==TRUE, 1:2)
>
>
> However, I tried to obtain the same results with bracket notation as  
> follows.  The output gave me all the rows in the data frame and not  
> just the subset of 125 I was looking for.
>> df[df$Renewal==TRUE, 1:2]
>
> The 'Renewal' variable is of logical type and is the last (65th)  
> variable in the data frame.  However, values are either TRUE or NA  
> (there are no 'FALSE' values).

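That's exactly it. When an element of a logical row index is NA, "["  
extraction does not drop that row; it returns a row filled with NA (with  
row names NA, NA.1, ...) in its place. subset(), by contrast, treats NA  
as FALSE. You can correct what I consider a failing and others consider  
a feature with: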

df[df$Renewal==TRUE & !is.na(df$Renewal), 1:2]
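A small, self-contained sketch of the difference, using a hypothetical five-row data frame `dat` (not the poster's data) whose logical column holds only TRUE and NA. It also shows two other common idioms, `which()` and `%in% TRUE`, which are not from the reply above but drop the NA positions in the same way:

```r
# Hypothetical data: the logical column holds only TRUE and NA (no FALSE).
dat <- data.frame(id = 1:5,
                  score = c(10, 20, 30, 40, 50),
                  Renewal = c(TRUE, NA, TRUE, NA, NA))

# Plain bracket indexing: each NA in the index yields an all-NA row.
naive <- dat[dat$Renewal == TRUE, 1:2]
print(naive)

# The fix from the reply: exclude NA positions explicitly.
fixed <- dat[dat$Renewal == TRUE & !is.na(dat$Renewal), 1:2]

# Alternative idioms (not from the original reply):
#   which() returns only the positions that are TRUE, ignoring NA;
#   %in% TRUE maps NA to FALSE.
via_which <- dat[which(dat$Renewal), 1:2]
via_in    <- dat[dat$Renewal %in% TRUE, 1:2]

# subset() treats NA in its condition as FALSE.
via_subset <- subset(dat, Renewal == TRUE, 1:2)

print(fixed)
```

All four of the last approaches return rows 1 and 3 only; `naive` has five rows, three of them entirely NA.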

>
> My attempts at replicating this with a small dummy data set, to  
> include here, have not worked (i.e. I don't get an error when I  
> use synthetic data).  Any ideas on what could be going on?

You _should_ get the predicted behavior. Perhaps your test case was  
flawed?

 > dat <- data.frame(test1=1, Renewal=as.logical(sample(c(0,1,NA), 20, replace=TRUE)))
 > dat[dat$Renewal==TRUE, ]
      test1 Renewal
NA      NA      NA
NA.1    NA      NA
3        1    TRUE
NA.2    NA      NA
NA.3    NA      NA
6        1    TRUE
7        1    TRUE
8        1    TRUE
NA.4    NA      NA
12       1    TRUE
NA.5    NA      NA
NA.6    NA      NA
16       1    TRUE
17       1    TRUE
NA.7    NA      NA
NA.8    NA      NA
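The same behaviour explains why the bracket call in the question returned every row: only FALSE positions are dropped, while each TRUE or NA position contributes one output row. A sketch that checks this count on a reproducible version of the test above (the seed is arbitrary, so the rows printed will differ from the ones shown), plus a hypothetical `no_false` data frame mirroring the question's TRUE/NA-only column:

```r
# Reproducible version of the test above; the seed value is arbitrary.
set.seed(42)
dat <- data.frame(test1 = 1,
                  Renewal = as.logical(sample(c(0, 1, NA), 20, replace = TRUE)))
out <- dat[dat$Renewal == TRUE, ]

# Only FALSE positions are dropped: TRUE and NA positions each yield one row.
n_true <- sum(dat$Renewal, na.rm = TRUE)
n_na   <- sum(is.na(dat$Renewal))
cat("rows returned:", nrow(out), " TRUE:", n_true, " NA:", n_na, "\n")

# With no FALSE values at all (as in the question), nothing is dropped,
# so the result is as long as the original data frame.
no_false <- data.frame(test1 = 1, Renewal = c(TRUE, NA, NA, TRUE, NA))
nrow(no_false[no_false$Renewal == TRUE, ]) == nrow(no_false)
```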

This is all described in ?"[" (see also ?"[.data.frame" and ?subset).

-- 

David Winsemius, MD
Alameda, CA, USA