[R] Inexplicably different results using subset vs bracket notation on logical variable
David Winsemius
dwinsemius at comcast.net
Tue Aug 28 05:11:51 CEST 2012
On Aug 27, 2012, at 5:08 PM, Mauricio Cornejo wrote:
> Hi,
>
> Would anyone have any idea as to why I would obtain completely
> different results when subsetting using the subset function vs
> bracket notation?
>
> I have a data frame with 65 variables and 4382 rows. When I
> execute the following subset command I get the correct results (125
> rows):
>> subset(df, Renewal==TRUE, 1:2)
>
>
> However, I tried to obtain the same results with bracket notation as
> follows. The output gave me all the rows in the data frame and not
> just the subset of 125 I was looking for.
>> df[df$Renewal==TRUE, 1:2]
>
> The 'Renewal' variable is of logical type and is the last (65th)
> variable in the data frame. However, values are either TRUE or NA
> (there are no 'FALSE' values).
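That's exactly it. When a logical index is NA, "[" does not drop that
position; it returns a row filled with NAs in its place. Since your
column holds only TRUE and NA, the result has as many rows as the data
frame itself. subset() treats NA in the condition as FALSE, which is why
it returns only the 125 rows. You can correct what I consider a failing
and others consider a feature with:
df[df$Renewal==TRUE & !is.na(df$Renewal), 1:2]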
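
For comparison, here is a small deterministic sketch of the difference. The data frame `dat` and its `id` column are made up for illustration; only the TRUE/NA pattern in `Renewal` mirrors the question.

```r
# Hypothetical toy data: Renewal holds only TRUE or NA, as in the question.
dat <- data.frame(id = 1:6,
                  Renewal = c(TRUE, NA, TRUE, NA, NA, TRUE))

# subset() treats an NA condition as FALSE and drops those rows.
via_subset <- subset(dat, Renewal == TRUE)

# "[" with a logical index returns one all-NA row per NA in the index.
via_bracket <- dat[dat$Renewal == TRUE, ]

# Two ways to discard the NA positions before indexing with "[".
via_which <- dat[which(dat$Renewal), ]
via_isna  <- dat[!is.na(dat$Renewal) & dat$Renewal, ]

nrow(via_subset)   # 3
nrow(via_bracket)  # 6: three real rows plus three all-NA rows
```

which() is often the shorter choice because it returns only the positions that are TRUE, so NA never reaches the index.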
>
> My attempts at replicating this with a small dummy data set, for
> including here, have not worked (i.e. I don't get an error when I
> use synthetic data). Any ideas on what could be going on?
You _should_ get the predicted behavior. Perhaps your test case was
flawed?
> dat <- data.frame(test1=1,
+     Renewal=as.logical(sample(c(0,1,NA), 20, replace=TRUE)))
> dat[dat$Renewal==TRUE, ]
     test1 Renewal
NA      NA      NA
NA.1    NA      NA
3        1    TRUE
NA.2    NA      NA
NA.3    NA      NA
6        1    TRUE
7        1    TRUE
8        1    TRUE
NA.4    NA      NA
12       1    TRUE
NA.5    NA      NA
NA.6    NA      NA
16       1    TRUE
17       1    TRUE
NA.7    NA      NA
NA.8    NA      NA
This is all described in ?"["
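
The same rule is easier to see on a plain vector. A minimal sketch, with `x` and `idx` as made-up names:

```r
x   <- c(10, 20, 30)
idx <- c(TRUE, NA, FALSE)

picked <- x[idx]          # TRUE keeps 10, NA yields NA, FALSE drops 30
safe   <- x[which(idx)]   # which() keeps only the positions that are TRUE
```

Here `picked` has two elements, 10 and NA, while `safe` holds just 10.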
--
David Winsemius, MD
Alameda, CA, USA