[R] when to use `which'?

William Dunlap wdunlap at tibco.com
Wed Jul 13 17:49:45 CEST 2011


x[which(condition)], like the subset function, treats NAs in
condition as FALSE and hence does not output NAs for them.
I was also surprised to see that it runs a trifle faster than x[condition]
in R 2.13.0 if there are few TRUEs in condition and a trifle slower
if there are many TRUEs.
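
To make the NA handling concrete, here is a small sketch. The vector x and the name condition are made-up illustration data, not taken from any real analysis.

```r
# Hypothetical example data (an assumption, for illustration only).
x <- c(5, NA, -2, 8, NA, 12)
condition <- x > 4                 # TRUE NA FALSE TRUE NA TRUE

by_logical <- x[condition]         # an NA in condition yields an NA in the result
by_which   <- x[which(condition)]  # which() keeps only the positions that are TRUE
by_subset  <- subset(x, condition) # subset() also treats NA in condition as FALSE

print(by_logical)
print(by_which)
print(by_subset)
```

The logical subscript returns one element per TRUE or NA in condition, while the other two forms return only the elements where condition is TRUE.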

A danger of the x[which(condition)] approach is the case
where you are trying to omit some entries by using a negative
integer subscript, as in
    x[-which(is.na(x))]
That is equivalent to
    x[!is.na(x)]
if there are any NAs in x, but if there are no NAs in x then
which(is.na(x)) is integer(0), negating it still gives integer(0),
and x[integer(0)] is a zero-length vector instead of all of x.
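
A minimal sketch of that pitfall, using two made-up vectors (x_with_na and x_no_na are hypothetical names chosen for this illustration):

```r
# Hypothetical example data (an assumption, for illustration only).
x_with_na <- c(1, NA, 3)
x_no_na   <- c(1, 2, 3)

# With at least one NA, the negative subscript drops just the NA positions.
dropped_ok <- x_with_na[-which(is.na(x_with_na))]

# With no NAs, which() returns integer(0); -integer(0) is still integer(0),
# so the subscript selects nothing at all.
empty_index <- which(is.na(x_no_na))
dropped_bad <- x_no_na[-empty_index]

# The logical form keeps every element when there is nothing to drop.
kept <- x_no_na[!is.na(x_no_na)]

print(dropped_ok)
print(length(dropped_bad))
print(kept)
```

The logical subscript gives the intended result in both cases, which is why it is the safer way to omit entries.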

For complicated conditions I find it easier to understand code
using logical operators
    x[!is.na(x) & x>0 & x<10]
than code applying set operations to the output of which
    x[intersect(setdiff(which(x>0), which(is.na(x))), which(x<10))]
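
As a quick check that the two spellings select the same elements, here is a sketch on a made-up vector (the data is an assumption, chosen only to include NAs and values on both sides of each bound):

```r
# Hypothetical example data (an assumption, for illustration only).
x <- c(-3, 2, NA, 15, 7, NA, 0, 9)

# Logical operators: one vectorised condition, NA positions forced to FALSE.
by_logical <- x[!is.na(x) & x > 0 & x < 10]

# Set operations on index vectors returned by which().
by_sets <- x[intersect(setdiff(which(x > 0), which(is.na(x))), which(x < 10))]

print(by_logical)
print(by_sets)
```

Both forms pick the non-missing values strictly between 0 and 10; the logical version simply states the condition more directly.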

Bill Dunlap
TIBCO Spotfire

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of csrabak [crabak at acm.org]
Sent: Wednesday, July 13, 2011 6:20 AM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] when to use `which'?

On 12/7/2011 17:29, David Winsemius wrote:
>
[snipped]

> If you have millions of records and tens of thousands of NA's (say ~ 1%
> of the data), imagine what your console looks like if you try to pick
> out records from one day and get 10,000 where you were expecting 100. A
> real PITA when you are doing real work.
>

I canvass for this snippet of experience and wisdom to become a fortune :-)

--
Cesar Rabak


