[R] which() vs. just logical selection in df
Greg Snow
538280 using gmail.com
Mon Oct 12 21:01:36 CEST 2020
I would suggest using the microbenchmark package for the timing
comparison. It runs each expression many times, which gives a more
meaningful comparison than a single system.time() call.
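A minimal sketch of such a benchmark, assuming the microbenchmark package is installed. The data frame is a made-up stand-in for the poster's file: only the column name gender2 comes from the original, while the row count, the category labels, and the share of NA values are arbitrary assumptions.

```r
# Hypothetical stand-in data; the original CSV is not reproduced here.
set.seed(1)
n <- 1e5
dat <- data.frame(
  id = seq_len(n),
  gender2 = sample(c("male", "female", "other", NA), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Run the comparison only if microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  res <- microbenchmark::microbenchmark(
    with_which = dat[which(dat$gender2 == "other"), ],
    logical    = dat[dat$gender2 == "other", ],
    times = 20
  )
  print(res)  # distribution of timings per expression
}
```

Timings will vary by machine and by how many NA values the column holds, so the printed summary is more informative than any single run.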
One possible reason for the difference is the number of missing values
in your data (along with the number of columns). Consider the
difference in the following results:
> x <- c(1,2,NA)
> x[x==1]
[1]  1 NA
> x[which(x==1)]
[1] 1
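The same effect carries over to row selection in a data frame. The sketch below uses a small invented data frame (the values are hypothetical, only the column name gender2 is from the original post) to show that a logical index containing NA returns extra all-NA rows, whereas which() keeps only the TRUE positions. Building those extra rows is one plausible source of the additional time, though this is not confirmed against the poster's data.

```r
# Hypothetical toy data with missing values in the selection column.
dat <- data.frame(
  id = 1:5,
  gender2 = c("other", "male", NA, "other", NA),
  stringsAsFactors = FALSE
)

# Logical indexing: every NA in the condition produces a row of NAs.
by_logical <- dat[dat$gender2 == "other", ]

# which() returns only the positions that are TRUE, so NAs are dropped.
by_which <- dat[which(dat$gender2 == "other"), ]

nrow(by_logical)  # 4: two matching rows plus two all-NA rows
nrow(by_which)    # 2: only the matching rows
```

So the two calls are not strictly equivalent when the column has missing values: they can differ in both result and cost.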
On Sat, Oct 10, 2020 at 5:25 PM 1/k^c <kchamberln using gmail.com> wrote:
>
> Hi R-helpers,
>
> Does anyone know why adding which() makes the select call more
> efficient than just using logical selection in a dataframe? Doesn't
> which() technically add another conversion/function call on top of the
> logical selection? Here is a reproducible example with a slight
> difference in timing.
>
> # Surrogate data - the timing here isn't interesting
> urltext <- paste("https://drive.google.com/",
> "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
> "-h8&export=download", sep="")
> download.file(url=urltext, destfile="tempfile.csv") # download file first
> dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE, header=TRUE,
> nrows=2.5e6) # read the file; 'nrows' is a slight
> # overestimate
> dat <- dat[,1:3] # select just the first 3 columns
> head(dat, 10) # print the first 10 rows
>
> # Select using which() as the final step ~ 90ms total time on my macbook air
> system.time(
> head(
> dat[which(dat$gender2=="other"), ]),
> gcFirst=TRUE)
>
> # Select skipping which() ~130ms total time
> system.time(
> head(
> dat[dat$gender2=="other", ]),
> gcFirst=TRUE)
>
> Now I would think that the second one without which() would be more
> efficient. However, every time I run these, the first version, with
> which() is more efficient by about 20ms of system time and 20ms of
> user time. Does anyone know why this is?
>
> Cheers!
> Keith
--
Gregory (Greg) L. Snow Ph.D.
538280 using gmail.com