[R] Using and abusing %>% (was Re: Why can't I access this type?)
Hadley Wickham
h.wickham at gmail.com
Sat Mar 28 05:40:21 CET 2015
> I didn't dispute whether '%>%' may be useful -- I just pointed out that it
> is slow. However, it is only part of the problem: 'filter()' and
> 'select()', although aesthetically pleasing, also seem to be slow:
>
>> all.states <- data.frame(state.x77, Name = rownames(state.x77))
>>
>> f1 <- function()
> + all.states[all.states$Frost > 150, c("Name", "Frost")]
>>
>> f2 <- function()
> + subset(all.states, Frost > 150, select = c("Name", "Frost"))
>>
>> f3 <- function() {
> + filt <- subset(all.states, Frost > 150)
> + subset(filt, select = c("Name", "Frost"))
> + }
>>
>> f4 <- function()
> + all.states %>% subset(Frost > 150) %>%
> + subset(select = c("Name", "Frost"))
>>
>> f5 <- function()
> + select(filter(all.states, Frost > 150), Name, Frost)
>>
>> f6 <- function()
> + all.states %>% filter(Frost > 150) %>% select(Name, Frost)
>>
>> mb <- microbenchmark(
> + f1(), f2(), f3(), f4(), f5(), f6(),
> + times = 1000L
> + )
>> print(mb, signif = 3L)
> Unit: microseconds
> expr min   lq      mean median   uq  max neval cld
> f1() 115  124  134.8812    129  134 1500  1000 a
> f2() 128  141  147.4694    145  151 1520  1000 a
> f3() 303  328  344.3175    338  348 1740  1000 b
> f4() 458  494  518.0830    510  523 1890  1000 c
> f5() 806  848  887.7270    875  894 3510  1000 d
> f6() 971 1010 1056.5659   1040 1060 3110  1000 e
>
> So, using '%>%', but leaving 'filter()' and 'select()' out of the equation,
> as in 'f4()' is only half as bad as the "full" 'dplyr' idiom in 'f6()'. In
> this case, since we're talking microseconds, the speed-up is negligible but
> that *is* beside the point.
When benchmarking, it's important to consider both the relative and
absolute difference, and to think about how the cost scales as the data
grows - the cost of using %>% is fixed, and 500 µs doesn't seem
like a huge performance penalty to me.
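
In the quoted medians, the two pipe calls account for roughly
510 - 338 = 172 µs (f4 vs f3) and 1040 - 875 = 165 µs (f6 vs f5). A
rough way to check that this overhead stays flat as the data grows is
sketched below. It assumes the magrittr and microbenchmark packages are
installed; pipe_cost() and the chosen sizes are hypothetical, picked
only for illustration, and the timings will differ between machines.

```r
library(magrittr)        # provides %>%
library(microbenchmark)

# Hypothetical helper (not from the original thread): median time, in
# microseconds, of the same subset() call with and without the pipe.
pipe_cost <- function(n) {
  df <- data.frame(x = runif(n), y = runif(n))
  mb <- microbenchmark(
    plain = subset(df, x > 0.5),
    piped = df %>% subset(x > 0.5),
    times = 100L
  )
  med <- summary(mb, unit = "us")
  setNames(med$median, as.character(med$expr))
}

# Illustrative sizes only: small, medium, and large data frames.
sizes <- c(50, 5e3, 5e5)
res <- t(sapply(sizes, pipe_cost))

# One row per size: absolute times plus the piped-minus-plain difference.
data.frame(n = sizes, res, diff = res[, "piped"] - res[, "plain"])
```

If the overhead really is fixed, the diff column should stay in the
same range while the plain and piped columns grow with n, so the
relative penalty shrinks as the data gets bigger.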
Hadley
--
http://had.co.nz/