[Rd] Speed difference between df$a[1] and df[1,"a"]

Allan Engelhardt allane at cybaea.com
Thu Oct 20 22:58:43 CEST 2011


`$` and `[` are primitives (implemented in C), while `[.data.frame` is a 
longish R function that does all sorts of clever things, so every 
df[3,"a"] call runs a fair amount of interpreted R code.
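
A quick way to check this in your own session is sketched below. It is 
only a rough illustration: the names m, df and same are mine, and the 
length of the method's source will differ between R versions.

```r
# `$` and `[` are primitives, i.e. implemented in C.
is.primitive(`$`)
is.primitive(`[`)

# The data frame method for `[` is an ordinary R closure.
m <- getS3method("[", "data.frame")
is.primitive(m)
typeof(m)
length(deparse(m))   # rough size of its R source, in lines

# All three spellings from the original post return the same value.
df <- data.frame(a = 1:10)
same <- identical(df$a[3], df[3, "a"]) &&
        identical(df$a[3], df[[3, "a"]])
same
```

Printing m (or just typing `[.data.frame` with the backticks) shows the 
R code that runs on each df[3,"a"] call.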

On 19/10/11 22:34, Stavros Macrakis wrote:
> I was surprised to find that df$a[1] is an order of magnitude faster than
> df[1,"a"]:
>
>> df<- data.frame(a=1:10)
>> system.time(replicate(100000, df$a[3]))
>     user  system elapsed
>     0.36    0.00    0.36
>
>> system.time(replicate(100000, df[3,"a"]))
>     user  system elapsed
>     4.09    0.00    4.09
>
>
> A priori, I'd have thought that combining the row and column selections into
> a single operation would at worst be equally fast, at best would be faster
> by having fewer intermediate results and avoiding redundant operations.
>
> I thought this might be because df[,] builds a data frame before simplifying
> it to a vector, but with drop=F, it is even slower, so that doesn't seem to
> be the problem:
>
>> system.time(replicate(100000, df[3,"a",drop=FALSE]))
>     user  system elapsed
>    15.00    0.00   14.99
>
>
> I then wondered if it might be because '[' allows multiple columns and
> handles rownames. Sure enough, '[[,]]', which allows only one column, and
> does not handle rownames, is almost 3x faster:
>
>> system.time(replicate(100000, df[[3,"a"]]))
>     user  system elapsed
>     1.48    0.00    1.48
>
>
> ...but it is still 4x slower than $[].
>
> The timings are not sensitive to the number of rows in df (except for the
> drop=FALSE case, which is much slower for large dfs).  I will be avoiding
> [,] and [[,]] when I don't need their functionality, but I still wonder why
> they should be so much slower than $[].
>
>              -s
>
> R 2.13.1 on Windows 7, i7-860 (2.8GHz) 8GB RAM
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


