[Rd] Speed difference between df$a[1] and df[1,"a"]

Fri Oct 21 07:23:24 CEST 2011

On Wed, Oct 19, 2011 at 2:34 PM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
> I was surprised to find that df$a[1] is an order of magnitude faster than
> df[1,"a"]:

Yes.  df$a[1] treats the data frame as a list, and is much faster.
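
A rough sketch of the comparison, for anyone who wants to reproduce it.
The data frame and the repetition count are made up for illustration,
and the timings will vary with machine and R version:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(a = 1:1000, b = rnorm(1000))

# All three forms extract the same element
x1 <- df$a[1]        # list-style: pull out the column, then index the vector
x2 <- df[1, "a"]     # matrix-style: dispatches to `[.data.frame`
x3 <- df[[1, "a"]]   # dispatches to `[[.data.frame`
stopifnot(identical(x1, x2), identical(x1, x3))

# Rough timings over many repetitions (n is an arbitrary choice)
n <- 10000
t_list   <- system.time(for (i in seq_len(n)) df$a[1])
t_matrix <- system.time(for (i in seq_len(n)) df[1, "a"])
t_double <- system.time(for (i in seq_len(n)) df[[1, "a"]])
print(rbind(t_list, t_matrix, t_double)[, "elapsed"])
```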

> I thought this might be because df[,] builds a data frame before simplifying
> it to a vector, but with drop=F, it is even slower, so that doesn't seem to
> be the problem:

drop=FALSE returns a data frame and never simplifies it to a vector,
so this test isn't showing what you think it is.
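
A quick way to check what each call actually returns (df is again a
hypothetical example, not the data from the original post):

```r
# Hypothetical data frame, for illustration only
df <- data.frame(a = 1:5, b = letters[1:5])

r_default <- df[1, "a"]                # default drop behaviour
r_nodrop  <- df[1, "a", drop = FALSE]  # explicitly keep the data frame

class(r_default)
class(r_nodrop)
dim(r_nodrop)
```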

> I then wondered if it might be because '[' allows multiple columns and
> handles rownames. Sure enough, '[[,]]', which allows only one column, and
> does not handle rownames, is almost 3x faster:

That's part of it, but if you look at [.data.frame you see there is
also quite a bit of copying that could be avoided in simple cases but
is hard to avoid in full generality.
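
One way to look at the method, as a small sketch. It only fetches and
summarises the function; reading the printed source is still the way to
see where the copying happens:

```r
# Fetch the S3 method that handles df[i, j] from the base environment
f <- get("[.data.frame", envir = baseenv())
stopifnot(is.function(f))

arg_names <- names(formals(f))   # the method's argument names
n_lines   <- length(deparse(f))  # rough size of the method in deparsed lines
print(arg_names)
print(n_lines)

# print(f)   # uncomment to read the full source
```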

    -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland