[R] lapply (and friends) with data.frames are slow

David Winsemius dwinsemius at comcast.net
Sat Jan 5 22:18:21 CET 2013


On Jan 5, 2013, at 11:38 AM, Kevin Ushey wrote:

> Hey guys,
>
> I noticed something curious in the lapply call. I'll copy+paste the
> function call here because it's short enough:
>
> lapply <- function (X, FUN, ...)
> {
>    FUN <- match.fun(FUN)
>    if (!is.vector(X) || is.object(X))
>        X <- as.list(X)
>    .Internal(lapply(X, FUN))
> }
>
> Notice that lapply coerces X to a list if the
> !is.vector(X) || is.object(X) check passes.
>
> Curiously, data.frames trigger this coercion (is.vector(data.frame())
> returns FALSE), even though coercing a data.frame to a list seems
> unnecessary for the *apply family of functions.
>
> Is there a reason why we must coerce data.frames to lists for these
> functions? I thought data.frames were essentially just 'structured lists'?
>
> I ask because coercing a (large) data.frame to a list is generally quite
> slow, and it seems like this could be avoided for data.frames.
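
For reference, here is a minimal sketch of how the check quoted above evaluates for a small data frame. The object `df` and its columns are made up for illustration; only base R functions are used.

```r
# Hypothetical example data frame (not from the original post).
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# A data frame is stored as a list, but it carries class and row.names
# attributes, so is.vector() is FALSE and is.object() is TRUE.
stored_as_list <- is.list(df)
plain_vector   <- is.vector(df)
has_class      <- is.object(df)

# The condition used inside lapply(): TRUE means X goes through as.list().
needs_coercion <- !is.vector(df) || is.object(df)

# After coercion the columns are unchanged, but the result is a plain list.
coerced <- as.list(df)

print(c(stored_as_list = stored_as_list, plain_vector = plain_vector,
        has_class = has_class, needs_coercion = needs_coercion))
```

So the coercion is taken for any data frame, even though the underlying storage is already a list.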

Is this related to this SO question that uses the microbenchmark  
function to illustrate the costs of the (possibly) superfluous coercion?

http://stackoverflow.com/questions/14169818/why-is-sapply-relatively-slow-when-querying-attributes-on-variables-in-a-data-fr
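
A rough way to look at the cost locally is sketched below. It uses base system.time() instead of microbenchmark to avoid an extra package, and the data frame size (2000 columns of 10 rows) and the repeat count are arbitrary assumptions; timings will vary by machine and R version, so none are shown.

```r
# Hypothetical wide data frame: 2000 numeric columns with 10 rows each.
cols <- replicate(2000, runif(10), simplify = FALSE)
names(cols) <- paste0("V", seq_along(cols))
df <- as.data.frame(cols)

# The same columns as a plain list: unclass() drops the class attribute,
# so lapply() should not need to dispatch as.list() on it.
plain <- unclass(df)

# Both inputs should give the same answer.
res_df    <- lapply(df, class)
res_plain <- lapply(plain, class)

# Time the explicit coercion, then lapply() on each kind of input.
t_coerce <- system.time(for (i in 1:50) as.list(df))
t_df     <- system.time(for (i in 1:50) lapply(df, class))
t_plain  <- system.time(for (i in 1:50) lapply(plain, class))

elapsed <- c(coerce = t_coerce[["elapsed"]],
             df     = t_df[["elapsed"]],
             plain  = t_plain[["elapsed"]])
print(elapsed)
```

If the coercion is the bottleneck, the `df` timing should sit noticeably above the `plain` one, with `coerce` accounting for most of the gap.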

-- 

David Winsemius, MD
Alameda, CA, USA



