[Rd] often unnecessary duplicate in sapply / as.vector

Tue Jul 11 17:25:39 CEST 2006

On Tue, 11 Jul 2006, Thomas Lumley wrote:

> On Tue, 11 Jul 2006, Prof Brian Ripley wrote:
> 
> > On Fri, 7 Jul 2006, Thomas Lumley wrote:
> >
> > > On Fri, 7 Jul 2006, Martin Morgan wrote:
> > >
> > > > sapply calls lapply as
> > > >
> > > >    answer <- lapply(as.list(X), FUN, ...)
> > > >
> > > > which, when X is a list, causes X to be duplicated unnecessarily. The
> > > > coercion is unnecessary for other mode(X) because in lapply we have
> > > >
> > > >    if (!is.list(X)) X <- as.list(X)
> > >
> > > That looks reasonable.
> >
> > And you have made the change.  Unfortunately it is not really reasonable,
> > as is.list(X) does not test that X is a list (see its documentation) in
> > the same sense as as.list, so pairlists are now passed to the internal
> > code.
> 
> Where do we still get pairlists in interpreted code? I thought they had all
> been hidden.

Not quite all.  You can use pairlist() to create them, and .Options is one 
(fairly long) example.  (I used pairlist to create a very slow example.)
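
A small illustration of the is.list()/as.list() mismatch, as a sketch 
only: the names pl and l are made up for the example, and the comments 
describe what current documentation leads one to expect.

```r
# A hand-made pairlist and its generic-vector ("list") counterpart.
pl <- pairlist(a = 1, b = "x")
l  <- as.list(pl)

typeof(pl)        # "pairlist"
typeof(l)         # "list"

# is.list() is TRUE for both, so it cannot be used to decide
# whether as.list() would actually change anything.
is.list(pl)
is.list(l)

# is.pairlist() does tell them apart.
is.pairlist(pl)
is.pairlist(l)

# .Options is a pairlist that is visible from interpreted code.
typeof(.Options)
```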

> > There's something rather undesirable going on here.  The internal code for
> > lapply (in its current version, not the one I wrote) does the internal
> > equivalent of
> >
> >    rval <- vector("list", length(X))
> >    for(i in seq(along = X))
> >        rval[i] <- list(FUN(X[[i]], ...))
> >
> > from the earlier
> >
> > lapply <- function(X, FUN, ...) {
> >    FUN <- match.fun(FUN)
> >    if (!is.list(X))
> >        X <- as.list(X)
> >    rval <- vector("list", length(X))
> >    for(i in seq(along = X))
> >        rval[i] <- list(FUN(X[[i]], ...))
> >    names(rval) <- names(X)               # keep `names' !
> >    return(rval)
> > }
> >
> > so all that is needed is that X[[i]] work.
> >
> > For a pairlist [[i]] done repeatedly is very inefficient (since it starts
> > at the beginning each time), so we *do* want to coerce pairlists here.
> 
> Or have a separate loop using CDR and CAR rather than [[, which would mean not
> having to copy X.

If we are going there we should also special-case all the (much more 
common) vector types thereby avoiding [[, which I have so far resisted.

> > On the other hand, we do not need to coerce expressions or atomic vectors
> > for which [[]] works just fine.
> 
> Indeed.

I've just committed a version that is a lot faster, fast enough to shave 
5% off the total time for both the stats and boot examples.
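
For reference, the coercion rule discussed above can be sketched at R 
level as follows.  This is an illustration only and not the committed 
internal code: lapply_sketch is a hypothetical name, and the choice to 
test with is.pairlist() is an assumption made for the example.

```r
# Hypothetical R-level sketch: coerce only pairlists, and leave
# lists, atomic vectors and expressions alone, since [[ works on them.
lapply_sketch <- function(X, FUN, ...) {
    FUN <- match.fun(FUN)
    # Repeated [[ on a pairlist restarts from the head each time,
    # so convert it once to a generic vector up front.
    if (is.pairlist(X))
        X <- as.list(X)
    rval <- vector("list", length(X))
    for (i in seq_along(X))
        rval[i] <- list(FUN(X[[i]], ...))   # list() keeps NULL results
    names(rval) <- names(X)                 # keep names
    rval
}

# Usage: an atomic vector, a pairlist and an expression vector.
r_atomic <- lapply_sketch(1:3, function(v) v * 2)
r_pair   <- lapply_sketch(pairlist(a = 1, b = 2), function(v) v + 1)
r_expr   <- lapply_sketch(expression(a + b, c), class)
```

Assigning list(...) into rval[i] rather than using rval[[i]] <- ... 
means a NULL result is stored as an element instead of deleting it.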

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595