[Rd] often unnecessary duplicate in sapply / as.vector
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jul 11 14:20:17 CEST 2006
On Fri, 7 Jul 2006, Thomas Lumley wrote:
> On Fri, 7 Jul 2006, Martin Morgan wrote:
>
> > sapply calls lapply as
> >
> > answer <- lapply(as.list(X), FUN, ...)
> >
> > which, when X is a list, causes X to be duplicated unnecessarily. The
> > coercion is unnecessary for other mode(X) because in lapply we have
> >
> > if (!is.list(X)) X <- as.list(X)
>
> That looks reasonable.
And you have made the change. Unfortunately it is not really reasonable,
as is.list(X) does not test that X is a list (see its documentation) in
the same sense as as.list, so pairlists are now passed to the internal
code.
There's something rather undesirable going on here. The internal code for
lapply (in its current version, not the one I wrote) does the internal
equivalent of
rval <- vector("list", length(X))
for(i in seq(along = X))
rval[i] <- list(FUN(X[[i]], ...))
from the earlier
lapply <- function(X, FUN, ...) {
FUN <- match.fun(FUN)
if (!is.list(X))
X <- as.list(X)
rval <- vector("list", length(X))
for(i in seq(along = X))
rval[i] <- list(FUN(X[[i]], ...))
names(rval) <- names(X) # keep `names' !
return(rval)
}
so all that is needed is that X[[i]] work.
For a pairlist [[i]] done repeatedly is very inefficient (since it starts
at the beginning each time), so we *do* want to coerce pairlists here.
On the other hand, we do not need to coerce expressions or atomic vectors
for which [[]] works just fine.
> > More generally, perhaps as.vector might not duplicate when mode(x) == mode ?
>
> This isn't a trivial change, because mode(x)==mode does not guarantee
> that as.vector(x, mode) has no effect. For example, with mode="numeric" it
> removes attributes.
And with mode="list" it does not (although that is not as documented).
We can certainly do better. [This is another of those cases where 'mode'
is confusing, and in fact it would be typeof(x) == mode.]
However, for now let us concentrate on as.list.default, which does
as.list.default <- function (x, ...)
{
if (is.function(x))
return(c(formals(x), list(body(x))))
if (is.expression(x)) {
n <- length(x)
l <- vector("list", n)
i <- 0
for (sub in x) l[[i <- i + 1]] <- sub
return(l)
}
.Internal(as.vector(x, "list"))
}
That's a bit strange, as an expression is internally a list, and it loses
the names on the expression. I intend to make as.list(x) return
x unchanged if x is a list (not a pairlist), and to coerce expressions
internally.
After that I will think about making as.vector and lapply make fewer
copies.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-devel
mailing list