[R] categorized complete list of R commands?

R. Michael Weylandt michael.weylandt at gmail.com
Fri Apr 5 03:10:14 CEST 2013


On Thu, Apr 4, 2013 at 1:54 PM, ivo welch <ivo.welch at gmail.com> wrote:
>
> ## must be started with R --vanilla
> all.sources <- search()
> d <- NULL
> for (i in 1:length(all.sources)) {
>   all.functions <- ls(search()[i])
>   N <- length(all.functions)
>   if (N==0) next
>   d <- rbind(d, data.frame( src=rep(all.sources[i], N), index=1:N,
> fname=all.functions ) )
> }
>

Allowing myself to get slightly off-topic, this is really rough code
and commits all sorts of performance / style sins.

The big one, however, is the strange use of iteration: what is the
point of all.sources? Do we need to run search() on every pass through
the loop? We never actually need the index, only the values, so why not
iterate on those? The same code could be written like this:

d <- NULL
for(pkg in search()){
    all.functions <- ls(pkg)
    N <- length(all.functions)
    if(N == 0) next  # nothing to record for this search() entry

    d <- rbind(d, data.frame(src = rep(pkg, N), index = 1:N,
                             fname = all.functions))
}

For more on why / how this is better, check out the talk "Loop like a
Native" from the recent PyCon.

But wait, there's more! Repeated "rbinding" is not great for
performance, because each call copies everything accumulated so far.
Better to collect all the results in a list and then put them together
in one step:

base_packages <- search()
L <- vector("list", length(base_packages))  # one slot per search() entry

for(pkg_no in seq_along(base_packages)){
    pkg <- base_packages[pkg_no]
    all.functions <- ls(pkg)

    N <- length(all.functions)
    if(N == 0) next  # slot stays NULL; rbind() drops it later

    L[[pkg_no]] <- data.frame(src = rep(pkg, N), index = 1:N,
                              fname = all.functions)
}

do.call(rbind, L)
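
If you want to see the difference for yourself, here is a rough sketch
that times the two approaches on made-up data. make_piece(), grow(),
collect() and the sizes are all hypothetical stand-ins of mine, not
anything from the original code, and the timings will vary with your
machine and R version.

```r
# Hypothetical stand-in for one per-package data frame (sizes are arbitrary).
make_piece <- function(i) {
    data.frame(src = paste0("pkg", i), index = 1:50,
               fname = paste0("f", 1:50), stringsAsFactors = FALSE)
}

n <- 200

# Approach 1: grow the data frame inside the loop.
grow <- function(n) {
    d <- NULL
    for (i in seq_len(n)) d <- rbind(d, make_piece(i))
    d
}

# Approach 2: fill a preallocated list, then bind once at the end.
collect <- function(n) {
    L <- vector("list", n)
    for (i in seq_len(n)) L[[i]] <- make_piece(i)
    do.call(rbind, L)
}

t_grow    <- system.time(d1 <- grow(n))
t_collect <- system.time(d2 <- collect(n))

# Elapsed seconds for each approach.
print(c(grow = t_grow[["elapsed"]], collect = t_collect[["elapsed"]]))
```

Both approaches build the same rows; only the amount of copying along
the way differs.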

But now we're back to the almost unnecessary loop variable: we can
kill it by using named lookups on L.

base_packages <- search()
L <- vector("list", length(base_packages))
names(L) <- base_packages  # lets us index L by package name

for(pkg in base_packages){
    all.functions <- ls(pkg)

    N <- length(all.functions)
    if(N == 0) next

    L[[pkg]] <- data.frame(src = rep(pkg, N), index = 1:N,
                           fname = all.functions)
}

do.call(rbind, L)

But do we really want to handle the set up ourselves? Probably best to
let R take care of it for us. If we can abstract our loop body into a
function we can lapply() it. The insight is that we are treating the
output of search() (a character vector) as a list and doing our
operation to each element, so let's try:

do.call(rbind, lapply(search(), function(pkg){
    all.functions <- ls(pkg)

    N <- length(all.functions)
    # the value of the if/else is what the function returns
    if(N == 0) NULL else data.frame(src = rep(pkg, N), index = 1:N,
                                    fname = all.functions)
}))

Note that we don't assign, but simply return from the anonymous
function in this case. I've also passed the result of lapply() straight
to do.call(rbind, ...) just to make it quick.

And, as Bert noted, it'd be best to actually filter on functions, but
let's leave that as an exercise to the reader for now.
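
For anyone who wants a head start on that exercise, here is a minimal
sketch, assuming "filter on functions" means keeping only the names
whose values pass is.function(). list_functions() is a hypothetical
helper name of mine, and like everything else here it is one way to do
it rather than the way.

```r
# Hypothetical helper: list only the functions found under one search() entry.
list_functions <- function(pkg) {
    env  <- as.environment(pkg)   # accepts the names returned by search()
    objs <- ls(env)

    # TRUE for each name whose value in this environment is a function
    is_fun <- vapply(objs,
                     function(nm) is.function(get(nm, envir = env,
                                                  inherits = FALSE)),
                     logical(1))
    fns <- objs[is_fun]

    if (length(fns) == 0) return(NULL)
    data.frame(src = pkg, index = seq_along(fns), fname = fns,
               stringsAsFactors = FALSE)
}

d <- do.call(rbind, lapply(search(), list_functions))
head(d)
```

One thing to be aware of: get() evaluates each object, so lazily loaded
data sets on the search path get loaded along the way.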

None of this code is tested, but hopefully it's more efficient and idiomatic.

Cheers,
Michael


