[R] subscripts in lists

Tue Aug 12 01:01:50 CEST 2003

I suggested
    sapply(1:length(lis), function (i) {v <- lis[[i]]; v[which(v=="next")+1]})

Of course that was really dumb.  It can be simplified, because the index i
is only used to select a list element, which sapply() wants to do for me
anyway.  It should be

    sapply(lis, function(v) v[which(v=="next")+1])

Perhaps the interesting thing is how one gets there.
- The result should be a character vector, not a list, so use sapply()
- The index of a list element does not enter into the calculation of
  the result, so use sapply(a.list, function (an.element) some.calculation)
- For each list element, we want to find where something occurs, so use
  which(the.element == the.value.we.want.to.find)
- We want the element after that, so the.element[..... + 1]
and the code (NOT the code I first thought of) practically writes itself.
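
To make those steps concrete, here is a minimal sketch that works on one
element first and then hands the same calculation to sapply().  The sample
list and the names pos, after and result are only for illustration.

```r
# A small sample list; the values are illustrative.
lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))

# Work on a single element first.
the.element <- lis[[1]]
pos <- which(the.element == "next")   # position(s) of the marker
after <- the.element[pos + 1]         # the item that follows the marker

# Wrap the same calculation in a function and let sapply() walk the list.
result <- sapply(lis, function(an.element)
    an.element[which(an.element == "next") + 1])
```

With this data each element contains "next" exactly once, so sapply() can
simplify the per-element answers into a plain character vector.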

If I had used backwards reasoning like this, I'd have got there first thing;
what led me to produce an inferior version was using forwards reasoning,
and I *know* better than to do that.  *Sigh.*

The other approach is not to focus on the list structure at all,
but to flatten it into a single sequence:

    {u <- unlist(lis); u[which(u=="next")+1]}

Of course, if some list element should not contain "next" exactly once,
these two versions would give different results.
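
A small sketch of how the two can disagree, assuming a made-up list (odd)
in which one element has no "next" at all and another ends with it:

```r
f1 <- function(lis) sapply(lis, function(v) v[which(v == "next") + 1])
f2 <- function(lis) { u <- unlist(lis); u[which(u == "next") + 1] }

# Hypothetical data: the first and third elements have no "next",
# and the second element has "next" as its last item.
odd <- list(c("a", "b"), c("c", "next"), c("d", "e"))

r1 <- f1(odd)   # per-element answers have different lengths, so a list comes back
r2 <- f2(odd)   # flattening lets "next" pick up the first item of the following element
```

Here the sapply() version cannot simplify its result and returns a list
(with an NA where "next" was the last item), while the unlist version quietly
reaches across the element boundary.  Which behaviour is "right" depends on
what the data are supposed to mean.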

We can also expect some kind of performance difference.  My expectation
was that as the "unlist" version has to build a data structure (the
flattened list) which is not part of the result, the "unlist" version
would be inferior.  But one must not trust to intuition; this is an
empirical question deserving an empirical answer.  I did this:

lis <- list(c("a","b","next","want1","c"), c("d","next","want2","a"))
f1 <- function(lis) sapply(lis, function(v) v[which(v=="next")+1])
f2 <- function(lis) {lis<-unlist(lis); lis[which(lis=="next")+1]}

system.time(for(i in 1:10000) f1(lis))
[1] 22.03  7.56 30.97  0.00  0.00
system.time(for(i in 1:10000) f2(lis))
[1] 5.38 1.65 7.44 0.00 0.00

Hmm, unlist is about 4 times faster.  Is that still true with
bigger lists?
lis <- list(lis[[1]],lis[[2]],lis[[1]],lis[[2]],lis[[1]],lis[[2]],
            lis[[1]],lis[[2]],lis[[1]],lis[[2]],lis[[1]],lis[[2]])

system.time(for(i in 1:4000) f1(lis))
[1] 30.91  9.66 42.06  0.00  0.00
system.time(for(i in 1:4000) f2(lis))
[1] 2.96 0.65 3.67 0.00 0.00

Yep, it holds up; if anything the gap widens, to roughly 10 times here.

This is by no means an exhaustive study, but it certainly suggests that
the "unlist" version may be faster than the "sapply" version.
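
For anyone who wants to repeat the comparison, here is a sketch of a
slightly more careful harness.  It checks that the two functions agree
before timing them.  The names big, reps, same and ratio, and the repetition
count, are my own arbitrary choices, and the timings will of course differ
from machine to machine and between versions of R.

```r
lis <- list(c("a","b","next","want1","c"), c("d","next","want2","a"))
big <- rep(lis, 6)   # twelve elements, alternating the two originals

f1 <- function(lis) sapply(lis, function(v) v[which(v == "next") + 1])
f2 <- function(lis) { u <- unlist(lis); u[which(u == "next") + 1] }

# Only compare speeds if the answers are the same.
same <- identical(f1(big), f2(big))

reps <- 1000   # arbitrary; raise it if the timings are too small to read
t1 <- system.time(for (i in seq_len(reps)) f1(big))
t2 <- system.time(for (i in seq_len(reps)) f2(big))

# Elapsed-time ratio; may be Inf or NaN if a loop finishes too quickly to measure.
ratio <- t1[["elapsed"]] / t2[["elapsed"]]
```

A ratio well above 1 would point the same way as the numbers above, but a
single run like this is still only a rough indication.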

Here's why my intuition was wrong:  the "sapply" version calls a
user-defined function once for each element of the result, while the
"unlist" version uses nothing but built-in operations.  Calling
user-defined functions is currently slow in R.