[R] Coda: On the efficiency of unsplit() for Rolf Turner's recent post

Mon Oct 7 01:53:20 CEST 2024

(only of interest -- maybe! -- to those who followed this thread of a
couple of weeks ago)

Just for the heck of it, I compared the timing of Deepayan's unsplit(x,f)
solution with my as.vector(do.call(rbind, x)) approach to the query, using a
list of 3 vectors each of length 1000 (the original toy example was a
list of 3 vectors of length 5). Unsurprisingly, I think, do.call(rbind)
took about 1/10th the time of unsplit() (median 6.8 vs. 65.7 microseconds),
since unsplit() handles the general case whereas do.call(rbind) only
works for the balanced structure of the toy example:

> microbenchmark(unsplit(x,f),times = 1000L)
Unit: microseconds
          expr    min     lq     mean median    uq      max neval
 unsplit(x, f) 63.058 64.042 70.44419 65.682 67.24 3893.155  1000
--------------
> microbenchmark(as.vector(do.call(rbind,x)),times = 1000L)
Unit: microseconds
                         expr   min    lq     mean median    uq    max neval
 as.vector(do.call(rbind, x)) 5.617 6.396 7.082299  6.765 7.216 79.335  1000
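
For anyone wanting to rerun the comparison, here is a minimal sketch. The
construction of x and f is an assumption (neither is shown above): three
equal-length numeric vectors, and a factor that cycles through the three
groups so that unsplit() interleaves them.

```r
# Assumed setup, not taken from the post: 3 numeric vectors of length 1000
# and a grouping factor f that cycles 1, 2, 3, 1, 2, 3, ...
set.seed(1)
n <- 1000L
x <- list(`1` = rnorm(n), `2` = rnorm(n), `3` = rnorm(n))
f <- factor(rep(1:3, times = n))

a <- unsplit(x, f)                 # general approach
b <- as.vector(do.call(rbind, x))  # shortcut; relies on equal lengths

# Both should produce the same interleaved vector.
stopifnot(identical(a, b))

# Timing, only if the microbenchmark package happens to be installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    unsplit(x, f),
    as.vector(do.call(rbind, x)),
    times = 1000L
  ))
}
```

The shortcut works because rbind() stacks the vectors as the rows of a
3 x n matrix and as.vector() reads that matrix column by column. Absolute
timings will differ by machine and R version.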

**Maybe** this suggests that adding a "regular" (or better-named) option to
unsplit() that would allow a simpler, faster algorithm to be used for the
special but perhaps not uncommon case of Rolf's structured toy example
might be useful.
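
As a rough illustration of what such an option might look like, here is a
sketch of a wrapper. The name unsplit2() and the argument regular are
hypothetical and not part of base R. With regular = TRUE the caller
asserts that the components have equal length and that f cycles through
the groups in order; only the equal-length part is checked.

```r
# Hypothetical wrapper, not part of base R. Intended for plain atomic
# vectors only, since as.vector() drops attributes such as classes.
unsplit2 <- function(value, f, regular = FALSE) {
  if (regular) {
    # Cheap sanity check: all components must have the same length.
    stopifnot(length(unique(lengths(value))) == 1L)
    # Fast path: stack as rows, then read the matrix column by column.
    return(as.vector(do.call(rbind, value)))
  }
  # General path: defer to base R.
  unsplit(value, f)
}

# Made-up small example in the spirit of the toy case (3 vectors of length 5).
x <- list(1:5, 6:10, 11:15)
f <- rep(1:3, times = 5)
stopifnot(identical(unsplit2(x, f, regular = TRUE), unsplit(x, f)))
```

Verifying that f really is cyclic would take an extra pass over f and eat
into the speed gain, so this sketch leaves that responsibility with the
caller.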

Please do not reply to this, as I am too ignorant to judge whether this is
foolish or not. I leave it to those more qualified to either dismiss or act
on this. I just wanted to present some limited but suggestive data.

Cheers to all,
Bert
