# [Rd] Any interest in "merge" and "by" implementations specifically for so

Kevin B. Hendricks kevin.hendricks at sympatico.ca
Mon Jul 31 15:41:53 CEST 2006

```
Hi Tom,

> Now, try sorting and using a loop:
>
>> idx <- order(i)
>> xs <- x[idx]
>> is <- i[idx]
>> res <- array(NA, 1e6)
>> idx <- which(diff(is) > 0)
>> startidx <- c(1, idx+1)
>> endidx <- c(idx, length(xs))
>> f1 <- function(x, startidx, endidx, FUN = sum)  {
> +   for (j in 1:length(res)) {
> +     res[j] <- FUN(x[startidx[j]:endidx[j]])
> +   }
> +   res
> + }
>> unix.time(res1 <- f1(xs, startidx, endidx))
>  6.86 0.00 7.04   NA   NA

I wonder how much time the sorting, reordering and creation of
startidx and endidx would add to this time?

Either way, your code can nicely be used to quickly create the small
integer factors I would need if the igroup functions get integrated.
Thanks!

> For the case of sum (or averages), you can vectorize this using
> cumsum as follows. This won't work for median or max.
>
>> f2 <- function(x, startidx, endidx)  {
> +   cum <- cumsum(x)
> +   res <- cum[endidx]
> +   res[2:length(res)] <- res[2:length(res)] - cum[endidx[1:(length(res) - 1)]]
> +   res
> + }
>> unix.time(res2 <- f2(xs, startidx, endidx))
>  0.20 0.00 0.21   NA   NA

Yes, that is quite a fast way to handle "sums".

> You can also use Luke Tierney's byte compiler
> (http://www.stat.uiowa.edu/~luke/R/compiler/) to speed up the loop for
> functions where you can't vectorize:
>
>> library(compiler)
>> f3 <- cmpfun(f1)
> Note: local functions used: FUN
>> unix.time(res3 <- f3(xs, startidx, endidx))
>  3.84 0.00 3.91   NA   NA

That looks interesting.  Does it only work for specific operating
systems and processors?  I will give it a try.

Thanks,

Kevin

```
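
One way to answer the question of how much the sorting and index construction cost is to time that preprocessing separately from the cumsum step. The sketch below is only an illustration under assumptions: the data size, the key values and the names `prep_time`, `sum_time`, `brk`, `codes_sorted` and `codes` are made up here and do not come from the thread. It also shows how the sorted keys could yield the small integer group codes mentioned above.

```r
# Illustrative data only (sizes and key values are assumptions).
set.seed(1)
n <- 1e5
i <- sample(c(3L, 10L, 42L, 77L, 1000L), n, replace = TRUE)  # sparse integer keys
x <- runif(n)

# Time the preprocessing: sort by key, then locate group boundaries.
prep_time <- system.time({
  idx      <- order(i)
  xs       <- x[idx]
  is       <- i[idx]
  brk      <- which(diff(is) > 0)      # last position of each group but the final one
  startidx <- c(1, brk + 1)
  endidx   <- c(brk, length(xs))
})

# Time the group sums taken as differences of the running total.
sum_time <- system.time({
  cum <- cumsum(xs)
  res <- cum[endidx] - c(0, cum[endidx[-length(endidx)]])
})

# Small integer group codes 1..k, mapped back to the original order.
codes_sorted <- cumsum(c(TRUE, diff(is) > 0))
codes <- integer(n)
codes[idx] <- codes_sorted

# Cross-check against base R equivalents.
stopifnot(isTRUE(all.equal(res, as.vector(tapply(x, i, sum)))))
stopifnot(identical(codes, match(i, sort(unique(i)))))

print(rbind(prep = prep_time[1:3], sum = sum_time[1:3]))
```

The timings depend on the machine and on the number of distinct keys, so the printed figures are only meaningful relative to each other.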