[Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

Sat Jul 29 06:32:21 CEST 2006

Hi Bill,

>>>    sum : igroupSums

Okay, after thinking about this ...

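# assumes i holds small integer group codes 1..n (e.g. the codes of a
# factor with n levels), one per element of v
# v is some long numeric vector
# no sorting required

igroupSums <- function(v,i) {
   i <- as.integer(i)          # use a factor's integer codes; max() fails on factors
   sums <- numeric(max(i))     # one accumulator per group
   for (j in seq_along(v)) {   # seq_along() rather than 1:length(v), which gives c(1, 0) for empty v
       sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
   }
   sums
}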
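A quick sanity check of the loop on made-up data, compared with the
split()-based route (sapply(split(v, i), sum)). The function is repeated
here with the same fixes so the snippet runs on its own; the values of v
and i are purely illustrative.

```r
# Single pass over v, adding each element into the slot for its group code
igroupSums <- function(v, i) {
  i <- as.integer(i)
  sums <- numeric(max(i))
  for (j in seq_along(v)) {
    sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
  }
  sums
}

v <- c(1, 2, 3, 4, 5)        # illustrative values
i <- c(1L, 2L, 1L, 3L, 2L)   # group codes 1..3
igroupSums(v, i)             # 4 7 4

# Same totals via split(), which first builds one sub-vector per group
viaSplit <- unname(sapply(split(v, i), sum))
stopifnot(identical(igroupSums(v, i), viaSplit))
```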

If written in Fortran or C, this might be faster than using split().
At least it is linear in time in the length of v.  The approach could
easily be parallelized across t threads by picking t starting points
along v and running this routine on each piece in parallel.  You could
even avoid thread locking, either if elements of "sums" can be updated
atomically or by creating a separate copy of "sums" for each piece and
adding the copies together at the end.
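A minimal sketch of the "one copy of sums per piece, then a final
addition" idea, written in plain R and assuming contiguous, roughly equal
chunks of v. The names chunkSums and chunkedGroupSums are hypothetical.
The pieces run one after another via lapply() here; because each call
reads only its own slice and writes only its own accumulator, the calls
are independent and no locking is needed if they are handed to separate
workers (e.g. parallel::mclapply() in later R versions, on Unix-alikes).

```r
# Per-chunk worker: the same single-pass loop, but the number of groups is
# passed in so every chunk returns an accumulator of the same length.
chunkSums <- function(v, i, ngroups) {
  sums <- numeric(ngroups)
  for (j in seq_along(v)) {
    sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
  }
  sums
}

# Split positions 1..n into nchunks contiguous pieces, compute a private
# accumulator for each piece, then add the accumulators together.
chunkedGroupSums <- function(v, i, nchunks = 4L) {
  stopifnot(length(v) == length(i), length(v) > 0)
  i <- as.integer(i)
  ngroups <- max(i)
  n <- length(v)
  nchunks <- max(1L, min(as.integer(nchunks), n))
  # chunk id for each position: 1,1,...,2,2,...,nchunks
  chunkId <- ceiling(seq_len(n) * nchunks / n)
  pieces <- split(seq_len(n), chunkId)
  partial <- lapply(pieces, function(idx) chunkSums(v[idx], i[idx], ngroups))
  Reduce(`+`, partial)   # the final addition of the per-piece copies
}

set.seed(1)
v <- as.numeric(sample(0:9, 1000, replace = TRUE))  # integer-valued, so sums are exact
i <- sample(1:5, 1000, replace = TRUE)
stopifnot(identical(chunkedGroupSums(v, i, 4), chunkSums(v, i, max(i))))
```

With general floating-point data the chunked result can differ from a
single pass in the last few bits, since the additions happen in a
different order.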

I still think I am missing some obvious way to do this but ...

Am I thinking along the right lines?

Kevin