[Rd] Any interest in "merge" and "by" implementations specifically for so

Mon Jul 31 17:57:43 CEST 2006

> Hi Tom,
> 
> > Now, try sorting and using a loop:
> >
> >> idx <- order(i)
> >> xs <- x[idx]
> >> is <- i[idx]
> >> res <- array(NA, 1e6)
> >> idx <- which(diff(is) > 0)
> >> startidx <- c(1, idx+1)
> >> endidx <- c(idx, length(xs))
> >> f1 <- function(x, startidx, endidx, FUN = sum)  {
> > +   for (j in 1:length(res)) {
> > +     res[j] <- FUN(x[startidx[j]:endidx[j]])
> > +   }
> > +   res
> > + }
> >> unix.time(res1 <- f1(xs, startidx, endidx))
> > [1] 6.86 0.00 7.04   NA   NA
> 
> I wonder how much time the sorting, reordering and creation of
> startidx and endidx would add to this time?

Done interactively, sorting and indexing seemed fast. Here are some timings:

> unix.time({idx <- order(i)
+            xs <- x[idx]
+            is <- i[idx]
+            res <- array(NA, 1e6)
+            idx <- which(diff(is) > 0)
+            startidx <- c(1, idx+1)
+            endidx <- c(idx, length(xs))
+          })
[1] 1.06 0.00 1.09   NA   NA
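
For readers who want to see what the preprocessing step timed above actually produces, here is a minimal sketch on a tiny example. The vectors i and x below are hypothetical toy data (the real ones in this thread are much larger), and brk is just an illustrative name for the intermediate that the transcript stores back into idx.

```r
# Hypothetical toy data: 'i' holds group labels, 'x' the values to aggregate.
i <- c(3, 1, 2, 1, 3, 3)
x <- c(10, 20, 30, 40, 50, 60)

idx <- order(i)   # permutation that sorts the group labels
xs  <- x[idx]     # values reordered so each group is contiguous
is  <- i[idx]     # sorted group labels

brk      <- which(diff(is) > 0)  # positions where the sorted label changes
startidx <- c(1, brk + 1)        # first position of each group in xs
endidx   <- c(brk, length(xs))   # last position of each group in xs

# One column per group: its label and the slice of xs it occupies.
print(rbind(group = is[startidx], startidx, endidx))
```

Each group then occupies the contiguous slice xs[startidx[j]:endidx[j]], which is what the loop functions in this thread rely on.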

> That looks interesting.  Does it only work for specific operating  
> systems and processors?  I will give it a try.

No, as far as I know, it works on all operating systems. It also gets a
little faster if you call sum directly inside the function instead of
passing it in as FUN:

> f4 <- function(x, startidx, endidx)  {
+   for (j in 1:length(res)) {
+     res[j] <- sum(x[startidx[j]:endidx[j]])
+   }
+   res
+ }
> f5 <- cmpfun(f4)
> unix.time(res5 <- f5(xs, startidx, endidx))
[1] 2.67 0.03 2.95   NA   NA
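
A self-contained sketch of the same idea, for anyone who wants to try it on a current R installation. It assumes the compiler package bundled with recent R versions provides cmpfun (the setup used in 2006 may have differed), and it uses hypothetical toy data. The names group_sums and group_sums_c are illustrative only. Unlike f4, the function allocates its own result vector rather than reading res from the global environment.

```r
library(compiler)  # assumption: the 'compiler' package bundled with current R

# Variant of f4 that allocates its own result vector.
group_sums <- function(x, startidx, endidx) {
  res <- numeric(length(startidx))
  for (j in seq_along(startidx)) {
    res[j] <- sum(x[startidx[j]:endidx[j]])
  }
  res
}
group_sums_c <- cmpfun(group_sums)  # byte-compiled version of the same loop

# Hypothetical toy data, already sorted by group label.
xs <- c(20, 40, 30, 10, 50, 60)
is <- c(1, 1, 2, 3, 3, 3)
brk      <- which(diff(is) > 0)
startidx <- c(1, brk + 1)
endidx   <- c(brk, length(xs))

res_loop <- group_sums_c(xs, startidx, endidx)
res_ref  <- as.vector(tapply(xs, is, sum))  # reference result for comparison
print(res_loop)
```

On recent R versions, functions are byte-compiled automatically by default, so the gap between the plain and the explicitly compiled loop may be much smaller than the timings quoted in this thread.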

- Tom
