[Rd] Any interest in "merge" and "by" implementations specifically for sorted data?
tshort
tshort at eprisolutions.com
Mon Jul 31 17:57:43 CEST 2006
> Hi Tom,
>
> > Now, try sorting and using a loop:
> >
> >> idx <- order(i)
> >> xs <- x[idx]
> >> is <- i[idx]
> >> res <- array(NA, 1e6)
> >> idx <- which(diff(is) > 0)
> >> startidx <- c(1, idx+1)
> >> endidx <- c(idx, length(xs))
> >> f1 <- function(x, startidx, endidx, FUN = sum) {
> > + for (j in 1:length(res)) {
> > + res[j] <- FUN(x[startidx[j]:endidx[j]])
> > + }
> > + res
> > + }
> >> unix.time(res1 <- f1(xs, startidx, endidx))
> > [1] 6.86 0.00 7.04 NA NA
>
> I wonder how much time the sorting, reordering, and creation of
> startidx and endidx would add to this time.
Done interactively, the sorting and indexing steps seemed fast. Here is the timing for all of them together:
> unix.time({idx <- order(i)
+ xs <- x[idx]
+ is <- i[idx]
+ res <- array(NA, 1e6)
+ idx <- which(diff(is) > 0)
+ startidx <- c(1, idx+1)
+ endidx <- c(idx, length(xs))
+ })
[1] 1.06 0.00 1.09 NA NA
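For anyone who wants to try the preprocessing on its own, here is a minimal sketch that wraps the same steps (order, reindex, find group breaks) in one function. The name `group_bounds` and the small example vectors are hypothetical, not from the original thread; it assumes `i` is a numeric grouping vector with no NAs and `x` holds the values to be summed.

```r
# Hypothetical helper: sort x by the grouping vector i and return the
# sorted values plus the first and last index of each run of equal i values.
group_bounds <- function(x, i) {
  idx <- order(i)                    # permutation that sorts i
  xs <- x[idx]                       # values rearranged into group order
  is <- i[idx]                       # sorted grouping vector
  brk <- which(diff(is) > 0)         # last position of every group except the final one
  list(xs = xs,
       startidx = c(1, brk + 1),     # first position of each group
       endidx = c(brk, length(xs)))  # last position of each group
}

# Small example: three groups (1, 2, 3) in scrambled order
b <- group_bounds(x = c(10, 20, 30, 40, 50), i = c(3, 1, 3, 2, 1))
b$xs        # 20 50 40 10 30
b$startidx  # 1 3 4
b$endidx    # 2 3 5
```

Because `diff(is) > 0` is used to detect breaks, a factor grouping vector would need converting with `as.integer()` first.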
> That looks interesting. Does it only work for specific operating
> systems and processors? I will give it a try.
No, as far as I know, it works on all operating systems. It also gets a
little faster if you call sum directly inside the function rather than passing it in as FUN (f5 is the byte-compiled version of f4):
> f4 <- function(x, startidx, endidx) {
+ for (j in 1:length(res)) {
+ res[j] <- sum(x[startidx[j]:endidx[j]])
+ }
+ res
+ }
> f5 <- cmpfun(f4)
> unix.time(res5 <- f5(xs, startidx, endidx))
[1] 2.67 0.03 2.95 NA NA
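As written, f4 (like f1 in the quoted message) takes its loop length from, and copies into, the global `res` created with `array(NA, 1e6)`, so its output is tied to that global rather than to the number of groups. A minimal self-contained sketch, assuming `startidx` and `endidx` are built as above, allocates the result locally and sizes it by `length(startidx)`. The name `group_sum` and the test vectors are hypothetical. `cmpfun` comes from the `compiler` package, which ships with R since 2.13.0 (in 2006 it was a separate download). The result is cross-checked against base R's `rowsum`.

```r
library(compiler)  # provides cmpfun(); included with R >= 2.13.0

# Hypothetical variant of f4: sums x[startidx[j]:endidx[j]] for each group j.
# The result vector is allocated inside the function and sized by the
# number of groups, instead of depending on a global `res`.
group_sum <- function(x, startidx, endidx) {
  res <- numeric(length(startidx))
  for (j in seq_along(startidx)) {
    res[j] <- sum(x[startidx[j]:endidx[j]])
  }
  res
}
group_sum_c <- cmpfun(group_sum)  # byte-compiled version

# Sorted values and group boundaries for groups 1, 2, 3
xs <- c(20, 50, 40, 10, 30)
startidx <- c(1, 3, 4)
endidx <- c(2, 3, 5)
group_sum_c(xs, startidx, endidx)  # 70 40 40

# Cross-check against rowsum() on the unsorted data;
# rowsum() orders its rows by sorted group value by default
x <- c(10, 20, 30, 40, 50)
i <- c(3, 1, 3, 2, 1)
all.equal(group_sum_c(xs, startidx, endidx), as.vector(rowsum(x, i)))
```

Allocating `res` with `numeric()` also avoids the logical-to-double coercion that happens on the first assignment into `array(NA, 1e6)`.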
- Tom
--
View this message in context: http://www.nabble.com/Any-interest-in-%22merge%22-and-%22by%22-implementations-specifically-for-sorted-data--tf2009595.html#a5578580
Sent from the R devel forum at Nabble.com.