[R] Vectorization/Speed Problem

Gabor Grothendieck ggrothendieck at gmail.com
Wed Nov 21 02:02:50 CET 2007


Let x be the input vector and cx be its cumulative sum.
Then seq_along(cx) - match(cx, cx) gives increasing sequences
that restart at 0 at each 1 in x.  Adding cummax(x), which is 0
over the leading zeros and 1 from the first 1 onward, makes every
sequence after the leading zeros start at 1 instead.

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0) # input

cx <- cumsum(x)
seq_along(cx) - match(cx, cx) + cummax(x)
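
To see why this works, here is a rough step-by-step sketch of the
intermediate vectors, followed by one possible way to apply the same
expression to each column of a matrix like the one in the question.
The names count_since_last_one and count_loop are hypothetical helpers
introduced only for this illustration, and the sketch assumes x holds
only 0s and 1s with no NAs.

```r
# Example input from the question
x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)

cx    <- cumsum(x)              # 0 1 1 2 2 3 3 3 4 5 6 6 6 6
first <- match(cx, cx)          # 1 2 2 4 4 6 6 6 9 10 11 11 11 11
since <- seq_along(cx) - first  # 0 0 1 0 1 0 1 2 0 0 0 1 2 3
seen1 <- cummax(x)              # 0 1 1 1 1 1 1 1 1 1 1 1 1 1
y     <- since + seen1          # 0 1 2 1 2 1 2 3 1 1 1 2 3 4

# Hypothetical wrapper around the one-liner (assumes 0/1 input, no NAs)
count_since_last_one <- function(x) {
  cx <- cumsum(x)
  seq_along(cx) - match(cx, cx) + cummax(x)
}

# Plain loop version, used here only as a cross-check
count_loop <- function(x) {
  y <- numeric(length(x))
  seen <- FALSE                 # becomes TRUE once the first 1 is met
  for (i in seq_along(x)) {
    if (x[i] == 1) {
      seen <- TRUE
      y[i] <- 1
    } else if (seen) {
      y[i] <- y[i - 1] + 1      # one more 0 since the last 1
    } else {
      y[i] <- 0                 # still in the leading zeros
    }
  }
  y
}

# Column-wise use on a matrix shaped like the one in the question
set.seed(1)
X <- matrix(sample(c(1, 0, 0, 0, 0), 500, replace = TRUE), 25, 20,
            byrow = TRUE)
Y <- apply(X, 2, count_since_last_one)
```

The apply() call here only loops over the 20 columns; the work within
each column stays vectorized, so it should not suffer the slowdown of
an element-by-element loop, though that is worth timing on real data.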

On Nov 20, 2007 6:42 PM, Tom Johnson <tjohnson at covad.net> wrote:
> Hi,
>
> I cannot find a 'vectorized' solution to this 'for loop' kind of problem.
> Do you see a vectorized, fast-running solution?
>
> Objective:
> Take the value of X at each timepoint and calculate the corresponding value
> of Y.  Leading 0's and all 1's for X are assigned to Y; otherwise Y is
> incremented by the number of 0's adjacent to the last 1.  The frequency and
> distribution of X vary widely and may have ~100 repeated 0's or 1's in a
> vector of 10k timepoints.
>
> Example:
> time 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
> X    0   1   0   1   0   1   0   0   1   1   1   0   0   0   . .
> Y    0   1   2   1   2   1   2   3   1   1   1   2   3   4   . .
>
> What I have done:
> My for() and apply()-related standard solutions are too slow.  They are 6
> times slower than my prototype, vectorized code which uses cumsum().
> However(!)... my results are inaccurate and I can't correct them without
> introducing a for()!  Here is my shot at a vectorized solution, as far as I
> can take it.
>
> Preliminary Vectorized Code:
> X       <- matrix(sample(c(1,0,0,0,0), 500, replace = TRUE), 25, 20, byrow=TRUE)
>        colnames(X) <- c(paste("a", 1:20, sep=""))
> noX <- X; noX[X!=0] <- 0; cumX <- noX; cumNoX <- noX; Y1 <- noX; Y2 <- X; Y3 <- X
>
> for (e in 1:ncol(X)) {
>        cumX[,e] <- cumsum(X[,e])
>        noX[X[,e] < 1 & cumsum(X[,e]) > 0 ,e] <- 1
>        cumNoX[,e] <- cumsum(noX[,e])
>        }
> Y1[cumNoX > 0] <- cumNoX[cumNoX > 0] + 1
> Y2[X == 0 & noX > 0] <- Y1[X == 0 & noX > 0]
> Y3 <- Y2
> Y3[cumX > 1 & noX > 0] <- Y2[cumX > 1 & noX > 0] - cumX[cumX > 1 & noX > 0]
> X; Y3
>
> Your help would be greatly appreciated!  I'm stuck.
> Thank you,
>
> Tom Johnson


