[R] adding zeroes after old zeroes in a vector ??

Fri Sep 10 22:55:40 CEST 2010

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of skan
> Sent: Friday, September 10, 2010 12:33 PM
> To: r-help at r-project.org
> Subject: Re: [R] adding zeroes after old zeroes in a vector ??
> 
> 
> Hi
> 
> I'll study your answers.
> 
> I could also try
> gsub("01", "00", x)  N times
> but it could be very slow if N is large
> In fact, when I wrote 111110011 I meant a vector
> 1
> 1
> 1
> 1
> 1
> 0
> 0
> 1
> 1
> not a string, but I wrote it more compactly.
> 
> I could also do it by shifting the elements of the vector one 
> position and ANDing
> the result with the original. And again shifting 2 positions 
> and so on up to
> N. But it's very slow.

How did you do the shifting (show code!) and how slow is slow?
What is a typical length for the vector, what is a typical
value of N, and what is a typical number of 0-runs in the vector?
No code will be fastest over that whole parameter space.

E.g., the following might run out of memory if the product
of the number of runs and N is too big, but seems pretty quick
for moderate N:
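  f1 <- function(x, N) {
     nx <- length(x)
     # positions of each 1 that immediately follows a 0
     indexOfOneAfterZero <- which(c(FALSE, x[-1]==1 & x[-nx]==0))
     # one row per such position, one column per offset 0..N-1
     indexToZero <- outer(indexOfOneAfterZero, seq_len(N)-1, "+")
     # drop indices that run past the end of x
     indexToZero <- indexToZero[indexToZero<=nx]
     x[indexToZero] <- 0
     x
  }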
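
As a small check of what f1 does, here is a sketch that applies it to the
9-element vector from the quoted message. The definition of f1 is repeated
so the block runs on its own; the names x, r1 and r2 and the choice of
N = 1 and N = 2 are only for illustration.

```r
f1 <- function(x, N) {
   nx <- length(x)
   indexOfOneAfterZero <- which(c(FALSE, x[-1]==1 & x[-nx]==0))
   indexToZero <- outer(indexOfOneAfterZero, seq_len(N)-1, "+")
   indexToZero <- indexToZero[indexToZero<=nx]
   x[indexToZero] <- 0
   x
}

x <- c(1, 1, 1, 1, 1, 0, 0, 1, 1)
# The only 1 that directly follows a 0 is at position 8,
# so N = 1 zeroes position 8 and N = 2 zeroes positions 8 and 9.
r1 <- f1(x, N = 1)   # 1 1 1 1 1 0 0 0 1
r2 <- f1(x, N = 2)   # 1 1 1 1 1 0 0 0 0
print(r1)
print(r2)
```

Indices that would fall past the end of x are dropped, so a larger N gives
the same result as N = 2 for this vector.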

E.g., for a vector with lots of short runs we get:
  > system.time(f1(sample(c(0,1),replace=TRUE,size=1e6), N=10))
     user  system elapsed 
     0.87    0.08    0.94 
  > system.time(f1(sample(c(0,1),replace=TRUE,size=1e6), N=100))
     user  system elapsed 
     3.75    0.80    4.32 
  > system.time(f1(sample(c(0,1),replace=TRUE,size=1e6), N=1000))
  Error: cannot allocate vector of size 953.8 Mb
  Timing stopped at: 0.24 0.03 0.26 
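
A back-of-envelope calculation makes the failure plausible. In a fair
random 0/1 vector about a quarter of the adjacent pairs are (0, 1), so
the index matrix has roughly 250,000 * N entries. The sketch below
assumes 4 bytes per element (an integer vector); which intermediate
object R actually failed to allocate is not shown in the output above,
so treat this as an estimate only.

```r
nx <- 1e6
N <- 1000
# Expected number of 1s that directly follow a 0:
# each adjacent pair is (0, 1) with probability 1/4.
expectedStarts <- (nx - 1) / 4
# Size in Mb of one integer vector (4 bytes per element, an assumption)
# holding expectedStarts * N elements.
sizeMb <- expectedStarts * N * 4 / 2^20
print(sizeMb)
```

The result is a little under 954 Mb, the same order as the size in the
error message.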

You can make one that is a tad slower but still works when the number
of runs times N is bigger:
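  f2 <- function (x, N) 
  {
      nx <- length(x)
      # TRUE at each 1 that immediately follows a 0
      isOneAfterZero <- c(FALSE, x[-1] == 1 & x[-nx] == 0)
      for (i in seq_len(N)) {
          x[isOneAfterZero] <- 0
          # shift the marks one position to the right for the next pass
          isOneAfterZero <- c(FALSE, isOneAfterZero[-length(isOneAfterZero)])
      }
      x
  }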
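  > # x is not defined in this message; presumably a random 0/1 vector
  > # like the ones above, e.g. x <- sample(c(0,1),replace=TRUE,size=1e6)
  > system.time(f2(x, N=10))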
     user  system elapsed 
     0.58    0.03    0.59 
  > system.time(f2(x, N=100))
     user  system elapsed 
     5.08    0.86    5.54 
  > system.time(f2(x, N=1000))
     user  system elapsed 
    49.59    7.84   54.56
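
Before comparing timings it is worth confirming that the two functions
return the same result. A minimal sketch, with both definitions repeated
so it runs on its own; the seed, the vector length and the values of N
are arbitrary choices for illustration.

```r
f1 <- function(x, N) {
   nx <- length(x)
   indexOfOneAfterZero <- which(c(FALSE, x[-1]==1 & x[-nx]==0))
   indexToZero <- outer(indexOfOneAfterZero, seq_len(N)-1, "+")
   indexToZero <- indexToZero[indexToZero<=nx]
   x[indexToZero] <- 0
   x
}
f2 <- function(x, N) {
   nx <- length(x)
   isOneAfterZero <- c(FALSE, x[-1] == 1 & x[-nx] == 0)
   for (i in seq_len(N)) {
      x[isOneAfterZero] <- 0
      isOneAfterZero <- c(FALSE, isOneAfterZero[-length(isOneAfterZero)])
   }
   x
}

set.seed(1)
x <- sample(c(0, 1), replace = TRUE, size = 1000)
# TRUE for every N tried if f1 and f2 agree
agree <- sapply(c(1, 5, 50), function(N) identical(f1(x, N), f2(x, N)))
print(agree)
```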

These have very different times when there are few
runs and big N:

  > y <- as.integer(sin(seq(0,50,len=1e6))>0)
  > system.time(f1(y, N=1000))
     user  system elapsed 
     0.13    0.07    0.22 
  > system.time(f2(y, N=1000))
     user  system elapsed 
    39.66    7.36   46.78  
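
For comparison, here is a sketch of a third variant whose work does not
grow with N or with the number of runs. It is not from the original
thread and no claim is made that it is the fastest in any particular
case; the name f3 and the cummax approach are assumptions introduced
here. It records, for every position, the index of the most recent 1
that followed a 0, and zeroes the positions less than N steps past it.

```r
f3 <- function(x, N) {
   nx <- length(x)
   if (nx < 2 || N < 1) return(x)   # nothing to zero
   pos <- seq_len(nx)
   isOneAfterZero <- c(FALSE, x[-1] == 1 & x[-nx] == 0)
   # index of the latest one-after-zero position at or before each
   # element, or 0 if there has been none yet
   lastStart <- cummax(ifelse(isOneAfterZero, pos, 0L))
   x[lastStart > 0 & pos - lastStart < N] <- 0
   x
}

# f1 as defined earlier in this message, repeated for the comparison
f1 <- function(x, N) {
   nx <- length(x)
   indexOfOneAfterZero <- which(c(FALSE, x[-1]==1 & x[-nx]==0))
   indexToZero <- outer(indexOfOneAfterZero, seq_len(N)-1, "+")
   indexToZero <- indexToZero[indexToZero<=nx]
   x[indexToZero] <- 0
   x
}

small <- f3(c(1, 1, 1, 1, 1, 0, 0, 1, 1), N = 2)   # 1 1 1 1 1 0 0 0 0
set.seed(2)
x <- sample(c(0, 1), replace = TRUE, size = 1000)
same <- identical(f3(x, N = 7), f1(x, N = 7))
print(small)
print(same)
```

Whether this beats f1 or f2 in practice still depends on the vector
length, N and the number of runs, so it should be timed on
representative data.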

My basic point is that when you ask for the fastest
solution you have to describe your problem better.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  
