[R] LOCF - Last Observation Carried Forward

Mon Nov 17 17:50:47 CET 2003

Here's a faster version of "most.recent".  It uses "rep()" in a vectorized 
manner.

 > # Gabor Grothendieck's function:
 > most.recent.cut <- function(x)
+     as.numeric(as.vector(cut(seq(x),c(which(x),Inf),lab=which(x),right=F)))
 >
 > # Version that uses which() and vectorized rep()
 > most.recent <- function(x) {
+     # return a vector of indices of the most recent TRUE value
+     if (!is.logical(x))
+         stop("x must be logical")
+     x.pos <- which(x)
+     if (length(x.pos)==0 || x.pos[1] != 1)
+         x.pos <- c(1, x.pos)
+     rep(x.pos, c(diff(x.pos), length(x) - x.pos[length(x.pos)] + 1))
+ }
 >

 > x <- sample(c(T,F),1e7,rep=T)
 > system.time(most.recent.cut(x))
[1] 41.21  0.54 41.98    NA    NA
 > system.time(most.recent(x))
[1] 2.67 0.08 2.78   NA   NA
 >

-- Tony Plate

At Friday 10:21 PM 11/14/2003 -0500, Gabor Grothendieck wrote:

>From: Tony Plate <tplate at acm.org>:
> >
> > Here's a function that does the essential computation (written to work in
> > both S-plus and R).
> >
> > This looks like one of those tricky problems that do not vectorize
> > easily. It would be simple to write a C-program to compute this very
> > efficiently. But are there any more efficient solutions than ones like the
> > below (that are written without resort to C)?
> >
> > most.recent <- function(x) {
> > # return a vector of indices of the most recent TRUE value
> > if (!is.logical(x))
> > stop("x must be logical")
> > x[is.na(x)] <- FALSE
> > # x is a logical vector
> > r <- rle(x)
> > ends <- cumsum(r$lengths)
> > starts <- ends - r$lengths + 1
> > spec <- as.list(as.data.frame(rbind(start=starts, len=r$lengths,
> > value=as.numeric(r$values), prev.end=c(NA, ends[-length(ends)]))))
> > names(spec) <- NULL
> > unlist(lapply(spec, function(s) if (s[3]) seq(s[1], len=s[2]) else
> > rep(s[4], len=s[2])), use.names=F)
> > }
> >
> > > x <- c(F,T,T,F,F,F,T,F)
> > > most.recent(x)
> > [1] NA 2 3 3 3 3 7 7
> >
> > And using it to do the fill-forward:
> >
> > > x <- c(NA,2,3,NA,4,NA,5,NA,NA,NA,6,7,8,NA)
> > > x[most.recent(!is.na(x))]
> > [1] NA 2 3 3 4 4 5 5 5 5 6 7 8 8
> > >
> >
> > Some timings:
> >
> > > x <- sample(c(T,F),1e4,rep=T)
> > > system.time(most.recent(x))
> > [1] 0.33 0.01 0.47 NA NA
> > > x <- sample(c(T,F),1e5,rep=T)
> > > system.time(most.recent(x))
> > [1] 4.27 0.06 6.44 NA NA
> > > x <- sample(c(T,F),1e6,rep=T)
> > > system.time(most.recent(x))
> > [1] 47.27 0.17 47.97 NA NA
> > >
> >
> > -- Tony Plate
> >
> > PS. Actually, I just found a solution that I had lying around that is 
> about
> > 70 times as fast on random test data like the above.
>
>I was waiting for you to post this but didn't see it so I thought
>I would post mine.  This one is 13x as fast and only requires
>a single line of code.
>
> > set.seed(111)
> > x <- sample(c(T,F),10000,rep=T)
>
> > system.time(z1 <- most.recent(x))
>[1] 0.92 0.02 1.68   NA   NA
>
> > system.time(z2 <- as.numeric(as.vector(
>      cut(seq(x),c(which(x),Inf),lab=which(x),right=F))))
>[1] 0.07 0.00 0.12   NA   NA
>
> > all.equal(z1,z2)
>[1] TRUE
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Tony Plate   tplate at acm.org