[R] A more efficient way to roll values in an irregular time series dataset?

Gabor Grothendieck ggrothendieck at gmail.com
Mon Nov 8 21:24:33 CET 2010

On Mon, Nov 8, 2010 at 3:16 PM, Richard Vlasimsky
<richard.vlasimsky at imidex.com> wrote:
> Does anyone recommend a more efficient way to "roll" values in a time series dataset?
> I merged a bunch of different time series datasets (tens of thousands of them) whose observation dates and sampling intervals differ.  Some time series observations are reported at the beginning of the month, some at the end, some on Mondays, some on Wednesdays, some annually, etc.
> In the process of merging all of the irregular time series (by date observed), a significant number of NA's appear in the dataset where I really want the last reported value 'rolled'  forward.
> To use a concrete example, a time series that has reported values at the beginning of every month shows NA's for every day except the date it was reported (in this case, the first of the month).  I want the value to roll forward so that NA's after the first of the month are replaced with the last reported value.
> I wrote the following for loop to accomplish the task on the object 'dataset', however it is far too slow to process tens of thousands of different time series with 15,000 observations each.  At the rate it is going, it would take weeks to complete.
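> for(j in seq_along(names(dataset)))
> {
>        last <- NA   # last non-NA value seen so far in column j
>        for(i in seq_len(nrow(dataset)))
>        {
>                # fill an NA with the last reported value, otherwise remember the new value
>                if(is.na(dataset[i,j])) dataset[i,j] <- last else last <- dataset[i,j]
>        }
> }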
> One would think an operation as simple as this could run much faster.  My sense is that using the "apply" function is the way to go, however I just can't get my head around a function that would reference the last reported value.
> Any guidance is appreciated.

Don't know if it's fast enough for you, but in zoo you can merge and
carry the last observation forward like this:

# suppose z1, z2, z3 are zoo series
library(zoo)

na.locf(merge(z1, z2, z3)) # as many as you like

# or, if the series are held in a list:
L <- list(z1, z2, z3)
na.locf(do.call("merge", L))

Either form produces a multivariate zoo series, one column per input
series, with the NAs filled in by the last observed value.
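
To make the merge-then-fill step concrete, here is a small sketch on
made-up data. The two series and their dates are hypothetical and are not
taken from the original dataset. It passes na.rm = FALSE so that rows
before a series' first observation are kept as NA rather than dropped,
which is an assumption about what is wanted here.

```r
library(zoo)

# Hypothetical sample data: one roughly monthly and one roughly weekly series
z1 <- zoo(c(10, 11), as.Date(c("2010-01-01", "2010-02-01")))
z2 <- zoo(c(1, 2, 3), as.Date(c("2010-01-04", "2010-01-11", "2010-02-01")))

# merge() aligns the series on the union of their dates;
# dates on which a series has no observation become NA
m <- merge(z1, z2)

# na.locf() carries the last observation forward within each column;
# na.rm = FALSE keeps leading NAs instead of removing them
filled <- na.locf(m, na.rm = FALSE)
print(filled)
```

In this sketch z1 is carried forward over the January dates where only z2
was observed, while the first row of z2 stays NA because nothing precedes
it to carry forward.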

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
