[R] Interpolating / smoothing missing time series data

Sean Davis sdavis2 at mail.nih.gov
Thu Sep 8 13:03:27 CEST 2005


On 9/7/05 10:19 PM, "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:

> On 9/7/05, David James <djames at frontierassoc.com> wrote:
>> The purpose of this email is to ask for pre-built procedures or
>> techniques for smoothing and interpolating missing time series data.
>> 
>> I've made some headway on my problem in my spare time.  I started
>> with an irregular time series with lots of missing data.  It even had
>> duplicated data.  Thanks to zoo, I've cleaned that up -- now I have a
>> regular time series with lots of NA's.
>> 
>> I want to use a time series model (e.g. ARIMA) to fill in the gaps.  I
>> am certainly open to other suggestions, especially if they are easy
>> to implement.
>> 
>> My specific questions:
>> 1.  Presumably, once I get ARIMA working, I still have the problem of
>> predicting the past missing values -- I've only seen examples of
>> predicting into the future.
>> 2.  When predicting the past (backcasting), I also want to take
>> reasonable steps to make the data look smooth.
>> 
>> I guess I'm looking for a really good example in a textbook or white
>> paper (or just an R guru with some experience in this area) that can
>> offer some guidance.
>> 
>> Venables and Ripley was a great start (Modern Applied Statistics with
>> S).  I really had hoped that the "Seasonal ARIMA Models" section on
>> page 405 would help.  It was helpful, but only to a point.  I have a
>> hunch (based on me crashing arima numerous times -- maybe I'm just
>> new to this and doing things that are unreasonable?) that using
>> hourly data just does not mesh well with the seasonal arima code?
> 
> Not sure if this answers your question but if you are looking for something
> simple then na.approx in the zoo package will linearly interpolate for you.
> 
> > library(zoo)
> > z <- zoo(c(1,2,NA,4,5))
> > na.approx(z)
> 1 2 3 4 5 
> 1 2 3 4 5

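On the ARIMA question itself: as far as I know, arima() in the stats
package tolerates NAs in the series, and the fitted state-space model can
be passed to KalmanSmooth() to estimate the missing points from the data
on both sides of each gap, which covers the "predicting the past" concern.
A minimal sketch follows, assuming a simulated zero-mean AR(1) series and
include.mean = FALSE so that no intercept has to be handled separately;
the series, the gap positions, and the model order are all made up for
illustration.

```r
# Simulated zero-mean AR(1) series with artificial gaps (illustrative only).
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 200))
x_gappy <- x
x_gappy[c(50:53, 120)] <- NA

# arima() fits by maximum likelihood when the series contains NAs.
# include.mean = FALSE is an assumption that keeps the example simple.
fit <- arima(x_gappy, order = c(1, 0, 0), include.mean = FALSE)

# Smoothed state estimates; Z maps the state vector to the observed series.
sm  <- KalmanSmooth(x_gappy, fit$model)
est <- drop(sm$smooth %*% fit$model$Z)

# Replace only the missing positions; observed values are left untouched.
filled <- x_gappy
filled[is.na(x_gappy)] <- est[is.na(x_gappy)]
```

For a series with a non-zero mean, one option is to subtract the mean
before fitting and add it back afterwards.
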
Alternatively, if you are looking for "more smoothing", you could look at
using a moving average or median applied at points of interest with an
"appropriate" window size--see wapply in the gplots package (gregmisc
bundle).  There are a number of other functions that can accomplish the same
task.  A search for "moving window" or "moving average" in the archives may
produce some other ideas.
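
If you would rather not pull in another package, the same idea can be
sketched in base R.  The helper below, fill_by_window(), is a hypothetical
function written for this message (it is not wapply and not part of any
package): it fills each NA with the mean or median of the observed values
within a fixed number of positions on either side.

```r
# Hypothetical helper: fill each NA with a summary (mean or median) of the
# non-missing values within `half_width` positions on either side.
fill_by_window <- function(x, half_width = 2, fun = mean) {
  n <- length(x)
  filled <- x
  for (i in which(is.na(x))) {
    idx  <- max(1, i - half_width):min(n, i + half_width)
    vals <- x[idx][!is.na(x[idx])]
    # Leave the gap as NA when the window holds no observed values.
    if (length(vals) > 0) filled[i] <- fun(vals)
  }
  filled
}

x <- c(1, 2, NA, 4, 5, NA, NA, 8, 9)
fill_by_window(x, half_width = 2, fun = mean)
fill_by_window(x, half_width = 2, fun = median)
```

A wider window gives a smoother fill but leans on values further from the
gap; the median is less sensitive to outliers than the mean.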

Sean



