[R] Interpolating / smoothing missing time series data
Sean Davis
sdavis2 at mail.nih.gov
Thu Sep 8 13:03:27 CEST 2005
On 9/7/05 10:19 PM, "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:
> On 9/7/05, David James <djames at frontierassoc.com> wrote:
>> The purpose of this email is to ask for pre-built procedures or
>> techniques for smoothing and interpolating missing time series data.
>>
>> I've made some headway on my problem in my spare time. I started
>> with an irregular time series with lots of missing data. It even had
>> duplicated data. Thanks to zoo, I've cleaned that up -- now I have a
>> regular time series with lots of NA's.
>>
>> I want to use a regression model (i.e. ARIMA) to fill in the gaps. I
>> am certainly open to other suggestions, especially if they are easy
>> to implement.
>>
>> My specific questions:
>> 1. Presumably, once I get ARIMA working, I still have the problem of
>> predicting the past missing values -- I've only seen examples of
>> predicting into the future.
>> 2. When predicting the past (backcasting), I also want to take
>> reasonable steps to make the data look smooth.
>>
>> I guess I'm looking for a really good example in a textbook or white
>> paper (or just an R guru with some experience in this area) that can
>> offer some guidance.
>>
>> Venables and Ripley was a great start (Modern Applied Statistics with
>> S). I really had hoped that the "Seasonal ARIMA Models" section on
>> page 405 would help. It was helpful, but only to a point. I have a
>> hunch (based on my crashing arima numerous times; maybe I'm just
>> new to this and doing things that are unreasonable?) that hourly
>> data just does not mesh well with the seasonal arima code.
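On question 1: estimating past missing values is usually done by smoothing rather than forecasting. A Kalman smoother uses the observations on both sides of each gap, not just the ones before it. The sketch below is my own assumption and was not proposed elsewhere in this thread. It swaps in a local-level structural model fitted with StructTS() (its reduced form is an ARIMA(0,1,1)) in place of a seasonal ARIMA, then fills each NA from tsSmooth(). The helper name fill_by_kalman and the simulated hourly series are hypothetical.

```r
# Hypothetical helper: fill NAs in a regular ts by Kalman smoothing a
# local-level state-space model (reduced form ARIMA(0,1,1)).
fill_by_kalman <- function(x) {
  stopifnot(is.ts(x))
  stopifnot(!is.na(x[1]))               # guard: start from an observed value
  fit <- StructTS(x, type = "level")    # ML fit; NAs inside x are allowed
  level <- tsSmooth(fit)                # smoothed level, uses data on both sides
  out <- x
  miss <- is.na(x)
  out[miss] <- level[miss]              # replace only the missing points
  out
}

# Simulated example: two "days" of hourly data with a gap and a lone NA
set.seed(1)
x <- ts(cumsum(rnorm(48)) + 20, frequency = 24)
x[c(5:8, 30)] <- NA
filled <- fill_by_kalman(x)
```

StructTS() also accepts type = "BSM", which adds a seasonal component. Whether that behaves well at frequency 24 is something to check on your own data.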
>
> Not sure if this answers your question but if you are looking for something
> simple then na.approx in the zoo package will linearly interpolate for you.
>
>> z <- zoo(c(1,2,NA,4,5))
>> na.approx(z)
> 1 2 3 4 5
> 1 2 3 4 5
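For a plain numeric vector, a comparable linear fill can be sketched in base R with approx(). A cubic spline through the observed points, using spline(), gives a smoother-looking fill. This is a rough sketch and not how na.approx is implemented. fill_linear and fill_spline are hypothetical helper names, and both assume an equally spaced series indexed 1..n.

```r
# Hypothetical helper: linear interpolation of NAs, indexed 1..n
fill_linear <- function(x) {
  idx <- seq_along(x)
  ok <- !is.na(x)
  approx(idx[ok], x[ok], xout = idx)$y   # rule = 1: NA outside observed range
}

# Hypothetical helper: cubic-spline fill, same interface
fill_spline <- function(x) {
  idx <- seq_along(x)
  ok <- !is.na(x)
  spline(idx[ok], x[ok], xout = idx)$y
}

fill_linear(c(1, 2, NA, 4, 5))
# [1] 1 2 3 4 5
```

With approx()'s default rule = 1, leading and trailing NAs stay NA. Only interior gaps are really meaningful for either helper.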
Alternatively, if you are looking for "more smoothing", you could apply a
moving average or moving median at the points of interest, with a suitably
chosen window size; see wapply in the gplots package (part of the gregmisc
bundle). A number of other functions can accomplish the same task, and
searching the archives for "moving window" or "moving average" may turn up
other ideas.
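Here is a minimal base-R sketch of the moving-window idea, assuming an equally spaced series. Each NA is replaced by the median (or mean) of the non-missing original values within width %/% 2 steps on either side. fill_by_window is a hypothetical helper and is not the gplots wapply interface.

```r
# Hypothetical helper: fill each NA with fun() of its non-missing neighbours
fill_by_window <- function(x, width = 5, fun = median) {
  stopifnot(width %% 2 == 1)            # odd width: centred window
  n <- length(x)
  half <- width %/% 2
  out <- x
  for (i in which(is.na(x))) {
    win <- x[max(1, i - half):min(n, i + half)]   # neighbours from original x
    win <- win[!is.na(win)]
    if (length(win) > 0) out[i] <- fun(win)      # else leave NA
  }
  out
}

fill_by_window(c(10, NA, NA, 40), width = 3)
# [1] 10 10 40 40
```

Because only original observations are used, gaps longer than the window stay NA. A wider window or a second pass will close them.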
Sean