[R] moving average with gaps in time series
Carl Witthoft
carl at witthoft.com
Thu Dec 16 23:29:01 CET 2010
I'm not completely sure what your test setup does, but wouldn't it be
simpler to take a moving average at every point, but set the window
dynamically to be over the previous hour?
In pseudocode, for the j-th sample:
meanval[j] <- mean(Value[ DateTime[(DateTime-DateTime[j])>0 &
(Datetime-DateTime[j]<3600)] ] )
where I'm just letting DateTime values be in seconds.
If you really want to reject all averages that would have 'looked back'
over a missing sample, I'd play with rle() to find the bad points and
use that to reduce the meanval vector I described above.
Carl
********quote****
I have a time series with interval of 2.5 minutes, or 24 observations
per hour. I am trying to find a 1 hr moving average, looking backward,
so that moving average at n = mean(n-23 : n)
The time series has about 1.5 million rows, with occasional gaps due to
poor data quality. I only want to take a 1 hour moving average for those
periods that are complete, i.e. have 24 observations in the previous hour.
The data is in 3 columns
Value DateTime interval
For example:
Value <- rnorm (100, 50, 3) #my data has 1.5 million rows; using 100
here for simple example
DateTime <- seq(from = 915148800, to=915156150, by =150) #time steps
1:50 at 150 second intervals
DateTime [51] <- 915156450 #skip one time step; DateTime[52:100] <-
seq(from = 915156600, to =915163800, by = 150) #resume time steps of 150
seconds
x <- cbind (Value, DateTime)
x <- as.data.frame(x)
x$DateTime <-as.POSIXct(x$DateTime, origin="1970-01-01", tz="GMT") x1 <-
x[-c(1:23), ] #trimming x to create direct comparison of DateTimes in x
and x1
x[,3] <- difftime(x1[,2], x[,2], units="mins") #ignore warning message
colnames(x) [3] <- "interval"
x[24:nrow(x),3] <- x[1:(nrow(x)-23),3] #set interval to be the number of
minutes between n-23 and n.
#57.5 indicates no gaps in the previous hour up to and including n.
# >57.5 indicates a gap in the previous n-23 rows.
x[1:23,3] <- 0/0 #NaN assigned to first 23 rows so as not to take
average of first hour.
#as expected, row 51: 73 indicates a gap, i.e. interval > 57.5
index <- which (x [,3] == 57.5) #which rows have no gaps in previous hour
#loop to calculate 1 hour moving average, only for periods which are
complete
for (i in 1:length(index)) {
x [index[i],4] <- mean(x[index[i]-23,1] : x[index[i],1]) }
#This loop works on this simple example; but this takes VERY long time
to run on x with 1.5 million rows. Over 1 hour running and still not
complete. I also tried increasing memory.limit to 4095, but still very
slow.
Any suggestions to make this run faster? I thought about using the lag
function but could not get it to work
More information about the R-help
mailing list