[R] Fill in missing times in a timeseries with NA
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Oct 27 16:57:08 CEST 2010
On Wed, Oct 27, 2010 at 8:57 AM, lglew <l.glew at soton.ac.uk> wrote:
>
> Hi,
>
> I have an irregularly spaced time series dataset, which is read in from a .csv file.
> I need to convert this to a regularly spaced time series by filling in
> missing rows of data with NAs.
>
> So my data, called NtuMot, looks like this (I've removed some of the
> additional rows for simplicity)....
> ELEID date_time height slope
> 1 2009-06-24 00:00:00 150 4.0
> 1 2009-06-24 01:00:00 175 4.0
> 1 2009-06-24 02:00:00 180 2.3
> 1 2009-06-24 03:00:00 200 1.0
> 1 2009-06-24 06:00:00 201 1.0
> 1 2009-06-24 07:00:00 202 0.0
> 1 2009-06-24 08:00:00 202 0.0
> 1 2009-06-24 09:00:00 202 0.0
> 1 2009-06-24 10:00:00 202 0.0
>
>
> I need to end up with this:
> ELEID date_time height slope
>
> 1 2009-06-24 00:00:00 150 4.0
> 1 2009-06-24 01:00:00 175 4.0
> 1 2009-06-24 02:00:00 180 2.3
> 1 2009-06-24 03:00:00 200 1.0
> 1 2009-06-24 04:00:00 NA NA
> 1 2009-06-24 05:00:00 NA NA
> 1 2009-06-24 06:00:00 201 1.0
> 1 2009-06-24 07:00:00 202 0.0
> 1 2009-06-24 08:00:00 202 0.0
> 1 2009-06-24 09:00:00 202 0.0
> 1 2009-06-24 10:00:00 202 0.0
>
> Any ideas much appreciated!
>
This will do it, producing a new data frame:
grid.df <- data.frame(date_time = seq(DF[1, 2], DF[nrow(DF), 2], by = "hour"))
# all = TRUE keeps the grid times that are absent from DF, filled with NA
merge(DF, grid.df, all = TRUE)
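As a self-contained sketch of the merge-against-a-grid idea: by default merge() keeps only rows whose key appears in both data frames, so the missing hours show up as NA rows only when an outer join is requested with all = TRUE. The names DF2, grid2 and filled and the shortened sample data are made up for illustration, and tz = "UTC" is an assumption so that the result does not depend on the local time zone. The ELEID fix-up assumes a single ID, as in the question.

```r
# Hypothetical sample data (a shortened version of the poster's series).
times <- as.POSIXct(c("2009-06-24 00:00:00", "2009-06-24 01:00:00",
                      "2009-06-24 02:00:00", "2009-06-24 03:00:00",
                      "2009-06-24 06:00:00", "2009-06-24 07:00:00"),
                    tz = "UTC")
DF2 <- data.frame(ELEID = 1, date_time = times,
                  height = c(150, 175, 180, 200, 201, 202),
                  slope = c(4, 4, 2.3, 1, 1, 0))

# Regular hourly grid spanning the observed time range.
grid2 <- data.frame(date_time = seq(min(DF2$date_time), max(DF2$date_time),
                                    by = "hour"))

# all = TRUE makes this an outer join, so unmatched grid times are kept
# and their other columns are filled with NA.
filled <- merge(DF2, grid2, by = "date_time", all = TRUE)

# The join also leaves ELEID as NA in the new rows; restore it
# (this assumes the data frame holds a single ELEID).
filled$ELEID[is.na(filled$ELEID)] <- DF2$ELEID[1]

# merge() moves the key column to the front; restore the original order.
filled <- filled[, names(DF2)]
```

With several ELEIDs in one file, the same steps would have to be applied per ID, for example after split() on ELEID.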
However, if you are dealing with irregular series you might find it
more convenient to use the zoo package:
library(zoo)
z <- zoo(DF[-2], DF[, 2])
g <- seq(start(z), end(z), by = "hour")
m <- merge(z, zoo(, g))
m
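If a data frame is needed again at the end, the merged zoo object can be converted back. The following is only a sketch under some assumptions: DF2, z2, m2 and out are made-up names, the sample data is shortened, tz = "UTC" is chosen so the result does not depend on the local time zone, and na.locf() is used to carry ELEID forward on the assumption that the ID is constant across the gap.

```r
library(zoo)

# Hypothetical sample data (a shortened version of the poster's series).
times <- as.POSIXct(c("2009-06-24 00:00:00", "2009-06-24 01:00:00",
                      "2009-06-24 02:00:00", "2009-06-24 03:00:00",
                      "2009-06-24 06:00:00", "2009-06-24 07:00:00"),
                    tz = "UTC")
DF2 <- data.frame(ELEID = 1, date_time = times,
                  height = c(150, 175, 180, 200, 201, 202),
                  slope = c(4, 4, 2.3, 1, 1, 0))

# Data columns become the zoo values, date_time becomes the index.
z2 <- zoo(DF2[-2], DF2$date_time)

# Merging with a zero-width series on an hourly grid adds the missing
# times as rows of NA.
g2 <- seq(start(z2), end(z2), by = "hour")
m2 <- merge(z2, zoo(, g2))

# Back to a data frame: index() gives the times, coredata() the values.
out <- data.frame(date_time = index(m2), coredata(m2), row.names = NULL)

# Carry the last observed ELEID forward into the filled rows.
out$ELEID <- na.locf(out$ELEID)
```

height and slope are left as NA in the added rows, which is what the question asks for.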
In the above we used this for data frame DF:
DF <- structure(list(ELEID = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
date_time = structure(c(1245816000,
1245819600, 1245823200, 1245826800, 1245837600, 1245841200, 1245844800,
1245848400, 1245852000), class = c("POSIXt", "POSIXct"), tzone = ""),
height = c(150, 175, 180, 200, 201, 202, 202, 202, 202),
slope = c(4, 4, 2.3, 1, 1, 0, 0, 0, 0)), .Names = c("ELEID",
"date_time", "height", "slope"), row.names = c("2009-06-24 00:00:00",
"2009-06-24 01:00:00", "2009-06-24 02:00:00", "2009-06-24 03:00:00",
"2009-06-24 06:00:00", "2009-06-24 07:00:00", "2009-06-24 08:00:00",
"2009-06-24 09:00:00", "2009-06-24 10:00:00"), class = "data.frame")
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com