[R] Trimming time series to only include complete years
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Sat May 28 21:56:55 CEST 2016
# read about POSIXlt at ?DateTimeClasses
# note that the "mon" element is 0-11
isPartialWaterYear <- function( d ) {
  # d is a vector of Dates in increasing order, one per day
  dtl <- as.POSIXlt( d )
  n <- length( d )
  # running count of Oct 1 dates seen so far; 0 means before the first Oct 1
  wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
  ( 0 == wy1 # first partial year
  | ( !( 8 == dtl$mon[ n ] & 30 == dtl$mday[ n ] ) # end partial year
    & wy1[ n ] == wy1
    )
  )
}
dat2 <- dat[ !isPartialWaterYear( dat$Date ), ]
The above assumes that, as you said, the data are continuous at one-day
intervals, such that the only partial years will occur at the beginning
and end. The "diff" function could be used to identify irregular data
within the data interval if needed.
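
As a rough sketch of that "diff" check (the helper name findDateGaps is
hypothetical, not part of the code above, and it assumes the dates are
Date values sorted in increasing order), something like this would report
where consecutive rows are not exactly one day apart:

```r
# Sketch only: findDateGaps is a hypothetical helper.
# Assumes d is a Date vector sorted in increasing order.
findDateGaps <- function( d ) {
  step <- as.numeric( diff( d ), units = "days" )
  # position i means the step from d[ i ] to d[ i + 1 ] is not one day
  which( step != 1 )
}

dat <- data.frame( Date = seq( as.Date( "2010-01-01" )
                             , as.Date( "2011-11-05" )
                             , by = "day" ) )
gaps <- findDateGaps( dat$Date )            # regular daily example data

datGappy <- dat[ -100, , drop = FALSE ]     # drop one day to simulate a gap
gapsGappy <- findDateGaps( datGappy$Date )  # location of the simulated gap
```

An empty result means the one-day-interval assumption holds; any returned
positions mark records that would need separate handling before trimming.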
On Fri, 27 May 2016, Morway, Eric wrote:
> In bulk processing streamflow data available from an online database, I'm
> wanting to trim the beginning and end of the time series so that daily data
> associated with incomplete "water years" (defined as extending from Oct 1st
> to the following September 30th) is trimmed off the beginning and end of
> the series.
>
> For a small reproducible example, the time series below starts on
> 2010-01-01 and ends on 2011-11-05. So the data between 2010-01-01 and
> 2010-09-30 and also between 2011-10-01 and 2011-11-05 is not associated
> with a complete set of data for their respective water years. With the
> real data, the initial date of collection is arbitrary, could be 1901 or
> 1938, etc. Because I'm cycling through potentially thousands of records, I
> need help in designing a function that is efficient.
>
> dat <-
> data.frame(Date=seq(as.Date("2010-01-01"),as.Date("2011-11-05"),by="day"))
> dat$Q <- rnorm(nrow(dat))
>
> dat$wyr <- as.numeric(format(dat$Date,"%Y"))
> is.nxt <- as.numeric(format(dat$Date,"%m")) %in% 1:9
> dat$wyr[!is.nxt] <- dat$wyr[!is.nxt] + 1
>
>
> function(dat) {
> ...
> returns a subset of dat such that dat$Date > xxxx-09-30 & dat$Date <
> yyyy-10-01
> ...
> }
>
> where the years between xxxx-yyyy are "complete" (no missing days). In the
> example above, the returned dat would extend from 2010-10-01 to 2011-09-30
>
> Any offered guidance is very much appreciated.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k