[R] Trimming time series to only include complete years

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue May 31 00:15:54 CEST 2016


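Sorry, I left too many bugs (opportunities for excellence!) in my first
pass to leave it alone :-(  The version below uses its argument d instead
of the global dat, and marks the last water year as partial unless the
series ends exactly on Sep 30 (the month and day tests need to be combined
with "or", not "and").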

isPartialWaterYear2 <- function( d ) {
   dtl <- as.POSIXlt( d )
   wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
   # any 0 in wy1 corresponds to first partial water year
   result <- 0 == wy1
   # if last day is not Sep 30, mark last water year as partial
   if ( 8 != dtl$mon[ length( d ) ]
      || 30 != dtl$mday[ length( d ) ] ) {
         result[ wy1[ length( d ) ] == wy1 ] <- TRUE
   }
   result
}

dat2 <- dat[ !isPartialWaterYear2( dat$Date ), ]
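
For what it is worth, here is a minimal sketch of the function(dat) wrapper
asked for in the original question, using the same cumsum logic. The name
trimToCompleteWaterYears is made up for illustration, and the sketch assumes
dat$Date is a Date column sorted ascending at one-day intervals with no gaps.

```r
# Hypothetical wrapper (the name is illustrative, not from the thread).
# Assumes dat$Date is a sorted, gap-free daily Date column.
trimToCompleteWaterYears <- function( dat ) {
   n <- nrow( dat )
   if ( 0 == n ) return( dat )   # nothing to trim
   dtl <- as.POSIXlt( dat$Date )
   # running count of Oct 1 dates seen so far ("mon" is 0-11, so 9 is October);
   # 0 means the row falls before the first Oct 1 in the series
   wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
   partial <- 0 == wy1
   # the last water year is partial unless the series ends on Sep 30
   if ( 8 != dtl$mon[ n ] || 30 != dtl$mday[ n ] ) {
      partial[ wy1[ n ] == wy1 ] <- TRUE
   }
   dat[ !partial, , drop = FALSE ]
}

# Example data from the original question (Q is random filler)
dat <- data.frame( Date = seq( as.Date( "2010-01-01" )
                             , as.Date( "2011-11-05" )
                             , by = "day" ) )
dat$Q <- rnorm( nrow( dat ) )

dat2 <- trimToCompleteWaterYears( dat )
range( dat2$Date )   # "2010-10-01" "2011-09-30"
nrow( dat2 )         # 365
```

The early return for an empty data frame is my own addition; without it the
end-of-series test would fail on zero rows.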

On Sat, 28 May 2016, Jeff Newmiller wrote:

> # read about POSIXlt at ?DateTimeClasses
> # note that the "mon" element is 0-11
> isPartialWaterYear <- function( d ) {
>  dtl <- as.POSIXlt( dat$Date )
>  wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
>  ( 0 == wy1  # first partial year
>  | (  8 != dtl$mon[ nrow( dat ) ] # end partial year
>    & 30 != dtl$mday[ nrow( dat ) ]
>    ) & wy1[ nrow( dat ) ] == wy1
>  )
> }
>
> dat2 <- dat[ !isPartialWaterYear( dat$Date ), ]
>
> The above assumes that, as you said, the data are continuous at one-day 
> intervals, such that the only partial years will occur at the beginning and 
> end. The "diff" function could be used to identify irregular data within the 
> data interval if needed.
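
A minimal sketch of that diff check, assuming d is a Date vector sorted
ascending. isDailyContinuous is a hypothetical helper name, not something
from base R.

```r
# Hypothetical helper: TRUE when every consecutive pair of dates
# is exactly one day apart (assumes d is sorted ascending)
isDailyContinuous <- function( d ) {
   all( 1 == diff( as.numeric( d ) ) )
}

d <- seq( as.Date( "2010-01-01" ), as.Date( "2010-01-10" ), by = "day" )
isDailyContinuous( d )         # TRUE
isDailyContinuous( d[ -5 ] )   # FALSE, 2010-01-05 is missing

# index of the last date before each gap
gapAfter <- which( 1 != diff( as.numeric( d[ -5 ] ) ) )
gapAfter                       # 4
```
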
>
> On Fri, 27 May 2016, Morway, Eric wrote:
>
>> In bulk processing streamflow data available from an online database, I'm
>> wanting to trim the beginning and end of the time series so that daily data
>> associated with incomplete "water years" (defined as extending from Oct 1st
>> to the following September 30th) is trimmed off the beginning and end of
>> the series.
>> 
>> For a small reproducible example, the time series below starts on
>> 2010-01-01 and ends on 2011-11-05.  So the data between 2010-01-01 and
>> 2010-09-30 and also between 2011-10-01 and 2011-11-05 is not associated
>> with a complete set of data for their respective water years.  With the
>> real data, the initial date of collection is arbitrary, could be 1901 or
>> 1938, etc.  Because I'm cycling through potentially thousands of records, I
>> need help in designing a function that is efficient.
>> 
>> dat <-
>> data.frame(Date=seq(as.Date("2010-01-01"),as.Date("2011-11-05"),by="day"))
>> dat$Q <- rnorm(nrow(dat))
>> 
>> dat$wyr <- as.numeric(format(dat$Date,"%Y"))
>> is.nxt <- as.numeric(format(dat$Date,"%m")) %in% 1:9
>> dat$wyr[!is.nxt] <- dat$wyr[!is.nxt] + 1
>> 
>> 
>> function(dat) {
>>   ...
>>   returns a subset of dat such that dat$Date > xxxx-09-30 & dat$Date <
>> yyyy-10-01
>>   ...
>> }
>> 
>> where the years between xxxx-yyyy are "complete" (no missing days).  In the
>> example above, the returned dat would extend from 2010-10-01 to 2011-09-30
>> 
>> Any offered guidance is very much appreciated.
>>
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


