[R] Calculate daily means from 5-minute interval data

Sun Sep 5 19:18:48 CEST 2021

What is the best way to read (from a text file) timestamps from the fall
time change, where there are two 1:15am's?  E.g., here is an extract from a
US Geological Survey web site giving data on the river through our county
on 2020-11-01, when we changed from PDT to PST,
https://nwis.waterdata.usgs.gov/wa/nwis/uv/?cb_00010=on&cb_00060=on&cb_00065=on&format=rdb&site_no=12200500&period=&begin_date=2020-11-01&end_date=2020-11-05
.

The timestamps include the date and time as well as PDT or PST.

river <-
c("datetime,tz,discharge,height,temp",
  "2020-11-01 00:00,PDT,20500,16.44,9.3",
  "2020-11-01 00:15,PDT,20500,16.44,9.3",
  "2020-11-01 00:30,PDT,20500,16.43,9.3",
  "2020-11-01 00:45,PDT,20400,16.40,9.3",
  "2020-11-01 01:00,PDT,20400,16.40,9.3",
  "2020-11-01 01:00,PST,20200,16.34,9.2",
  "2020-11-01 01:15,PDT,20400,16.39,9.3",
  "2020-11-01 01:15,PST,20200,16.34,9.2",
  "2020-11-01 01:30,PDT,20300,16.37,9.2",
  "2020-11-01 01:30,PST,20100,16.31,9.2",
  "2020-11-01 01:45,PDT,20300,16.35,9.2",
  "2020-11-01 01:45,PST,20100,16.29,9.2",
  "2020-11-01 02:00,PST,20100,16.29,9.2",
  "2020-11-01 02:15,PST,20000,16.27,9.1",
  "2020-11-01 02:30,PST,20000,16.26,9.1"
  )
d <- read.table(text=river, sep=",",header=TRUE)

The entries are obviously not in time order.

Is there a simple way to read the timedate and tz columns together?  One
way is to use d$tz to construct an offset that can be read with
strptime's "%z".

> d$POSIXct <-
as.POSIXct(paste(d$datetime,ifelse(d$tz=="PDT","-0700","-0800")),
format="%Y-%m-%d %H:%M %z")
> d
           datetime  tz discharge height temp             POSIXct
1  2020-11-01 00:00 PDT     20500  16.44  9.3 2020-11-01 00:00:00
2  2020-11-01 00:15 PDT     20500  16.44  9.3 2020-11-01 00:15:00
3  2020-11-01 00:30 PDT     20500  16.43  9.3 2020-11-01 00:30:00
4  2020-11-01 00:45 PDT     20400  16.40  9.3 2020-11-01 00:45:00
5  2020-11-01 01:00 PDT     20400  16.40  9.3 2020-11-01 01:00:00
6  2020-11-01 01:00 PST     20200  16.34  9.2 2020-11-01 01:00:00
7  2020-11-01 01:15 PDT     20400  16.39  9.3 2020-11-01 01:15:00
8  2020-11-01 01:15 PST     20200  16.34  9.2 2020-11-01 01:15:00
9  2020-11-01 01:30 PDT     20300  16.37  9.2 2020-11-01 01:30:00
10 2020-11-01 01:30 PST     20100  16.31  9.2 2020-11-01 01:30:00
11 2020-11-01 01:45 PDT     20300  16.35  9.2 2020-11-01 01:45:00
12 2020-11-01 01:45 PST     20100  16.29  9.2 2020-11-01 01:45:00
13 2020-11-01 02:00 PST     20100  16.29  9.2 2020-11-01 02:00:00
14 2020-11-01 02:15 PST     20000  16.27  9.1 2020-11-01 02:15:00
15 2020-11-01 02:30 PST     20000  16.26  9.1 2020-11-01 02:30:00
> with(d[order(d$POSIXct),], plot(temp)) # monotonic temperature

-Bill

On Thu, Sep 2, 2021 at 12:41 PM Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
wrote:

> Regardless of whether you use the lower-level split function, or the
> higher-level aggregate function, or the tidyverse group_by function, the
> key is learning how to create the column that is the same for all records
> corresponding to the time interval of interest.
>
> If you convert the sampdate to POSIXct, the tz IS important, because most
> of us use local timezones that respect daylight savings time, and a naive
> conversion of standard time will run into trouble if R is assuming daylight
> savings time applies. The lubridate package gets around this by always
> assuming UTC and giving you a function to "fix" the timezone after the
> conversion. I prefer to always be specific about timezones, at least by
> using so something like
>
>     Sys.setenv( TZ = "Etc/GMT+8" )
>
> which does not respect daylight savings.
>
> Regarding using character data for identifying the month, in order to have
> clean plots of the data I prefer to use the trunc function but it returns a
> POSIXlt so I convert it to POSIXct:
>
>     discharge$sampmonthbegin <- as.POSIXct( trunc( discharge$sampdate,
> units = "months" ) )
>
> Then any of various ways can be used to aggregate the records by that
> column.
>
> On September 2, 2021 12:10:15 PM PDT, Andrew Simmons <akwsimmo using gmail.com>
> wrote:
> >You could use 'split' to create a list of data frames, and then apply a
> >function to each to get the means and sds.
> >
> >
> >cols <- "cfs"  # add more as necessary
> >S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
> >means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
> >sds   <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm =
> >TRUE)))
> >
> >On Thu, Sep 2, 2021 at 3:01 PM Rich Shepard <rshepard using appl-ecosys.com>
> >wrote:
> >
> >> On Thu, 2 Sep 2021, Rich Shepard wrote:
> >>
> >> > If I correctly understand the output of as.POSIXlt each date and time
> >> > element is separate, so input such as 2016-03-03 12:00 would now be
> 2016
> >> 03
> >> > 03 12 00 (I've not read how the elements are separated). (The TZ is
> not
> >> > important because all data are either PST or PDT.)
> >>
> >> Using this script:
> >> discharge <- read.csv('../data/water/discharge.dat', header = TRUE, sep
> =
> >> ',', stringsAsFactors = FALSE)
> >> discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
> >>                                   format = '%Y-%m-%d %H:%M',
> >>                                   optional = 'logical')
> >> discharge$cfs <- as.numeric(discharge$cfs, length = 6)
> >>
> >> I get this result:
> >> > head(discharge)
> >>               sampdate    cfs
> >> 1 2016-03-03 12:00:00 149000
> >> 2 2016-03-03 12:10:00 150000
> >> 3 2016-03-03 12:20:00 151000
> >> 4 2016-03-03 12:30:00 156000
> >> 5 2016-03-03 12:40:00 154000
> >> 6 2016-03-03 12:50:00 150000
> >>
> >> I'm completely open to suggestions on using this output to calculate
> >> monthly
> >> means and sds.
> >>
> >> If dplyr:summarize() will do so please show me how to modify this
> command:
> >> disc_monthly <- ( discharge
> >>          %>% group_by(sampdate)
> >>          %>% summarize(exp_value = mean(cfs, na.rm = TRUE))
> >> because it produces daily means, not monthly means.
> >>
> >> TIA,
> >>
> >> Rich
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]