[R] Calculate daily means from 5-minute interval data

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Sep 4 08:30:26 CEST 2021


On Fri, 3 Sep 2021, Rich Shepard wrote:

> On Thu, 2 Sep 2021, Jeff Newmiller wrote:
>
>> Regardless of whether you use the lower-level split function, or the
>> higher-level aggregate function, or the tidyverse group_by function, the
>> key is learning how to create the column that is the same for all records
>> corresponding to the time interval of interest.
>
> Jeff,
>
> I definitely agree with the above.
>
>> If you convert the sampdate to POSIXct, the tz IS important, because most
>> of us use local timezones that respect daylight savings time, and a naive
>> conversion of standard time will run into trouble if R is assuming
>> daylight savings time applies. The lubridate package gets around this by
>> always assuming UTC and giving you a function to "fix" the timezone after
>> the conversion. I prefer to always be specific about timezones, at least
>> by using something like
>>    Sys.setenv( TZ = "Etc/GMT+8" )
>> which does not respect daylight savings.
>
> I'm not following you here. All my projects have always been in a single
> time zone and the data might be recorded at June 19th or November 4th but do
> not depend on whether the time is PDT or PST. My hosts all set the hardware
> clock to local time, not UTC.

The fact that your projects are in a single time zone is irrelevant. I am 
not sure how you can be so confident that it does not matter whether the 
data were recorded in PDT or PST. If they were recorded in PDT then there 
would be a day in March with 23 hours and another day in November with 25 
hours, while if they were recorded in PST then every day would have 
exactly 24 hours... and R almost always assumes daylight savings if you 
don't tell it otherwise!

I am also normally working with automated collection devices that record 
data in standard time year round. But if you fail to tell R that this is 
the case, then it will almost always assume your data are stored with 
daylight savings time and screw up the conversion to a computable time 
format. This screw-up can show up as NA values in spring: standard time 
has perfectly valid timestamps between 2am and 3am on the changeover day, 
but under daylight time those wall-clock times do not exist, so they end 
up as NA values in your timestamp column.
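
A small illustration (assuming the US Pacific changeover date of 
2021-03-14; how invalid times are handled is platform-dependent, so 
results may vary):

   # 02:30 standard time is valid, but that wall-clock time does not
   # exist under daylight savings on the changeover day:
   as.POSIXct("2021-03-14 02:30:00", tz = "Etc/GMT+8")
   # "2021-03-14 02:30:00 -08"  -- always valid in fixed standard time
   as.POSIXct("2021-03-14 02:30:00", tz = "America/Los_Angeles")
   # NA on many platforms, or a silently shifted time on others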

> As the location(s) at which data are collected remain fixed geographically I
> don't understand why daylight savings time, or non-daylight savings time is
> important.

I am telling you that it is important _TO R_ if you use POSIXt times. 
Acknowledge this and move on with life, or avoid POSIXt data. As I said, 
one way to acknowledge this while limiting the amount of attention you 
have to give to the problem is to use UTC/GMT everywhere... but this can 
lead to weird time-of-day problems as I pointed out in my timestamp 
cleaning slides: 
https://jdnewmil.github.io/time-2018-10/TimestampCleaning.html

If you want to use GMT everywhere... then you have to use GMT explicitly 
because the default timezone in R is practically never GMT for most 
people. You. Need. To. Be. Explicit. Don't fight it. Just do it. It isn't 
hard.
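
For example (Etc/GMT+8 is the POSIX spelling of a fixed 8-hour offset 
west of Greenwich, i.e. Pacific Standard Time year round):

   Sys.setenv(TZ = "Etc/GMT+8")        # session-wide, never DST
   as.POSIXct("2021-09-03 10:35:00")   # now interpreted in that zone
   # or be explicit per call without touching the session:
   as.POSIXct("2021-09-03 10:35:00", tz = "GMT")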

>> Regarding using character data for identifying the month, in order to have
>> clean plots of the data I prefer to use the trunc function but it returns
>> a POSIXlt so I convert it to POSIXct:
>
> I don't use character data for months, as far as I know. If a sample date
> is, for example, 2021-09-03 then monthly summaries are based on '09', not
> 'September.'

You are taking this out of context and complaining that it has no context. 
This was a reply to a response by Andrew Simmons in which he used the 
"format" function to create unique year/month strings to act as group-by 
data. Earlier, when I originally responded to clarify how you could use 
the dplyr group_by function, I used your character date column without 
combining it with time or converting to Date at all. If you study those 
responses more carefully you will see that you would indeed have been 
using character data for grouping in some cases; my only point was that 
doing so can be a shortcut to the immediate answer while being troublesome 
later in the analysis. Accusing you of mishandling data was not my 
intention.

> I've always valued your inputs to help me understand what I don't. In this
> case I'm really lost in understanding your position.

I hope my comments are clear enough now.

> Have a good Labor Day weekend,

Thanks! (Not relevant to many on this list.)

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


