[R] Calculate daily means from 5-minute interval data
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Sat Sep 4 08:30:26 CEST 2021
On Fri, 3 Sep 2021, Rich Shepard wrote:
> On Thu, 2 Sep 2021, Jeff Newmiller wrote:
>
>> Regardless of whether you use the lower-level split function, or the
>> higher-level aggregate function, or the tidyverse group_by function, the
>> key is learning how to create the column that is the same for all records
>> corresponding to the time interval of interest.
>
> Jeff,
>
> I definitely agree with the above
>
>> If you convert the sampdate to POSIXct, the tz IS important, because most
>> of us use local timezones that respect daylight savings time, and a naive
>> conversion of standard time will run into trouble if R is assuming
>> daylight savings time applies. The lubridate package gets around this by
>> always assuming UTC and giving you a function to "fix" the timezone after
>> the conversion. I prefer to always be specific about timezones, at least
>> by using something like
>> Sys.setenv( TZ = "Etc/GMT+8" )
>> which does not respect daylight savings.
>
> I'm not following you here. All my projects have always been in a single
> time zone and the data might be recorded on June 19th or November 4th but do
> not depend on whether the time is PDT or PST. My hosts all set the hardware
> clock to local time, not UTC.
The fact that your projects are in a single time zone is irrelevant. I am
not sure how you can be so confident in saying it does not matter whether
the data were recorded in PDT or PST. If they were recorded in local
Pacific time (PST in winter, PDT in summer), then there would be a day in
March with 23 hours and another day in November with 25 hours. If they
were recorded in PST year round, then every day would have exactly 24
hours. And R almost always assumes daylight savings applies (because it
defaults to your computer's local time zone) if you don't tell it
otherwise!
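To make the 23- and 25-hour days concrete, here is a small R sketch that measures the length of a calendar day under two time zone settings. It assumes your system's time zone database provides "America/Los_Angeles" (Pacific time, switching between PST and PDT) and "Etc/GMT+8" (a fixed offset of UTC-8; the POSIX sign convention makes "GMT+8" mean eight hours behind UTC). The helper name day_hours is hypothetical, just for illustration.

```r
# Hypothetical helper: hours between local midnight on `day` and local
# midnight on the following day, in time zone `tz`.
day_hours <- function(day, tz) {
  start <- as.POSIXct(day, tz = tz)
  end <- as.POSIXct(format(as.Date(day) + 1), tz = tz)
  as.numeric(difftime(end, start, units = "hours"))
}

# The 2021 US changeover days were March 14 and November 7.
days <- c("2021-03-14", "2021-06-19", "2021-11-07")

pacific <- sapply(days, day_hours, tz = "America/Los_Angeles")  # 23 24 25
standard <- sapply(days, day_hours, tz = "Etc/GMT+8")           # 24 24 24
pacific
standard
```

With 5-minute data that difference shows up directly: a Pacific-time day can hold 276 or 300 intervals instead of 288.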
I am also normally working with automated collection devices that record
data in standard time year round. But if you fail to tell R that this is
the case, then it will almost always assume your data are stored with
daylight savings time and screw up the conversion to a computable time
format. This screwup may include NA values in the spring: on the
changeover day, standard time has perfectly valid times between 2am and
3am, but in daylight time those timestamps do not exist and will end up
as NA values in your timestamp column.
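The spring changeover problem can be sketched in R as follows, assuming a logger whose clock stays on PST (UTC-8) all year, represented by the tz database name "Etc/GMT+8", versus Pacific local time "America/Los_Angeles". The timestamp is made up for illustration.

```r
# A made-up reading from a logger kept on standard time (UTC-8) year round,
# taken during the hour that Pacific local time skips on 2021-03-14.
stamp <- "2021-03-14 02:30:00"

# Parsed as fixed UTC-8 (what the logger actually meant): a valid instant.
x_std <- as.POSIXct(stamp, tz = "Etc/GMT+8")
format(x_std, "%Y-%m-%d %H:%M %Z", tz = "UTC")                  # "2021-03-14 10:30 UTC"

# If you want local clock time for display, convert after parsing.
format(x_std, "%Y-%m-%d %H:%M %Z", tz = "America/Los_Angeles")  # "2021-03-14 03:30 PDT"

# Parsed as Pacific local time: 02:00-02:59 does not exist on this day,
# so the result is not a trustworthy instant; it may come back as NA
# (exact behaviour can depend on platform and R version).
x_local <- as.POSIXct(stamp, tz = "America/Los_Angeles")
x_local
```

The design point is to parse in the zone the logger actually used and only convert to local time, if at all, for display.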
> As the location(s) at which data are collected remain fixed geographically I
> don't understand why daylight savings time, or non-daylight savings time is
> important.
I am telling you that it is important _TO R_ if you use POSIXt times.
Acknowledge this and move on with life, or avoid POSIXt data. As I said,
one way to acknowledge this while limiting the amount of attention you
have to give to the problem is to use UTC/GMT everywhere... but this can
lead to weird time of day problems as I pointed out in my timestamp
cleaning slides:
https://jdnewmil.github.io/time-2018-10/TimestampCleaning.html
If you want to use GMT everywhere... then you have to use GMT explicitly
because the default timezone in R is practically never GMT for most
people. You. Need. To. Be. Explicit. Don't fight it. Just do it. It isn't
hard.
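As a sketch of being explicit, the following R code computes daily means from synthetic 5-minute readings with the logger's zone named in every time-related call instead of relying on the session default. It assumes the logger runs on standard time all year, represented by "Etc/GMT+8"; the data frame and column names are hypothetical. Base R aggregate is used here, but the same grouping column works with split or dplyr::group_by.

```r
# Assumption: the logger clock runs on PST (UTC-8) year round.
logger_tz <- "Etc/GMT+8"

# Two days of made-up 5-minute readings (288 per day): all 1s, then all 2s.
dat <- data.frame(
  timestamp = seq(as.POSIXct("2021-03-14 00:00", tz = logger_tz),
                  by = "5 min", length.out = 2 * 288),
  value = rep(c(1, 2), each = 288)
)

# Grouping column: the calendar date in the logger's time zone.
# Pass tz explicitly; depending on the R version, as.Date() on a
# POSIXct may otherwise default to UTC rather than the data's zone.
dat$day <- as.Date(dat$timestamp, tz = logger_tz)

# One row per day with the mean of all readings on that day.
daily <- aggregate(value ~ day, data = dat, FUN = mean)
daily
```

Because "Etc/GMT+8" has no daylight savings, every day here has exactly 288 readings, including the March changeover day.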
>> Regarding using character data for identifying the month, in order to have
>> clean plots of the data I prefer to use the trunc function but it returns
>> a POSIXlt so I convert it to POSIXct:
>
> I don't use character data for months, as far as I know. If a sample date
> is, for example, 2021-09-03 then monthly summaries are based on '09', not
> 'September.'
You are taking this out of context and complaining that it has no context.
This was a reply to a response by Andrew Simmons in which he used the
"format" function to create unique year/month strings to act as group-by
data. Earlier, when I originally responded to clarify how you could use
the dplyr group_by function, I used your character date column without
combining it with time or converting it to Date at all. If you had studied
these responses more carefully, you would have seen that you would indeed
have been using character data for grouping in some cases, and my only
point was that doing so can be a shortcut to the immediate answer while
being troublesome later in the analysis. Accusing you of mishandling data
was not my intention.
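For comparison, here is a hedged R sketch of the two kinds of month grouping column discussed above: a character key built with format(), and a time-valued key built with trunc(), which returns POSIXlt and so is converted back to POSIXct. The timestamps and the "Etc/GMT+8" zone are illustrative assumptions.

```r
# Made-up timestamps straddling a month boundary, in a fixed UTC-8 zone.
logger_tz <- "Etc/GMT+8"
ts <- as.POSIXct(c("2021-08-30 12:00", "2021-08-31 23:55",
                   "2021-09-01 00:00", "2021-09-03 08:30"), tz = logger_tz)

# Character key: a quick shortcut, but afterwards it is only text.
month_chr <- format(ts, "%Y-%m")

# Time key: trunc() gives POSIXlt (the first instant of each month),
# so convert it back to POSIXct for use as a data frame column.
month_ct <- as.POSIXct(trunc(ts, units = "months"))

month_chr
month_ct
```

Both keys define the same two groups here; the POSIXct key keeps behaving as a time value later in the analysis (for example on a plot's time axis), while the character key does not.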
> I've always valued your inputs to help me understand what I don't. In this
> case I'm really lost in understanding your position.
I hope my comments are clear enough now.
> Have a good Labor Day weekend,
Thanks! (Not relevant to many on this list.)
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k