[R] Calculate daily means from 5-minute interval data

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Tue Aug 31 01:47:09 CEST 2021

Am I seeing an odd aspect to this discussion?

There are many ways to solve a problem, and different people favor different
ones.

All require some examination of the data so it can be massaged into shape
for the processes that follow.

If you insist on using the matrix method, arranging things so that each row
or column holds the data you want, then, yes, you need to guarantee all your
data is present and in the right order. If some may be missing, you may want
to write a program that generates all possible dates in order and merges the
observed values into them (more likely into a copy), so all the missing
items are represented and show up as NA or whatever you want. You may also
want to check that all dates are in order with no duplicates, and anything
else that makes sense. Then you are free to ask for the vector to be seen as
a matrix with N columns or rows.
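
To make that concrete, here is a rough base R sketch of the "fill in the
gaps, then reshape" idea. The data frame obs, its column names stamp and
value, the readings themselves, and the choice of UTC are all made up for
illustration; real data would need its own checks.

```r
# Hypothetical 5-minute readings with gaps (values are invented).
obs <- data.frame(
  stamp = as.POSIXct(c("2021-08-01 00:00", "2021-08-01 00:05",
                       "2021-08-01 00:15", "2021-08-02 00:00"),
                     tz = "UTC"),
  value = c(1.0, 2.0, 4.0, 10.0)
)

# Check ordering and duplicates before any reshaping.
stopifnot(!is.unsorted(obs$stamp), anyDuplicated(obs$stamp) == 0)

# Build every 5-minute slot for each whole day covered by the data.
slots_per_day <- 24 * 60 / 5   # 288
first_day <- as.POSIXct(format(min(obs$stamp), "%Y-%m-%d", tz = "UTC"),
                        tz = "UTC")
last_day  <- as.POSIXct(format(max(obs$stamp), "%Y-%m-%d", tz = "UTC"),
                        tz = "UTC")
grid <- data.frame(
  stamp = seq(first_day, last_day + 86400 - 300, by = "5 min")
)

# Left join: slots with no reading get NA in the value column.
full <- merge(grid, obs, by = "stamp", all.x = TRUE)
full <- full[order(full$stamp), ]

# Only now is it safe to view the vector as a matrix, one column per day.
m <- matrix(full$value, nrow = slots_per_day)
colnames(m) <- format(full$stamp[seq(1, nrow(full), by = slots_per_day)],
                      "%Y-%m-%d")
daily <- colMeans(m, na.rm = TRUE)
```

Working in UTC sidesteps daylight-saving days that do not have exactly 288
slots; with local time the fixed nrow would need more care.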

For many, a cleaner solution is to use constructs that are more resistant
to imperfections or that allow them to be handled better. I would probably
use tidyverse functionality these days but can easily understand people
preferring base R or other packages. I have done similar analyses of real
data gathered from streams: various chemicals and levels taken at various
times and depths, including times when no measures happened and times when
there was more than one measure. It is thus much more robust to use methods
like group_by and then apply the other verbs to the grouped data, especially
when the next steps involve making plots with ggplot. It was rather trivial,
for example, to replace multiple measures by the average of the measures.
And many of my plots are faceted by variables, which is not trivial to do in
base R.
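
As a small illustration of the grouped approach, here is a sketch using
dplyr. The data frame obs, its columns stamp and value, and the readings are
invented for the example, and UTC is assumed for simplicity. Duplicate
readings at one time are first replaced by their average, then daily means
are taken; missing slots need no special handling because nothing depends on
the number of rows per day.

```r
library(dplyr)

# Hypothetical readings: one duplicated time stamp and plenty of gaps.
obs <- data.frame(
  stamp = as.POSIXct(c("2021-08-01 00:00", "2021-08-01 00:05",
                       "2021-08-01 00:05", "2021-08-02 00:00"),
                     tz = "UTC"),
  value = c(1.0, 2.0, 4.0, 10.0)
)

daily <- obs %>%
  # Collapse multiple measures at the same time into their average.
  group_by(stamp) %>%
  summarise(value = mean(value, na.rm = TRUE), .groups = "drop") %>%
  # Then group by calendar day and summarise.
  group_by(day = as.Date(stamp, tz = "UTC")) %>%
  summarise(mean_value = mean(value, na.rm = TRUE), n = n())
```

The n column records how many distinct times contributed to each daily mean,
which helps when deciding whether a sparse day should be trusted.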

I suggest not falling in love with the first way you think of and then
trying to bend everything to fit. Yes, some methods may be quite a bit more
efficient, but I rarely run into problems even with quite large collections
of data, like a quarter million rows with dozens of columns, including odd
columns like the output of some analysis.

And note that the current set of data may be extended with more over time,
or you may get other data collected that would not necessarily work well
with a hard-coded method but that a more flexible method might easily
adjust to.

-----Original Message-----
From: R-help <r-help-bounces@r-project.org> On Behalf Of Rich Shepard
Sent: Monday, August 30, 2021 7:34 PM
To: R Project Help <r-help@r-project.org>
Subject: Re: [R] Calculate daily means from 5-minute interval data

On Tue, 31 Aug 2021, Richard O'Keefe wrote:

> I made up fake data in order to avoid showing untested code. It's not 
> part of the process I was recommending. I expect data recorded every N 
> minutes to use NA when something is missing, not to simply not be 
> recorded. Well and good, all that means is that reshaping the data is 
> not a trivial call to matrix(). It does not mean that any additional 
> package is needed or appropriate and it does not affect the rest of the


The instruments in the gauge pipe don't know to write NA when they're not
measuring. :-) The outage period varies greatly by location, constituent
measured, and other unknown factors.

> You will want the POSIXct class, see ?DateTimeClasses. Do you know 
> whether the time stamps are in universal time or in local time?

The data values are not timestamps. There's one column for date, a second
column for time, and a third column for time zone (P in the case of the west
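
For separate date, time, and zone columns like these, a minimal sketch of
building POSIXct values might look as follows. The column names, the text
formats, and the mapping of the code "P" to the "America/Los_Angeles" zone
are all assumptions made for illustration; the real files may differ, and
instruments that stay on standard time all year would need a fixed-offset
zone instead.

```r
# Hypothetical layout: date, time, and a one-letter zone code.
raw <- data.frame(
  date  = c("2021-08-01", "2021-08-01"),
  time  = c("00:00", "00:05"),
  zone  = c("P", "P"),
  value = c(1.0, 2.0)
)

# Assumed mapping from zone code to an Olson time zone name.
tz_lookup <- c(P = "America/Los_Angeles")

# A POSIXct vector carries a single time zone, so require one code per file.
stopifnot(length(unique(raw$zone)) == 1)

raw$stamp <- as.POSIXct(paste(raw$date, raw$time),
                        format = "%Y-%m-%d %H:%M",
                        tz = tz_lookup[[raw$zone[1]]])
```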

> Above all, it doesn't affect the point that you probably should not be 
> doing any of this.

? (Doesn't require an explanation.)

