[R] Changing time intervals in data set

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Wed Dec 15 21:04:06 CET 2021


I think Rich has shared aspects of the data before and may have forgotten that
we need to see something here and now.

Besides a small sample of what the relevant columns look like and an example
of what he wants a new column to look like, we probably need more context to
understand what he wants.

The issue could be a bit like people who want to group their data by quarter,
for example, or by some other aspect, such as when someone started and ended
one topic and switched to another. There is no way we can guess what he
actually wants.

What Rich writes may be perfectly clear to him but not to others. It does
sound like there are periods in which people record measurements in seemingly
multiple (possibly contiguous) records, each recording the time at intervals
such as every 5, 10, or 30 minutes. So a wild guess might be to cluster them
by finding a GAP, where the next record is too far in time from the previous
one. In essence, a new cluster seems to start when:

 time-of-current-record - time-of-previous-record > threshold

where the threshold may simply be thirty minutes, assuming that all the
records belong to the same series (for example, the same measurement
location) and do not intertwine.
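A minimal sketch of that gap rule in base R, assuming a data frame with a
POSIXct column named datetime (Rich mentions such a column) that is already
sorted by time. The sample timestamps and the 30-minute threshold are
hypothetical. difftime() gives the gap to the previous record, and cumsum()
over the TRUE/FALSE test turns each big gap into a new cluster number:

```r
# Hypothetical sample data, sorted by time
df <- data.frame(
  datetime = as.POSIXct(c("2021-12-15 08:00", "2021-12-15 08:05",
                          "2021-12-15 08:10", "2021-12-15 10:00",
                          "2021-12-15 10:30", "2021-12-15 11:00"),
                        tz = "UTC")
)

threshold <- 30  # minutes; hypothetical cut-off for a "gap"

# Minutes since the previous record; the first row gets 0 (no predecessor)
gap <- c(0, as.numeric(difftime(df$datetime[-1], df$datetime[-nrow(df)],
                                units = "mins")))

# Every gap larger than the threshold bumps the cluster number by one
df$cluster <- cumsum(gap > threshold) + 1
df
```

Because the test is a strict ">", gaps of exactly 30 minutes stay in the
same cluster; use ">=" if that is not what is wanted.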

I assume, as usual, there are umpteen ways to deal with such sliding-window
problems, but I am loath to suggest any ideas until Rich has defined the issue
more clearly, perhaps by including a small amount of data in a format that is
trivial to copy and paste into R, so we can play with it and verify that a
solution seems to work.

But very loosely speaking, a simple sliding window of one might work. In
base R, you can obviously use some form of loop, starting with row 2, that
compares row N to row N-1 and sets a new column to 1 until it encounters a
big enough gap, at which point it starts setting it to 2, and so on. A later
pass over the new data could then group by that column, IF all of what I
assume makes sense.
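The loop described above might look roughly like this. It is only a sketch:
the timestamps are hypothetical, the vector name group is made up, and the
30-minute threshold is an assumption:

```r
# Hypothetical POSIXct timestamps, already in time order
times <- as.POSIXct(c("2021-12-15 08:00", "2021-12-15 08:05",
                      "2021-12-15 09:30", "2021-12-15 09:40"), tz = "UTC")
threshold <- 30  # minutes; hypothetical

group <- integer(length(times))
group[1] <- 1L
for (n in seq_along(times)[-1]) {  # start at row 2
  # Compare row N with row N-1
  gap <- as.numeric(difftime(times[n], times[n - 1], units = "mins"))
  # A big enough gap starts a new group; otherwise keep the current one
  group[n] <- if (gap > threshold) group[n - 1] + 1L else group[n - 1]
}
group
```

The resulting vector can then be used for grouping, for example with
split(times, group).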

And, of course, the tidyverse has perhaps easier-to-use functionality, such
as the (non-base) dplyr functions lag() and lead(), used within something
like mutate():

https://dplyr.tidyverse.org/reference/lead-lag.html
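A hedged dplyr version of the same idea, assuming the dplyr package is
installed and the rows are already sorted by datetime. The sample data, the
column names gap_min and cluster, and the 30-minute threshold are all
hypothetical. Note that loading dplyr masks stats::lag(), which is what we
want here:

```r
library(dplyr)

# Hypothetical sample data, sorted by time
flow <- tibble(
  datetime = as.POSIXct(c("2021-12-15 08:00", "2021-12-15 08:05",
                          "2021-12-15 09:30", "2021-12-15 09:40"),
                        tz = "UTC")
)

flow <- flow %>%
  mutate(
    # Minutes since the previous row; NA on the first row
    gap_min = as.numeric(difftime(datetime, lag(datetime), units = "mins")),
    # Treat the first row's missing gap as 0, then count big gaps
    cluster = cumsum(coalesce(gap_min, 0) > 30) + 1
  )
flow
```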

But again, you need clearer requirements. You asked how to find when DATES
change. That is not the same as my guess, because the date changes at
midnight local time, so two measurements only seconds apart can fall on
different dates. If you want to know when clusters of non-overlapping
measurements change, that is another issue.
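If the question is literally the one in the original post, finding when the
recording interval switched between 5, 10, and 30 minutes, difftime() can do
it by computing each row's interval and flagging rows where it differs from
the previous row's. This is a sketch with made-up timestamps; one caveat is
that a single missing reading would also look like an interval change, so the
rle() summary at the end is there to make suspiciously short runs visible:

```r
# Hypothetical sample: the interval switches from 5 to 10 to 30 minutes
df <- data.frame(
  datetime = as.POSIXct(c("2000-01-01 00:00", "2000-01-01 00:05",
                          "2000-01-01 00:10", "2000-01-01 00:20",
                          "2000-01-01 00:30", "2000-01-01 01:00",
                          "2000-01-01 01:30"), tz = "UTC")
)

# Minutes between each record and the previous one (NA for the first row)
df$interval <- c(NA, as.numeric(difftime(df$datetime[-1],
                                         df$datetime[-nrow(df)],
                                         units = "mins")))

# A change is a row whose interval differs from the previous row's interval
prev <- c(NA, df$interval[-nrow(df)])
changed <- !is.na(prev) & df$interval != prev

# Rows, and their calendar dates, where a new interval first appears
df[changed, ]
as.Date(df$datetime[changed])

# Runs of identical intervals; very short runs may just be missing readings
runs <- rle(df$interval[-1])
data.frame(interval_min = runs$values, n_records = runs$lengths)
```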

And what exactly do you want to do after determining when things change?
Depending on the answer, you may need a different way to solve the initial
problem. I mentioned creating another variable to group by as one such
possibility. But many other solutions would not put a grouping variable on
every row; they might instead insert some kind of cut mark in just the first
row of each group, add a special row between groups, or do anything else
your imagination supplies.
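As one hedged illustration of the "cut mark" alternative: instead of a group
number on every row, a logical column can mark only the first row of each
block. The column name new_block, the timestamps, and the 30-minute threshold
are hypothetical. Giving the first row an infinite gap makes it count as the
start of a block:

```r
# Hypothetical POSIXct timestamps, already in time order
datetime <- as.POSIXct(c("2021-12-15 08:00", "2021-12-15 08:05",
                         "2021-12-15 09:30", "2021-12-15 09:40"), tz = "UTC")

# Gap to the previous record in minutes; Inf for the first row
gap <- c(Inf, as.numeric(difftime(datetime[-1], datetime[-length(datetime)],
                                  units = "mins")))

# TRUE only on the first row of each block
df <- data.frame(datetime, new_block = gap > 30)
df

# A per-row group number can still be recovered later if needed
cumsum(df$new_block)
```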

Clearly, you do not want us to solve the entire problem you are working on,
but more context may get you answers to the specific part you are stuck on.
Note, too, that adding a new time column may not be required, since such
values can often be computed on the fly from the other columns. But it does
help to have it in place, at least for a while, if you want answers to
questions such as how many measurements were made over what total amount of
time (first to last).
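For example, once some grouping column exists, a per-group count and
first-to-last span take only a few lines of base R. In this sketch the
cluster values are hypothetical, standing in for whatever grouping was
created earlier, and the datetime values are converted to seconds since the
epoch before summarising:

```r
# Hypothetical data with an already-computed grouping column
df <- data.frame(
  datetime = as.POSIXct(c("2021-12-15 08:00", "2021-12-15 08:05",
                          "2021-12-15 08:10", "2021-12-15 10:00",
                          "2021-12-15 10:30"), tz = "UTC"),
  cluster = c(1, 1, 1, 2, 2)
)

secs <- as.numeric(df$datetime)  # seconds since 1970-01-01 UTC

summary_df <- data.frame(
  cluster   = sort(unique(df$cluster)),
  # Number of measurements in each cluster
  n_records = as.vector(tapply(secs, df$cluster, length)),
  # Minutes from the first to the last measurement in each cluster
  span_min  = as.vector(tapply(secs, df$cluster,
                               function(s) (max(s) - min(s)) / 60))
)
summary_df
```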



-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of jim holtman
Sent: Wednesday, December 15, 2021 1:05 PM
To: Rich Shepard <rshepard using appl-ecosys.com>
Cc: R mailing list <r-help using r-project.org>
Subject: Re: [R] Changing time intervals in data set

At least show a sample of the data and then what you would like as output.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Wed, Dec 15, 2021 at 6:40 AM Rich Shepard <rshepard using appl-ecosys.com>
wrote:

> A 33-year set of river discharge data at one gauge location has 
> recording intervals of 5, 10, and 30 minutes over the period of record.
>
> The data.frame/tibble has columns for year, month, day, hour, minute, 
> and datetime.
>
> Would difftime() allow me to find the dates when the changes occurred?
>
> TIA,
>
> Rich
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
