[R] Comparing dates in two large data frames

Rui Barradas
Sat Apr 10 14:47:01 CEST 2021


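The following solution seems to work and is fast, as findInterval is.
It first determines, for each value of df1$Time, where it falls in 
df2$start, i.e. the index of the last start not greater than that Time. 
It then uses that index to check whether those Times are not greater 
than the corresponding df2$end.
This relies on df2$start being sorted (findInterval requires it) and on 
all spans having the same length, so that df2$end is sorted as well: the 
last span starting at or before a given Time is then also the one that 
ends latest, and if it does not contain that Time, no span does.
I checked against a small subset of df1 and the results were right.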
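To make the index logic concrete, here is a toy run on plain numbers 
instead of times. The values and the names starts, ends, times, idx and 
inside are made up for illustration; as in df2, the starts are sorted 
and every span has the same length.

```r
# Toy spans (made-up values): sorted starts, each span 2 units long.
starts <- c(10, 20, 30)
ends   <- starts + 2
times  <- c(5, 11, 15, 21, 40)

# Index of the last start <= each time; 0 means "before the first start".
idx <- findInterval(times, starts)
print(idx)      # -> 0 1 1 2 3

# A time is inside a span only if it also does not exceed that span's end.
inside <- logical(length(times))
ok <- idx != 0
inside[ok] <- times[ok] <= ends[idx[ok]]
print(inside)   # -> FALSE TRUE FALSE TRUE FALSE
```

Note that 15 gets index 1 (the span 10 to 12) but is not inside it, and 
40 gets index 3 but lies past the last end.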

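result <- logical(nrow(df1))              # all FALSE to start with
inx <- findInterval(df1$Time, df2$start)  # last start <= Time; 0 if Time precedes all starts
not_zero <- inx != 0                      # only these Times can be inside a span
result[not_zero] <- df1$Time[not_zero] <= df2$end[ inx[not_zero] ]  # inside if not past that span's end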
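A self-contained way to repeat the check against a subset is sketched 
below. It regenerates data shaped like the question's (set.seed(2021) is 
an arbitrary assumption, added only so the random spans are 
reproducible), runs the code above, stores the answer as a 0/1 column 
(the name inside is hypothetical) and compares a random sample of 500 
rows with a brute-force test against every span.

```r
# Data as in the question; the seed is an arbitrary choice for reproducibility.
set.seed(2021)
ti1 <- seq.POSIXt(from = as.POSIXct("2020/01/01", tz = "UTC"),
                  to   = as.POSIXct("2020/06/01", tz = "UTC"), by = "10 min")
df1 <- data.frame(Time = ti1)
start <- sort(sample(seq(as.POSIXct("2020/01/01", tz = "UTC"),
                         as.POSIXct("2020/06/01", tz = "UTC"), by = "1 mins"), 5000))
end <- start + 120                     # every span lasts 120 seconds
df2 <- data.frame(start = start, end = end)

# findInterval() solution from above
result <- logical(nrow(df1))
inx <- findInterval(df1$Time, df2$start)
not_zero <- inx != 0
result[not_zero] <- df1$Time[not_zero] <= df2$end[inx[not_zero]]

# New 0/1 column, as asked in the question ("inside" is a hypothetical name)
df1$inside <- as.integer(result)

# Brute force on a random subset of rows: is each Time inside any span?
rows <- sample(nrow(df1), 500)
brute <- vapply(rows, function(i) {
  any(df2$start <= df1$Time[i] & df1$Time[i] <= df2$end)
}, logical(1))
stopifnot(identical(result[rows], brute))
```

The brute-force check scans all 5000 spans for each sampled row, which 
is the slow approach from the question; it is only meant for validating 
a sample, not for the full data.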

Hope this helps,

Rui Barradas

At 12:06 on 10/04/21, Kulupp wrote:
> Dear all,
> I have two data frames (df1 and df2) and for each timepoint in df1 I 
> want to know: is it within any of the timespans in df2? The result 
> (e.g. "no" or "yes", or 0 and 1) should be shown in a new column of df1.
> Here is the code to create the two data frames (the size of the two data 
> frames is approx. the same as in my original data frames):
> # create data frame df1
> ti1 <- seq.POSIXt(from=as.POSIXct("2020/01/01", tz="UTC"), 
> to=as.POSIXct("2020/06/01", tz="UTC"), by="10 min")
> df1 <- data.frame(Time=ti1)
> # create data frame df2 with random timespans, i.e. start and end dates
> start <- sort(sample(seq(as.POSIXct("2020/01/01", tz="UTC"), 
> as.POSIXct("2020/06/01", tz="UTC"), by="1 mins"), 5000))
> end   <- start + 120
> df2 <- data.frame(start=start, end=end)
> Everything I tried (ifelse combined with sapply or for loops) has been 
> very very very slow. Thus, I am looking for a reasonably fast solution.
> Thanks a lot in advance for any hint!
> Cheers,
> Thomas
