[R] lubridate and intervals

Wed Aug 31 02:29:45 CEST 2011

There are 10 overlaps in the data:

> df1<-data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
+ end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
> df2<-data.frame(start=as.POSIXct(paste('2011-06-01 ',
+ rep(seq(1,20,2),2),':',sample(1:19,20,replace=TRUE),sep='')),
+ end=as.POSIXct(paste('2011-06-01 ',
+ rep(seq(1,20,2),2),':',sample(20:50,20),sep='')))
>
> # build a matrix of events: each 'start' adds 1 to a running count and each 'end' subtracts 1
> # columns: 1 = time, 2 = df#, 3 = +1/-1, 4 = row number within that data frame
>
> x <- rbind(
+     cbind(df1$start, 1, 1, seq(nrow(df1))),
+     cbind(df1$end, 1, -1, seq(nrow(df1))),
+     cbind(df2$start, 2, 1, seq(nrow(df2))),
+     cbind(df2$end, 2, -1, seq(nrow(df2)))
+     )
> # sort by time
> x <- x[order(x[,1]),]
> # add the queue count: the number of intervals open after each event;
> # a count greater than one means intervals overlap
> x <- cbind(x, count = cumsum(x[,3]))
> # split the events into groups, starting a new group after each time the count returns to 0
> indx <- split(seq(nrow(x)), cumsum(c(FALSE, head(x[, 'count'], -1) == 0)))
> # keep groups with more than 2 events; these are the overlaps
> indx <- indx[sapply(indx, length) > 2]
> # get unique df# and row indices
> lapply(indx, function(a){
+     unique(paste(x[a, 2], x[a, 4], sep = ' - '))
+ })
$`0`
[1] "1 - 1"  "2 - 11" "2 - 1"

$`2`
[1] "1 - 3"  "2 - 12" "2 - 2"

$`4`
[1] "1 - 5"  "2 - 13" "2 - 3"

$`6`
[1] "1 - 7"  "2 - 14" "2 - 4"

$`8`
[1] "1 - 9"  "2 - 15" "2 - 5"

$`10`
[1] "1 - 11" "2 - 16" "2 - 6"

$`12`
[1] "1 - 13" "2 - 17" "2 - 7"

$`14`
[1] "1 - 15" "2 - 8"  "2 - 18"

$`16`
[1] "1 - 17" "2 - 19" "2 - 9"

$`18`
[1] "1 - 19" "2 - 20" "2 - 10"
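
If the end result should be a single data.frame holding the overlapping
rows from both inputs, a direct pairwise test is another option. The sketch
below is separate from the counting approach above. It assumes the inputs
are small enough to compare every row of df1 with every row of df2, uses
made-up example data, and the names idx, hit and overlaps are only
illustrative. Two intervals overlap when each one starts before the other
ends.

```r
# Made-up example data (an assumption, not the data from the original post).
df1 <- data.frame(
  start = as.POSIXct(c("2011-06-01 01:00", "2011-06-01 02:00",
                       "2011-06-01 03:00"), tz = "UTC"),
  end   = as.POSIXct(c("2011-06-01 01:30", "2011-06-01 02:30",
                       "2011-06-01 03:30"), tz = "UTC")
)
df2 <- data.frame(
  start = as.POSIXct(c("2011-06-01 01:17", "2011-06-01 03:45"), tz = "UTC"),
  end   = as.POSIXct(c("2011-06-01 01:40", "2011-06-01 04:00"), tz = "UTC")
)

# All (row of df1, row of df2) index pairs.
idx <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))

# Overlap test: each interval starts before the other one ends.
hit <- df1$start[idx$i] < df2$end[idx$j] &
       df2$start[idx$j] < df1$end[idx$i]

# One row per overlapping pair, carrying the columns of both data frames.
overlaps <- data.frame(
  row1   = idx$i[hit],
  row2   = idx$j[hit],
  start1 = df1$start[idx$i[hit]],
  end1   = df1$end[idx$i[hit]],
  start2 = df2$start[idx$j[hit]],
  end2   = df2$end[idx$j[hit]]
)
print(overlaps)
```

This does nrow(df1) * nrow(df2) comparisons, so for large inputs the sorted
counting approach above is likely the better fit.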

On Tue, Aug 30, 2011 at 2:15 PM, Justin Haynes <jtor14 at gmail.com> wrote:
> Hiya,
>
> maybe there is a native R function for this and if so please let me know!
>
> I have 2 data.frames with start and end dates, they read in as strings and I
> am converting to POSIXct.  How can I check for overlap?
>
> The end result ideally will be a single data.frame containing all the
> columns of the other two with rows where there were date overlaps.
>
>
> df1<-data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
> end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
> df2<-data.frame(start=as.POSIXct(paste('2011-06-01 ',
> rep(seq(1,20,2),2),':',sample(1:19,20,replace=T),sep='')),
> end=as.POSIXct(paste('2011-06-01 ',
> rep(seq(1,20,2),2),':',sample(20:50,20),sep='')))
>
> I tried:
> library(lubridate)
>
> df1$interval<-new_interval(df1$start,df1$end)
>
>> df1$interval[1]
> [1] 2011-06-01 01:00:00 -- 2011-06-01 01:30:00
>> df2$start[1]
> [1] "2011-06-01 01:17:00 PDT"
>
> but
>
>> df2$start[1] %in% df1$interval[1]
> [1] FALSE
>>
>
> This must be fairly straightforward and I just don't know where to look!
>
>
> Thanks,
> Justin

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?