[R] lubridate and intervals
jim holtman
jholtman at gmail.com
Wed Aug 31 02:29:45 CEST 2011
There are 10 groups of overlapping intervals in the data:
> df1<-data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
+ end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
> df2<-data.frame(start=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',
+   sample(1:19,20,replace=TRUE),sep='')),
+   end=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',
+   sample(20:50,20),sep='')))
>
> # build a matrix with one row per interval endpoint: column 1 is the time,
> # column 2 the data frame (1 or 2), column 3 is +1 for a 'start' and -1 for
> # an 'end', and column 4 the row number within that data frame
>
> x <- rbind(
+ cbind(df1$start, 1, 1, seq(nrow(df1))),
+ cbind(df1$end, 1, -1, seq(nrow(df1))),
+ cbind(df2$start, 2, 1, seq(nrow(df2))),
+ cbind(df2$end, 2, -1, seq(nrow(df2)))
+ )
> # sort by time
> x <- x[order(x[,1]),]
> # add a running count: the number of intervals open at each point in time;
> # a count greater than one means that intervals overlap
> x <- cbind(x, count = cumsum(x[,3]))
> # split the rows into groups, starting a new group after each row where the count returns to 0
> indx <- split(seq(nrow(x)), cumsum(c(FALSE, head(x[, 'count'], -1) == 0)))
> # keep groups with more than 2 rows (i.e. more than one interval); these are the overlaps
> indx <- indx[sapply(indx, length) > 2]
> # get unique df# and row indices
> lapply(indx, function(a){
+ unique(paste(x[a, 2], x[a, 4], sep = ' - '))
+ })
$`0`
[1] "1 - 1" "2 - 11" "2 - 1"
$`2`
[1] "1 - 3" "2 - 12" "2 - 2"
$`4`
[1] "1 - 5" "2 - 13" "2 - 3"
$`6`
[1] "1 - 7" "2 - 14" "2 - 4"
$`8`
[1] "1 - 9" "2 - 15" "2 - 5"
$`10`
[1] "1 - 11" "2 - 16" "2 - 6"
$`12`
[1] "1 - 13" "2 - 17" "2 - 7"
$`14`
[1] "1 - 15" "2 - 8" "2 - 18"
$`16`
[1] "1 - 17" "2 - 19" "2 - 9"
$`18`
[1] "1 - 19" "2 - 20" "2 - 10"
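The comments above describe a sweep over the sorted endpoints: every start adds 1 to a running count, every end subtracts 1, and a stretch where the count stays above 0 forms one group. The sketch below runs the same steps on three hand-made intervals, using plain numbers as times so each intermediate value can be checked by eye. The data and the names `ev`, `grp`, `groups` and `overlaps` are illustrative only, not from the post.

```r
# Three toy intervals on a numeric time axis (hypothetical data):
# [1, 4] and [3, 6] overlap; [8, 9] stands alone.
starts <- c(1, 3, 8)
ends   <- c(4, 6, 9)

# One event per endpoint: +1 when an interval opens, -1 when it closes
ev <- data.frame(time  = c(starts, ends),
                 delta = c(rep(1, 3), rep(-1, 3)),
                 id    = rep(seq_along(starts), 2))
ev <- ev[order(ev$time), ]

# Running total = number of intervals open just after each event
ev$open <- cumsum(ev$delta)

# Start a new group after every event where the count falls back to 0
grp <- cumsum(c(FALSE, head(ev$open, -1) == 0))
groups <- split(ev$id, grp)

# A group with more than 2 events holds more than one interval,
# i.e. a chain of overlapping intervals
overlaps <- lapply(groups[sapply(groups, length) > 2], unique)
print(overlaps)
```

Two properties of this approach are worth keeping in mind. A group is a chain: if A overlaps B and B overlaps C, all three land in one group even when A and C do not overlap. Also, order() leaves tied times in their input order, so an interval that ends at exactly the moment another starts may or may not be grouped with it; sorting with order(ev$time, -ev$delta) processes starts before ends at the same time and so treats touching intervals as overlapping.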
On Tue, Aug 30, 2011 at 2:15 PM, Justin Haynes <jtor14 at gmail.com> wrote:
> Hiya,
>
> Maybe there is a native R function for this; if so, please let me know!
>
> I have 2 data.frames with start and end dates; they are read in as strings and
> I convert them to POSIXct. How can I check for overlap?
>
> Ideally, the end result will be a single data.frame containing all the
> columns of the other two, with the rows where the dates overlap.
>
>
> df1<-data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
> end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
> df2<-data.frame(start=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',
>   sample(1:19,20,replace=TRUE),sep='')),
>   end=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',
>   sample(20:50,20),sep='')))
>
> I tried:
> library(lubridate)
>
> df1$interval<-new_interval(df1$start,df1$end)
>
>> df1$interval[1]
> [1] 2011-06-01 01:00:00 -- 2011-06-01 01:30:00
>> df2$start[1]
> [1] "2011-06-01 01:17:00 PDT"
>
> but
>
>> df2$start[1] %in% df1$interval[1]
> [1] FALSE
>>
>
> This must be fairly straightforward and I just don't know where to look!
>
>
> Thanks,
> Justin
>
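For the original question, `%in%` returns FALSE because it is defined through match(): it asks whether a value equals one of the elements it is compared against, not whether it falls inside an interval. A containment test needs explicit comparisons, and the single data.frame described in the question can be built by pairing every row of one table with every row of the other and keeping the pairs that overlap. A minimal base-R sketch, assuming small tables with `start`/`end` POSIXct columns; the two-row df1/df2 below and the UTC time zone are made up so the result is fixed, and `inside`, `pairs` and `hits` are illustrative names.

```r
# Hypothetical small inputs in the same shape as df1/df2 (start, end as POSIXct)
df1 <- data.frame(start = as.POSIXct(c("2011-06-01 01:00", "2011-06-01 02:00"), tz = "UTC"),
                  end   = as.POSIXct(c("2011-06-01 01:30", "2011-06-01 02:30"), tz = "UTC"))
df2 <- data.frame(start = as.POSIXct(c("2011-06-01 01:17", "2011-06-01 03:05"), tz = "UTC"),
                  end   = as.POSIXct(c("2011-06-01 01:40", "2011-06-01 03:20"), tz = "UTC"))

# Containment: is the first df2 start inside the first df1 interval?
inside <- df2$start[1] >= df1$start[1] & df2$start[1] <= df1$end[1]

# Every combination of rows: merge() with by = NULL is a Cartesian product,
# and shared column names get the suffixes ".x" (df1) and ".y" (df2)
pairs <- merge(df1, df2, by = NULL)

# Two closed intervals overlap when each one starts no later than the other ends
hits <- pairs[pairs$start.x <= pairs$end.y & pairs$start.y <= pairs$end.x, ]
print(inside)
print(hits)
```

Because merge(..., by = NULL) materialises nrow(df1) * nrow(df2) rows, this suits modest tables; for large ones the sorted sweep in the reply above scales better, at the cost of reporting groups of overlapping intervals rather than pairs.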
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?