[R] Merge with closest (not equal) time stamps

Enrico Schumann e@ @end|ng |rom enr|co@chum@nn@net
Tue Aug 8 13:34:17 CEST 2023


On Mon, 07 Aug 2023, Naresh Gurbuxani writes:

> I have two dataframes, each with a column for timestamp.  I want to
> merge the two dataframes such that each row from first dataframe
> is matched with the row in the second dataframe with most recent but
> preceding timestamp. Here is an example.
>
> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 1.8))
>
> stock.trades <- data.frame(timestamp =
> as.POSIXct(c("2023-08-07 10:23:21", "2023-08-07
> 10:23:34", "2023-08-07 10:24:57", "2023-08-07
> 10:28:37", "2023-08-07 10:29:01")), stock.price =
> c(102.2, 102.9, 103.1, 101.8, 101.7))
>
> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>
> library(plyr)
> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) tail(subset(stock.trades, timestamp <= tstamp), 1))
> names(mystock.prices)[1] <- "stock.timestamp"
> myres <- cbind(option.trades, mystock.prices)
>
> This method works. But for large dataframes, it is very slow.  Is there
> a way to speed up the merge?
>
> Thanks,
> Naresh
>

If the timestamps are sorted (or you can sort them),
function ?findInterval might be helpful:

    i <- findInterval(option.trades$timestamp, stock.trades$timestamp)
    cbind(option.trades, stock.trades[i, ])
    ##             timestamp option.price           timestamp stock.price
    ## 1 2023-08-07 10:23:22          2.5 2023-08-07 10:23:21       102.2
    ## 3 2023-08-07 10:25:33          2.7 2023-08-07 10:24:57       103.1
    ## 4 2023-08-07 10:28:41          1.8 2023-08-07 10:28:37       101.8



-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net



More information about the R-help mailing list