[R] join/merge two data frames

Arnaud Gaboury @rn@ud@g@boury @end|ng |rom gm@||@com
Wed Apr 2 20:10:24 CEST 2025



On Wed, 2025-04-02 at 16:55 +0000, Ebert,Timothy Aaron wrote:
> Your result data frame example makes no sense to me. The price and
> executed_qty are the same for all symbols?
> 
> To get it all into one data frame you need a common variable that is
> used to join the data frames.
> My guess is that all_trade_sample$symbol has equivalents to the
> variables in token_close_sample.
> 
> You need to pivot one of the tables, and then full join them.
> Please look at the join functions. In base R it is "merge()" and in
> dyply it is full_join(), left_join() and similar.
> 
> I will pivot all_trade_sample to make all of the elements in "symbol"
> into variables.
> 
> I will do this in tidyverse. There are some parts of this that I
> really like. However, it can as easily be done using base R.
> Library(tidyr)
> Library(dplyr)
> token_close_long <- token_close_sample %>%
>   pivot_longer(cols = -time, names_to = "symbol", values_to =
> "close_price")
> #I rename time in token_close_long so that it will be preserved
> colnames(token_close_long) <- c("time_close", "symbol",
> "close_price")
> combined <- full_join(all_trade_sample,token_close_long, by="symbol")
> 
> This generates errors. This is ok, but I need to know more about
> all_trade_sample. Each symbol appears multiple times in
> all_trade_sample. Each instance of "AAVEUSIX" has a different time
> stamp. So maybe filter to eliminate all but the first (earliest)
> time?
> 
> The program looks something like this:
> token_close_long <- token_close_sample %>%
>   pivot_longer(cols = -time, names_to = "symbol", values_to =
> "close_price")
> 
> #I will rename time in token_close_long so that it will be preserved
> colnames(token_close_long) <- c("time_close", "symbol",
> "close_price")
> combined <- full_join(all_trade_sample,token_close_long, by="symbol")
> 
> filtered_trades <- all_trade_sample |>
>   group_by(symbol) |>
>   slice_min(time, n = 1)|>
>   ungroup()
> 
> combined <- full_join(filtered_trades,token_close_long, by="symbol")
> 
> I did not do something right, as there are four closing prices for
> each symbol. However, the general approach should work even if it
> needs a little modification to give the correct result. Your closing
> price is only relevant based on some other price (probably buying
> price, but could be opening price).
> 
> Tim

Thank you Tim for your answer. I will have a close look at it later
today. But in short:
- I will do some work on my data later in the script. The tibble I want
is far from being the result.
- Time of closing price is everyday the same: 18:59:59. I will compute
returns, volatility, correlations etc so I need a common reference.
Prices for the trades change and are given by the exchange. I don't
need them, won't do anything.
- Yes, symbols may be the common variable. 
- I feel comfortable with tibbles and tidyverse
- All_trade_sample has the goal to fetch price (price of asset when the
trade is done) and executed_qty. btc_price is needed at one point of my
calculation, but will not be shown in the very final table.
- Multiple lines for one only symbol means I made multiple trades of
this asset during the same day.

I will test your suggestions.
Thank you again.


> 
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Arnaud
> Gaboury
> Sent: Wednesday, April 2, 2025 6:20 AM
> To: r-help using r-project.org
> Subject: [R] join/merge two data frames
> 
> [External Email]
> 
> I work on a trading journal for a portfolio of crypto currencies. The
> goal is to fetch from my account (binance exchange) the trades I have
> done and daily closing prices of my assets.
> The first part (getting the data from exchange) are in two parts.
> 
> 1- get the daily closing prices of my assets. Here is a sample of my
> data frame:
> token_close_sample <- structure(list(time =
> structure(c(1734371999.999, 1734458399.999, 1734544799.999,
> 1734631199.999), tzone = "", class = c("POSIXct", "POSIXt")), BTCUSDC
> = c(107112.36, 107517.25, 104630.49, 98692.01 ), SUIUSDC = c(4.7252,
> 4.6923, 4.7017, 4.2422), ENAUSDC = c(1.1862, 1.1412, 1.0928, 1.0256),
> AAVEUSDC = c(388, 365.68, 373.15, 316.69 ), ETHUSDC = c(4034.74,
> 3975.39, 3879.06, 3474.91), FTMUSDC = c(1.381, 1.3596, 1.2222,
> 1.0445)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
> "data.frame"))
> 
> the tibble looks like this:
> time                BTCUSDC SUIUSDC ENAUSDC AAVEUSDC ETHUSDC FTMUSDC
> <dttm>                <dbl>   <dbl>   <dbl>    <dbl>   <dbl>   <dbl>
> 1 2024-12-16 18:59:59 107112.    4.73    1.19     388    4035.   
> 1.38
> 2 2024-12-17 18:59:59 107517.    4.69    1.14     366.   3975.   
> 1.36
> 3 2024-12-18 18:59:59 104630.    4.70    1.09     373.   3879.   
> 1.22
> 4 2024-12-19 18:59:59  98692.    4.24    1.03     317.   3475.   
> 1.04
> 
> 2- get my trades. Here is a sample:
> all_trade_sample <- structure(list(time = structure(c(1737335082.949,
> 1737336735.697, 1738059550.671, 1738142709.422, 1738142709.422,
> 1738169351.788 ), tzone = "UTC", class = c("POSIXct", "POSIXt")),
> symbol = c("AAVEUSDC", "AAVEUSDC", "SUIUSDC", "AAVEUSDC", "AAVEUSDC",
> "ETHUSDC"), executed_qty = c(866.666, -834.998, 67649.3, -0.393, -
> 0.393, 36.1158), price = c(0.003005, 0.003131, 0.000038, 294.738321,
> 294.738321, 0.03027), cummulative_quote_qty = c(262699.317950113, -
> 263696.723173419, 263987.20719179, -115.83216, -115.83216,
> 111456.491386979
> ), day = structure(c(20108, 20108, 20116, 20117, 20117, 20117 ),
> class = "Date")), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
> "data.frame"))
> 
> the tibble looks like this:
> time                symbol   executed_qty      price
> cummulative_quote_qty day
> <dttm>              <chr>           <dbl>      <dbl>
> <dbl> <date>
> 1 2025-01-20 01:04:42 AAVEUSDC      867.      0.00300
> 262699. 2025-01-20
> 2 2025-01-20 01:32:15 AAVEUSDC     -835.      0.00313               -
> 263697. 2025-01-20
> 3 2025-01-28 10:19:10 SUIUSDC     67649.      0.000038
> 263987. 2025-01-28
> 4 2025-01-29 09:25:09 AAVEUSDC       -0.393 295.
> -116. 2025-01-29
> 5 2025-01-29 09:25:09 AAVEUSDC       -0.393 295.
> -116. 2025-01-29
> 6 2025-01-29 16:49:11 ETHUSDC        36.1     0.0303
> 111456. 2025-01-29
> 
> Now, to finalize, I want to get all the info in one data frame so I
> can compute daily valuation (with potential trades, or in/out of
> asset).
> The finalized tibble should look something like this:
> result <- structure(list(time = structure(c(1734371999.999,
> 1734458399.999, 1734544799.999, 1734631199.999, 1737335082.949,
> 1737336735.697, 1738059550.671, 1734721199, 1734807599, 1734893999,
> 1734980399, 1735066799, 1735153199, 1735239599, 1735325999,
> 1738142709.422, 1735412399, 1738142709.422, 1738169351.788,
> 1735498799), tzone = "", class = c("POSIXct", "POSIXt")), BTCUSDC =
> c(107112.36, 107517.25, 104630.49, 98692.01,NA_real_, NA_real_,
> 102000, 101500, 101700,100300,100400,102300,102300,103100, NA_real_,
> 99800, NA_real_, NA_real_,NA_real_, 99900 ), SUIUSDC = c(4.7252,
> 4.6923, 4.7017, 4.2422, NA_real_, NA_real_, 4.25, 4.26, 4.7, 4.65,
> 4.52, 4.23, 4.17, 4.34, NA_real_, 4.52, NA_real_, NA_real_,NA_real_,
> 4.44), ENAUSDC = c(1.1862, 1.1412, 1.0928, 1.0256, NA_real_,
> NA_real_, 1.176, 1.16, 1.163, 1.183, 1.196, 1.165, 1.158, 1.142,
> NA_real_, 1.196, NA_real_, NA_real_,NA_real_, 1.113), AAVEUSDC =
> c(388, 365.68, 373.15, 316.69, NA_real_, NA_real_, 102000, 101500,
> 101700,100300,100400,102300,102300,103100, NA_real_, 99800, NA_real_,
> NA_real_,NA_real_, 99900 ), ETHUSDC = c(4034.74, 3975.39, 3879.06,
> 3474.91, NA_real_,NA_real_, 3420, 3410, 3412, 3367, 3388, 3355, 3374,
> 3392, NA_real_, 3401, NA_real_, NA_real_,NA_real_, 3411), FTMUSDC =
> c(1.381,1.3596, 1.2222, 1.0445, NA_real_,NA_real_, 1.36, 1.368,
> 1.342, 1.339, 1.436, 1.562, 1.53, 1.62, NA_real_, 1.31, NA_real_,
> NA_real_,NA_real_, 1.58), executed_qty = c(NA_real_, NA_real_,
> NA_real_, NA_real_, 230, 559, NA_real_, NA_real_,NA_real_, NA_real_,
> NA_real_,NA_real_, NA_real_, NA_real_, 200, NA_real_, 700, 200, 350,
> NA_real_), price = c( NA_real_, NA_real_, NA_real_, NA_real_, 2, 3,
> NA_real_, NA_real_,NA_real_, NA_real_, NA_real_,NA_real_, NA_real_,
> NA_real_, 10, NA_real_, 3, 4, 5, NA_real_) ), row.names = c(NA, -
> 20L), class = c("tbl_df", "tbl", "data.frame"))
> 
> The idea is to have:
> - one line each day with closing price of my portfolio assets at
> 18:59:59
> - one line for each trade where I shall write 'asset', 'price' and
> 'executed_qty'.
> 
> I spent quite some time trying to figure out how to do it, but
> couldn't. I don't even know if it is possible. Of course, I don't
> want to add anything (price, date, asset...) by hand.
> Thank you for any help.
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20250402/3a1d0d24/attachment.sig>


More information about the R-help mailing list