[R] Combine two dataframe with different row number and interpolation between values

Ebert,Timothy Aaron tebert @end|ng |rom u||@edu
Wed Aug 31 14:47:11 CEST 2022

Can I interest you in the join functions in dplyr? https://www.datasciencemadesimple.com/join-in-r-merge-in-r/

Filling in missing data is a useful practice when the fake (simulated) data is a small proportion of all data. When 2/3 of the data is fake one must wonder if anything based on those numbers is real or an artifact of assumptions made to generate the numbers.

I deal with weather data and some weather stations are set up to average measurements between recorded values and others take a single reading at regular intervals. How you interpolate might depend on which option describes your data. The alternative is to use the method for recording ws and apply it to the other data that will be merged with these data. I assume there is more data, otherwise I see little point in expanding these values out.


-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of javad bayat
Sent: Wednesday, August 31, 2022 2:09 AM
To: r-help using r-project.org
Subject: [R] Combine two dataframe with different row number and interpolation between values

[External Email]

 Dear all,
I am trying to combine two large dataframe in order to make a dataframe with exactly the dimension of the second dataframe.
The first df is as follows:

df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d = rep(c(1:365,1:365,1:365,1:365,1:365),each=8),
      h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by = 3)),365),
      ws = rnorm(1:14600, mean=20))
> head(df1)
     y       d   h        ws
1  2010  1  3     20.71488
2  2010  1  6     19.70125
3  2010  1  9     21.00180
4  2010  1 12     20.29236
5  2010  1 15     20.12317
6  2010  1 18     19.47782

The data in the "ws" column were measured with 3 hours frequency and I need data with one hour frequency. I have made a second df as follows with one hour frequency for the "ws" column.

df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d = rep(c(1:365,1:365,1:365,1:365,1:365),each=24),
      h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA")
> head(df2)
      y      d    h   ws
1  2010  1    1   NA
2  2010  1    2   NA
3  2010  1    3   NA
4  2010  1    4   NA
5  2010  1    5   NA
6  2010  1    6   NA

What I am trying to do is combine these two dataframes so as to the rows in
df1 (based on the values of "y", "d", "h" columns) that have values exactly similar to df2's rows copied in its place in the new df (df3).
For example, in the first dataframe the first row was measured at 3 o'clock on the first day of 2010 and this row must be placed on the third row of the second dataframe which has a similar value (2010, 1, 3). Like the below
      y      d    h   ws
1  2010  1    1   NA
2  2010  1    2   NA
3  2010  1    3   20.71488
4  2010  1    4   NA
5  2010  1    5   NA
6  2010  1    6   19.70125

But regarding the values of the "ws" column for df2 that do not have value (at 4 and 5 o'clock), I need to interpolate between the before and after values to fill in the missing data of the "ws".
I have tried the following codes but they did not work correctly.

> df3 = merge(df1, df2, by = "y")
Error: cannot allocate vector of size 487.9 Mb or
> library(dplyr)
> df3<- df1%>% full_join(df2)

Is there any way to do this?

Best Regards
Javad Bayat
M.Sc. Environment Engineering
Alternative Mail: bayat194 using yahoo.com

        [[alternative HTML version deleted]]

R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
PLEASE do read the posting guide https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3D&reserved=0
and provide commented, minimal, self-contained, reproducible code.

More information about the R-help mailing list