[R] Joining data frames

Matthew McCormack mccormack using molbio.mgh.harvard.edu
Wed Jun 30 02:03:13 CEST 2021


     merge() (base R) does match up entries from particular columns in
each data frame through its by argument, which is what biologists
frequently want. The join functions in the dplyr package (left_join(),
right_join(), full_join(), ...) do the same kind of matching, with the
join type spelled out in the function name.
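A minimal base-R sketch of merge() matching rows on key columns, using a cut-down copy of the two tables from the original question. The column names cover_plant1 and cover_plant2 are hypothetical stand-ins for "% cover of plant1" and "% cover of plant2". With by = c("Sample", "Plot") a row of df is paired with a row of df1 only when both values are equal; by.x/by.y instead pair columns by position, so by.x = "Sample", by.y = "Plot" compares df$Sample with df1$Plot, which is a different (and here wrong) join.

```r
# Cut-down copies of the tables from the question.
# cover_plant1 / cover_plant2 are hypothetical names standing in for
# "% cover of plant1" / "% cover of plant2".
df <- data.frame(
  Sample  = c(1, 1, 2, 2, 3, 3, 3, 3),
  Plot    = c(1, 2, 3, 4, 1, 2, 3, 4),
  Biomass = c(1024, 32, 223, 456, NA, 331, 22151, 1441)
)
df1 <- data.frame(
  Sample       = c(3, 3, 3, 3),
  Plot         = c(1, 2, 3, 4),
  cover_plant1 = c(32, 3, NA, 5),
  cover_plant2 = c(63, NA, 3, 23)
)

# Join on both key columns. The default (all = FALSE) keeps only the
# Sample/Plot combinations found in both tables, i.e. an inner join.
inner <- merge(df, df1, by = c("Sample", "Plot"))
print(inner)   # 4 rows, all with Sample == 3

# by.x / by.y pair key columns by position: this matches df$Sample
# against df1$Plot, so e.g. Sample 1's biomass lands next to Plot 1's cover.
paired_wrong <- merge(df, df1, by.x = "Sample", by.y = "Plot", all.y = TRUE)
nrow(paired_wrong)   # 9 rows rather than 4
```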

    If you do a right join, the result keeps every row of the second data
frame (the one on the right, df1). Rows of df that have no matching
Sample and Plot in df1 will not be in the result (in your example the
result is assigned back to df). So, with your code, you joined df to df1
and overwrote df with only the rows whose Sample/Plot combination appears
in df1. Had you done a left_join instead, the result would keep every row
originally in df, with the df1 columns filled in where there is a match
and NA where there is not (rows in df1 but not in df would be excluded).
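The difference between the two join types can be checked on the sample data. This sketch uses base R's merge() rather than dplyr, on the assumption that all.y = TRUE and all.x = TRUE keep the same rows as right_join(df, df1, ...) and left_join(df, df1, ...) respectively (row and column order may differ). cover_plant1/cover_plant2 are hypothetical renames of the "% cover" columns.

```r
df <- data.frame(
  Sample  = c(1, 1, 2, 2, 3, 3, 3, 3),
  Plot    = c(1, 2, 3, 4, 1, 2, 3, 4),
  Biomass = c(1024, 32, 223, 456, NA, 331, 22151, 1441)
)
# cover_plant1 / cover_plant2 are hypothetical column names.
df1 <- data.frame(
  Sample       = c(3, 3, 3, 3),
  Plot         = c(1, 2, 3, 4),
  cover_plant1 = c(32, 3, NA, 5),
  cover_plant2 = c(63, NA, 3, 23)
)

right <- merge(df, df1, by = c("Sample", "Plot"), all.y = TRUE)  # cf. right_join(df, df1, ...)
left  <- merge(df, df1, by = c("Sample", "Plot"), all.x = TRUE)  # cf. left_join(df, df1, ...)

nrow(right)  # 4: only the Sample/Plot combinations present in df1
nrow(left)   # 8: every row of df; cover columns are NA for Samples 1 and 2
```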

    You could do a full_join, and then all entries (entries in both data
frames, entries in df but not in df1, and entries in df1 but not in df)
will be in the result. Maybe something like this (here I have created a
new data frame, df_final, but you could still just overwrite df):

library(dplyr)
df_final <- full_join(df, df1, by = c("Sample", "Plot"))
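A base-R sketch of the same full join, assuming merge(..., all = TRUE) keeps the same rows as full_join(). It also checks which Sample/Plot pairs in df1 have no partner in df (dplyr's anti_join(df1, df, by = c("Sample", "Plot")) does the same check). With the sample tables from the question every df1 pair also occurs in df, so the full join keeps the same 8 rows as a left join; on the larger real dataset the two may differ. cover_plant1/cover_plant2 are hypothetical column names.

```r
df <- data.frame(
  Sample  = c(1, 1, 2, 2, 3, 3, 3, 3),
  Plot    = c(1, 2, 3, 4, 1, 2, 3, 4),
  Biomass = c(1024, 32, 223, 456, NA, 331, 22151, 1441)
)
# cover_plant1 / cover_plant2 are hypothetical column names.
df1 <- data.frame(
  Sample       = c(3, 3, 3, 3),
  Plot         = c(1, 2, 3, 4),
  cover_plant1 = c(32, 3, NA, 5),
  cover_plant2 = c(63, NA, 3, 23)
)

full <- merge(df, df1, by = c("Sample", "Plot"), all = TRUE)  # cf. full_join(df, df1, ...)
nrow(full)  # 8

# Which df1 key combinations have no partner in df?
# Keys are compared as pasted "Sample_Plot" strings.
key_df  <- paste(df$Sample,  df$Plot,  sep = "_")
key_df1 <- paste(df1$Sample, df1$Plot, sep = "_")
unmatched_in_df1 <- df1[!key_df1 %in% key_df, ]
nrow(unmatched_in_df1)  # 0 here, so full join and left join agree on this sample
```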

Matthew


On 6/29/21 7:15 PM, Jim Lemon wrote:
> Hi Esthi,
> Have you tried something like:
>
> df2 <- merge(df, df1, by = c("Sample", "Plot"), all.y = TRUE)
>
> This will get you a right join in "df2", not overwriting "df".
>
> Jim
>
> On Wed, Jun 30, 2021 at 1:13 AM Esthi Erickson <ericksonesthi using gmail.com> wrote:
>> Hi and thank you in advance,
>>
>> If I have a dataframe, df:
>>
>> Sample  Plot  Biomass
>>      1     1     1024
>>      1     2       32
>>      2     3      223
>>      2     4      456
>>      3     1
>>      3     2      331
>>      3     3    22151
>>      3     4     1441
>>
>> And another one, df1:
>>
>> Sample  Plot  % cover of plant1  % cover of plant2
>>      3     1                 32                 63
>>      3     2                  3
>>      3     3                                     3
>>      3     4                  5                 23
>>
>> I want to join these tables where the columns Sample and Plot are the same.
>>
>> Currently trying:
>>
>> df <- right_join(df, df1, by = c("Sample", "Plot"))
>>
>> I am working with a much larger dataset, but it will cut off the data
>> starting at Sample 3 instead of joining the tables while retaining the
>> information from df. Any ideas how I could join them this way?
>>
>>
>> Esthi
>>
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://secure-web.cisco.com/1NdijE3bwtTnJ0kSEgtJU1NlrtOK9zEfac9zyeZv87EuBW5RBFz3d1rdtVoxuWjjEZm2ILfmP1KOs1kEsAOECi2THQ-_HKB9EOJWeI57gQdy8H3UbdNo5_jjkMLPJ7OWuokUT-FJwD84kR0uptsG7XUn_xN9NkAZ4ESV6jXCMs_vWVuqkvXkPRfDV0BBMBQWLKxiQKz-9GYTrcqzWGsCc_A1LB3p6YBnMcOeElnau9pAicwrSrzqbNayjDWgW75J91dn1Bpb7rhV4xLELl_KS0g/https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help
>> PLEASE do read the posting guide http://secure-web.cisco.com/1yHZkzpQUOKRg8cDG22MQDxPOC13uXEOgchugGyn3LgrkzeHEY3bJmUM7BdgniFNPIUlVK9c26rAxELBoKzCk3QtR375fxo8PTFptWSOByZg9wWZw8ounbb3NvkgZApJHaDn6KCFRf4ym05BIQUG039oUDbsdBh6fa5LNBsdgTIGVetQokelOMdncVxIv_g233z1CF1xfAozJ9-8eetgqhSIh1lRMlheHhpVRDzkSbxAij8APSko49XhpHmsqwOevGN0c3vHgLT2dLLAzvO_ZLA/http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


