[R] Merging dataframes

Thierry Onkelinx th|erry@onke||nx @end|ng |rom |nbo@be
Wed May 2 14:01:59 CEST 2018


Have a look at anti_join() from the dplyr package. It does exactly
what you want. Here is an example based on the code of Robin

Table_A <- as.data.frame(Table_A, stringsAsFactors = FALSE)That is
Table_B <- as.data.frame(Table_B, stringsAsFactors = FALSE)
library(dplyr)
anti_join(Table_A, Table_B, by = "Email")
anti_join(Table_B, Table_A, by = "Email")

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////




2018-05-02 13:23 GMT+02:00 Dr. Robin Haunschild <R.Haunschild using fkf.mpg.de>:
> Hi,
>
> I'll coded your example into R code:
>
> Table_A <- c('abc using gmail.com', 'John Chan', '0909')
> Table_A <- rbind(Table_A, c('bcd using yahoo.com', 'Tim Ma', '89089'))
> colnames(Table_A) <- c('Email', 'Name', 'Phone')
> Table_A
>
> Table_B <- c('abc using gmail.com', 'John Chan', 'M', '0909')
> Table_B <- rbind(Table_B, c('khn using hotmail.com', 'Rosy  Kim', 'F', '7779'))
> colnames(Table_B) <- c('Email', 'Name', 'Sex', 'Phone')
> Table_B
>
> Did you have a look at this one?
> Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE)
> Table_C[is.na(Table_C$Name.y),]
> Table_C[is.na(Table_C$Name.x),]
>
> Table_C contains all data from Table_A and Table_B. The key.x is NA if
> the row comes from Table_B and key.y is NA if the row comes from Table_A.
>
> Best, Robin
>
>
> On 05/02/2018 11:38 AM, Chintanu wrote:
>> Thanks - Peter, Eivind, Rui
>>
>>
>> Sorry, I perhaps could not explain it properly in the first go.
>>
>> Trying to simplify it here with an example - Say I have two dataframes as
>> below that are NOT equally-sized data frames (i.e., number of columns are
>> different in each table):
>>
>>
>>
>> Table_A:
>>
>> Email             Name                   Phone
>>
>> abc using gmail.com   John Chan         0909
>>
>> bcd using yahoo.com   Tim Ma                    89089
>>
>> ......
>>
>>
>>
>> Table_B:
>>
>> Email                     Name                 Sex        Phone
>>
>> abc using gmail.com    John Chan            M                 0909
>>
>> khn using hotmail.com           Rosy  Kim               F                   7779
>>
>> .....
>>
>>
>>
>> Now, I have used -
>>
>> merge (Table_A, Table_B, by="Email", all = FALSE))
>>
>>
>>
>> - to find only the rows that match from these data frames - based on Email
>> as primary key.
>>
>>
>>
>> Further, I am also interested (using "Email" as the common key) which rows
>> from Table_A did not match with Table_B.
>>
>> I am not sure how to do this here.
>>
>> Thanks and regards,
>> Chintanu
>>
>>
>>
>> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>
>>> Hello,
>>>
>>> Is it something like this that you want?
>>>
>>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
>>> y <- data.frame(a = 1:10, b = 1:10)
>>>
>>> which(x != y, arr.ind = TRUE)
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>> On 5/1/2018 11:35 AM, Chintanu wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> May I please ask how I do the following in R. Sorry - this may be trivial,
>>>> but I am struggling here for this.
>>>>
>>>>
>>>>
>>>> For two dataframes (A and B), I wish to identify (based on a primary
>>>> key-column present in both A & B) -
>>>>
>>>> 1. Which records (rows) of A did not match with B, and
>>>>
>>>>
>>>>
>>>> 2. Which records of B did not match with A ?
>>>>
>>>>
>>>>
>>>> I came across a setdt function while browsing, but when I tried it, it
>>>> says
>>>> - Could not find function "setdt".
>>>>
>>>>
>>>>
>>>> Overall, if there is any way of doing it (preferably in some simplified
>>>> way), please advise.
>>>>
>>>>
>>>> Many thanks in advance.
>>>>
>>>>
>>>> regards,
>>>>
>>>> Tito
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>>> ng-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>
>>       [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Dr. Robin Haunschild
> Max Planck Institute
> for Solid State Research
> Heisenbergstr. 1
> D-70569 Stuttgart (Germany)
> phone: +49 (0) 711-689-1285
> fax:   +49 (0) 711-689-1292
> email: R.Haunschild using fkf.mpg.de
> http://www.fkf.mpg.de/ivs
>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list