[R] Merging dataframes

Wed May 2 13:23:52 CEST 2018

Hi,

I'll coded your example into R code:

Table_A <- c('abc using gmail.com', 'John Chan', '0909')
Table_A <- rbind(Table_A, c('bcd using yahoo.com', 'Tim Ma', '89089'))
colnames(Table_A) <- c('Email', 'Name', 'Phone')
Table_A

Table_B <- c('abc using gmail.com', 'John Chan', 'M', '0909')
Table_B <- rbind(Table_B, c('khn using hotmail.com', 'Rosy  Kim', 'F', '7779'))
colnames(Table_B) <- c('Email', 'Name', 'Sex', 'Phone')
Table_B

Did you have a look at this one?
Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE)
Table_C[is.na(Table_C$Name.y),]
Table_C[is.na(Table_C$Name.x),]

Table_C contains all data from Table_A and Table_B. The key.x is NA if
the row comes from Table_B and key.y is NA if the row comes from Table_A.

Best, Robin

On 05/02/2018 11:38 AM, Chintanu wrote:
> Thanks - Peter, Eivind, Rui
> 
> 
> Sorry, I perhaps could not explain it properly in the first go.
> 
> Trying to simplify it here with an example - Say I have two dataframes as
> below that are NOT equally-sized data frames (i.e., number of columns are
> different in each table):
> 
> 
> 
> Table_A:
> 
> Email             Name                   Phone
> 
> abc using gmail.com   John Chan         0909
> 
> bcd using yahoo.com   Tim Ma                    89089
> 
> ......
> 
> 
> 
> Table_B:
> 
> Email                     Name                 Sex        Phone
> 
> abc using gmail.com    John Chan            M                 0909
> 
> khn using hotmail.com           Rosy  Kim               F                   7779
> 
> .....
> 
> 
> 
> Now, I have used -
> 
> merge (Table_A, Table_B, by="Email", all = FALSE))
> 
> 
> 
> - to find only the rows that match from these data frames - based on Email
> as primary key.
> 
> 
> 
> Further, I am also interested (using "Email" as the common key) which rows
> from Table_A did not match with Table_B.
> 
> I am not sure how to do this here.
> 
> Thanks and regards,
> Chintanu
> 
> 
> 
> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas using sapo.pt> wrote:
> 
>> Hello,
>>
>> Is it something like this that you want?
>>
>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
>> y <- data.frame(a = 1:10, b = 1:10)
>>
>> which(x != y, arr.ind = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>> On 5/1/2018 11:35 AM, Chintanu wrote:
>>
>>> Hi,
>>>
>>>
>>> May I please ask how I do the following in R. Sorry - this may be trivial,
>>> but I am struggling here for this.
>>>
>>>
>>>
>>> For two dataframes (A and B), I wish to identify (based on a primary
>>> key-column present in both A & B) -
>>>
>>> 1. Which records (rows) of A did not match with B, and
>>>
>>>
>>>
>>> 2. Which records of B did not match with A ?
>>>
>>>
>>>
>>> I came across a setdt function while browsing, but when I tried it, it
>>> says
>>> - Could not find function "setdt".
>>>
>>>
>>>
>>> Overall, if there is any way of doing it (preferably in some simplified
>>> way), please advise.
>>>
>>>
>>> Many thanks in advance.
>>>
>>>
>>> regards,
>>>
>>> Tito
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>> ng-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Dr. Robin Haunschild
Max Planck Institute
for Solid State Research
Heisenbergstr. 1
D-70569 Stuttgart (Germany)
phone: +49 (0) 711-689-1285
fax:   +49 (0) 711-689-1292
email: R.Haunschild using fkf.mpg.de
http://www.fkf.mpg.de/ivs