[R] Merging dataframes
Dr. Robin Haunschild
R.Haunschild@fkf.mpg.de
Wed May 2 13:23:52 CEST 2018
Hi,
I've coded your example into R code:
# Phone is kept as character so that leading zeros (e.g. '0909') survive
Table_A <- data.frame(Email = c('abc@gmail.com', 'bcd@yahoo.com'),
                      Name  = c('John Chan', 'Tim Ma'),
                      Phone = c('0909', '89089'),
                      stringsAsFactors = FALSE)
Table_A
Table_B <- data.frame(Email = c('abc@gmail.com', 'khn@hotmail.com'),
                      Name  = c('John Chan', 'Rosy Kim'),
                      Sex   = c('M', 'F'),
                      Phone = c('0909', '7779'),
                      stringsAsFactors = FALSE)
Table_B
Did you have a look at this one?
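Table_C <- merge(Table_A, Table_B, by = "Email", all = TRUE)
Table_C[is.na(Table_C$Name.y), ]  # rows of Table_A without a match in Table_B
Table_C[is.na(Table_C$Name.x), ]  # rows of Table_B without a match in Table_A
Table_C contains all rows from Table_A and Table_B (a full outer join on
Email). Columns that occur in both tables, apart from the key, get the
suffix .x (from Table_A) or .y (from Table_B). Name.x is therefore NA if
the row comes only from Table_B, and Name.y is NA if the row comes only
from Table_A.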
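If you only need the unmatched rows, you can also test the key directly with base R's %in% operator instead of going through a full merge. This is a minimal sketch that rebuilds the two example tables as data frames; the names only_in_A, only_in_B and in_both are just illustrative. Unlike the is.na(Name.y) check, it does not rely on a non-key column (Name) being non-NA in the original tables, and it returns the rows with their original columns, without .x/.y suffixes.

```r
# Example tables from the question (Phone as character to keep leading zeros)
Table_A <- data.frame(Email = c("abc@gmail.com", "bcd@yahoo.com"),
                      Name  = c("John Chan", "Tim Ma"),
                      Phone = c("0909", "89089"),
                      stringsAsFactors = FALSE)
Table_B <- data.frame(Email = c("abc@gmail.com", "khn@hotmail.com"),
                      Name  = c("John Chan", "Rosy Kim"),
                      Sex   = c("M", "F"),
                      Phone = c("0909", "7779"),
                      stringsAsFactors = FALSE)

# 1. Rows of Table_A whose Email does not occur in Table_B
only_in_A <- Table_A[!(Table_A$Email %in% Table_B$Email), ]

# 2. Rows of Table_B whose Email does not occur in Table_A
only_in_B <- Table_B[!(Table_B$Email %in% Table_A$Email), ]

# Rows whose Email occurs in both tables (inner join, merge's default)
in_both <- merge(Table_A, Table_B, by = "Email")

only_in_A
only_in_B
in_both
```

With the example data, only_in_A holds the bcd@yahoo.com row and only_in_B holds the khn@hotmail.com row.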
Best, Robin
On 05/02/2018 11:38 AM, Chintanu wrote:
> Thanks - Peter, Eivind, Rui
>
> Sorry, I perhaps did not explain it properly the first time.
>
> To simplify, here is an example. Say I have two data frames, as
> below, that are NOT equally sized (i.e., the number of columns
> differs between the tables):
>
> Table_A:
>
> Email            Name       Phone
> abc@gmail.com    John Chan  0909
> bcd@yahoo.com    Tim Ma     89089
> ......
>
> Table_B:
>
> Email            Name       Sex  Phone
> abc@gmail.com    John Chan  M    0909
> khn@hotmail.com  Rosy Kim   F    7779
> .....
>
> Now, I have used
>
> merge(Table_A, Table_B, by = "Email", all = FALSE)
>
> to find only the rows that match between these data frames, using
> Email as the primary key.
>
> Further, I am also interested in which rows from Table_A did not
> match with Table_B (using "Email" as the common key).
>
> I am not sure how to do this here.
>
> Thanks and regards,
> Chintanu
>
> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas@sapo.pt> wrote:
>
>> Hello,
>>
>> Is it something like this that you want?
>>
>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
>> y <- data.frame(a = 1:10, b = 1:10)
>>
>> which(x != y, arr.ind = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>> On 5/1/2018 11:35 AM, Chintanu wrote:
>>
>>> Hi,
>>>
>>> May I please ask how I do the following in R? Sorry, this may be
>>> trivial, but I am struggling with it.
>>>
>>> For two data frames (A and B), I wish to identify (based on a
>>> primary-key column present in both A and B):
>>>
>>> 1. Which records (rows) of A did not match with B, and
>>> 2. Which records of B did not match with A?
>>>
>>> I came across a setdt function while browsing, but when I tried it,
>>> it says: could not find function "setdt".
>>>
>>> Overall, if there is any way of doing it (preferably in some
>>> simplified way), please advise.
>>>
>>> Many thanks in advance.
>>>
>>> regards,
>>>
>>> Tito
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
>
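Regarding the setdt function mentioned in the original question: R function names are case-sensitive, and the function is presumably setDT() from the data.table package, which also has to be loaded before use. The following is a minimal sketch assuming data.table is installed; it uses data.table's "not-join" syntax X[!Y, on = "Email"], which keeps the rows of X that have no match in Y. The names only_in_A and only_in_B are just illustrative.

```r
# Assumes the data.table package is installed
library(data.table)

Table_A <- data.frame(Email = c("abc@gmail.com", "bcd@yahoo.com"),
                      Name  = c("John Chan", "Tim Ma"),
                      Phone = c("0909", "89089"),
                      stringsAsFactors = FALSE)
Table_B <- data.frame(Email = c("abc@gmail.com", "khn@hotmail.com"),
                      Name  = c("John Chan", "Rosy Kim"),
                      Sex   = c("M", "F"),
                      Phone = c("0909", "7779"),
                      stringsAsFactors = FALSE)

# setDT() converts a data.frame to a data.table in place (by reference)
setDT(Table_A)
setDT(Table_B)

# Not-joins (anti-joins) on the Email key
only_in_A <- Table_A[!Table_B, on = "Email"]  # rows of A without a match in B
only_in_B <- Table_B[!Table_A, on = "Email"]  # rows of B without a match in A

only_in_A
only_in_B
```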
--
Dr. Robin Haunschild
Max Planck Institute
for Solid State Research
Heisenbergstr. 1
D-70569 Stuttgart (Germany)
phone: +49 (0) 711-689-1285
fax: +49 (0) 711-689-1292
email: R.Haunschild@fkf.mpg.de
http://www.fkf.mpg.de/ivs