[R] Merging dataframes

Sarmah, Chintanu
Wed May 2 06:16:46 CEST 2018


Thanks, Peter, Eivind and Rui.

Sorry, I did not explain it properly the first time. Let me simplify it with an example. Say I have two data frames of unequal size, as below:

Table_A:
Email               Name         Phone
abc at gmail.com    John Chan    0909
bcd at yahoo.com    Tim Ma       89089
......

Table_B:
Email               Name         Sex    Phone
abc at gmail.com    John Chan    M      0909
khn at hotmail.com  Rosy M       F      7779
.....

Now, I have used -
merge(Table_A, Table_B, by = "Email", all = FALSE)

- to find only the rows that match from these data frames.

Further, using "Email" as the common key, I would also like to know which rows of Table_A did not match any row of Table_B.
I am not sure how to do that.
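
For illustration, here is a rough sketch of how both results might be obtained with merge() alone. It rests on assumptions: the two rows per table are the ones shown above with the addresses written out in full, and the names B_keys, in_B, matched and only_in_A are made up for this example. The idea is to left-join Table_A against the keys of Table_B plus a marker column, so that unmatched rows show up with NA in the marker.

```r
# Toy versions of the two tables (only the rows shown in the post).
Table_A <- data.frame(
  Email = c("abc@gmail.com", "bcd@yahoo.com"),
  Name  = c("John Chan", "Tim Ma"),
  Phone = c("0909", "89089"),
  stringsAsFactors = FALSE
)
Table_B <- data.frame(
  Email = c("abc@gmail.com", "khn@hotmail.com"),
  Name  = c("John Chan", "Rosy M"),
  Sex   = c("M", "F"),
  Phone = c("0909", "7779"),
  stringsAsFactors = FALSE
)

# Rows whose Email occurs in both tables (inner join).
matched <- merge(Table_A, Table_B, by = "Email", all = FALSE)

# Hypothetical helper: the distinct keys of Table_B plus a marker column.
B_keys <- data.frame(Email = unique(Table_B$Email), in_B = TRUE,
                     stringsAsFactors = FALSE)

# Left join: every row of Table_A is kept; rows without a partner in
# Table_B get NA in the marker column.
left <- merge(Table_A, B_keys, by = "Email", all.x = TRUE)
only_in_A <- left[is.na(left$in_B), names(Table_A)]
```

Joining against the keys only, rather than all of Table_B, avoids the Name.x/Name.y suffixes and keeps only_in_A in the same column layout as Table_A.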

 Thanks.


On 1 May 2018, at 9:35 pm, Chintanu <chintanu at gmail.com> wrote:


---------- Forwarded message ----------
From: peter dalgaard <pdalgd at gmail.com>
Date: Tue, May 1, 2018 at 9:05 PM
Subject: Re: [R] Merging dataframes
To: Rui Barradas <ruipbarradas at sapo.pt>
Cc: Chintanu <chintanu at gmail.com>, R help <r-help at r-project.org>


I'd expect more like

setdiff(A$key, B$key)

and vice versa. Or, if you want the actual rows

A[!(A$key %in% B$key),]

or for the row numbers

which(!(A$key %in% B$key))
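
A small worked example of the three idioms above, in both directions, might look like the following. The data frames and the column name `key` are hypothetical stand-ins for the real tables and their shared key column.

```r
# Hypothetical toy data; `key` plays the role of the shared primary key.
A <- data.frame(key = c("k1", "k2", "k3"), x = 1:3, stringsAsFactors = FALSE)
B <- data.frame(key = c("k2", "k3", "k4"), y = 4:6, stringsAsFactors = FALSE)

# Key values present in one table but not the other.
keys_only_in_A <- setdiff(A$key, B$key)
keys_only_in_B <- setdiff(B$key, A$key)   # the "vice versa" direction

# Full rows without a match in the other table.
rows_only_in_A <- A[!(A$key %in% B$key), ]
rows_only_in_B <- B[!(B$key %in% A$key), ]

# Row numbers within A of the unmatched rows.
idx_only_in_A <- which(!(A$key %in% B$key))
```

Note that setdiff() returns each unmatched key once, whereas the %in% forms keep every unmatched row, so the two can differ in length when the key column contains duplicates.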


-pd




> On 1 May 2018, at 12:48 , Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
> Hello,
>
> Is it something like this that you want?
>
> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
> y <- data.frame(a = 1:10, b = 1:10)
>
> which(x != y, arr.ind = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 5/1/2018 11:35 AM, Chintanu wrote:
>> Hi,
>> May I please ask how I do the following in R. Sorry - this may be trivial,
>> but I am struggling here for this.
>> For two dataframes (A and B), I wish to identify (based on a primary
>> key-column present in both A & B) -
>> 1. Which records (rows) of A did not match with B, and
>> 2. Which records of B did not match with A ?
>> I came across a setdt function while browsing, but when I tried it, it says
>> - Could not find function "setdt".
>> Overall, if there is any way of doing it (preferably in some simplified
>> way), please advise.
>> Many thanks in advance.
>> regards,
>> Tito
>>      [[alternative HTML version deleted]]
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com









