[R] merging/intersecting 2 data frames
jim holtman
jholtman at gmail.com
Tue Jun 29 21:31:25 CEST 2010
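Use 'merge'. By default (all = FALSE) it keeps only the rows whose key appears in both data frames, so patients without a matching ID in b.df are dropped and DATE_OF_DEATH is added for the rest. Note that some of the IDs in b.df below differ from those in the original post: none of the ten IDs in the original b.df occur among the ten rows of a.df that were shown, so merging those two samples as posted would return no rows.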
> a.df
        DATE GENDER PATIENT_ID AGE             SYNDROME
1  4/16/2009      F      23686  45         RASH ON BODY
2  4/16/2009      F      13840  35         CANT URINATE
3  4/16/2009      M      12895  30       BLURRED VISION
4  4/16/2009      M      18375  33       UNABLE TO VOID
5  4/16/2009      M       2237  44         SOB WEAKNESS
6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
7  4/16/2009      M      10783  37          RT ARM PAIN
8  4/16/2009      M      12610  65        L FOOT INJURY
9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
10 4/16/2009      F        351  36           PT STS MVA
> b.df
   DATE_OF_DEATH    ID
1      4/19/2009 23686
2      4/19/2009 13840
3      4/19/2009 12895
4      4/19/2009 18375
5      4/19/2009   351
6      4/20/2009  3495
7      4/20/2009  4084
8      4/20/2009 19616
9      4/20/2009 17965
10     4/20/2009 11863
> merge(a.df, b.df, by.x="PATIENT_ID", by.y="ID")
  PATIENT_ID      DATE GENDER AGE             SYNDROME DATE_OF_DEATH
1        351 4/16/2009      F  36           PT STS MVA     4/19/2009
2       3495 4/16/2009      F  29 URINARY DIFFICULTIES     4/20/2009
3      12895 4/16/2009      M  30       BLURRED VISION     4/19/2009
4      13840 4/16/2009      F  35         CANT URINATE     4/19/2009
5      18375 4/16/2009      M  33       UNABLE TO VOID     4/19/2009
6      23686 4/16/2009      F  45         RASH ON BODY     4/19/2009
>
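For anyone who wants to rerun this, here is a self-contained sketch. The two data frames are typed in by hand from the printed output above (storing PATIENT_ID and ID as numbers and the other columns as character strings is an assumption), and c.df is only an illustrative name for the result.

```r
# Rebuild the sample data shown above; values copied from the printed output.
# Column types (numeric IDs, character text) are assumed, not known.
a.df <- data.frame(
  DATE       = rep("4/16/2009", 10),
  GENDER     = c("F", "F", "M", "M", "M", "F", "M", "M", "F", "F"),
  PATIENT_ID = c(23686, 13840, 12895, 18375, 2237, 21484, 10783, 12610, 3495, 351),
  AGE        = c(45, 35, 30, 33, 44, 41, 37, 65, 29, 36),
  SYNDROME   = c("RASH ON BODY", "CANT URINATE", "BLURRED VISION",
                 "UNABLE TO VOID", "SOB WEAKNESS", "TOOTH PAINTOOTH PAIN",
                 "RT ARM PAIN", "L FOOT INJURY", "URINARY DIFFICULTIES",
                 "PT STS MVA"),
  stringsAsFactors = FALSE
)

# First five deaths on 4/19/2009, last five on 4/20/2009
b.df <- data.frame(
  DATE_OF_DEATH = rep(c("4/19/2009", "4/20/2009"), each = 5),
  ID = c(23686, 13840, 12895, 18375, 351, 3495, 4084, 19616, 17965, 11863),
  stringsAsFactors = FALSE
)

# Inner join on PATIENT_ID == ID. all = FALSE is the default, so patients
# with no matching ID in b.df are dropped and DATE_OF_DEATH is carried over.
c.df <- merge(a.df, b.df, by.x = "PATIENT_ID", by.y = "ID")
print(c.df)
```

The printed result should contain the same six patients as the merge output above, sorted by PATIENT_ID. Passing all.x = TRUE instead would keep every row of a.df and fill DATE_OF_DEATH with NA where there is no match, which is the opposite of what was asked.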
On Tue, Jun 29, 2010 at 3:21 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
> Dear R People:
>
> I have two data frames, a.df and b.df as seen here:
>
>> a.df[1:10,]
>         DATE GENDER PATIENT_ID AGE             SYNDROME
> 1  4/16/2009      F      23686  45         RASH ON BODY
> 2  4/16/2009      F      13840  35         CANT URINATE
> 3  4/16/2009      M      12895  30       BLURRED VISION
> 4  4/16/2009      M      18375  33       UNABLE TO VOID
> 5  4/16/2009      M       2237  44         SOB WEAKNESS
> 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> 7  4/16/2009      M      10783  37          RT ARM PAIN
> 8  4/16/2009      M      12610  65        L FOOT INJURY
> 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> 10 4/16/2009      F        351  36           PT STS MVA
>> b.df[1:10,]
>    DATE_OF_DEATH    ID
> 1      4/19/2009 21676
> 2      4/19/2009 13717
> 3      4/19/2009 20498
> 4      4/19/2009 14281
> 5      4/19/2009 38848
> 6      4/20/2009   331
> 7      4/20/2009  4084
> 8      4/20/2009 19616
> 9      4/20/2009 17965
> 10     4/20/2009 11863
>>
>
> a.df will always be larger than b.df.
>
> I want to create a third data frame that is matched on PATIENT_ID from
> a.df and ID from b.df.
>
> If there is no match from a.df$PATIENT_ID to b.df$ID, then we omit the
> row from the new data.frame.
>
> If there is a match, we include the DATE_OF_DEATH column from b.df.
>
> I've tried all kinds of tricks, but nothing works exactly as I wish.
>
> Thanks in advance,
> Sincerely,
> Erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodgess at gmail.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
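merge() sorts its result by the key columns by default, which is why the merge output in the reply comes out ordered by PATIENT_ID rather than in a.df's original order. If that order matters, the requirements in the question (keep only patients whose PATIENT_ID appears in b.df$ID, and attach DATE_OF_DEATH) can also be sketched with base R's match(). The small a.df and b.df here are hypothetical toy versions of the data in the question, and c.df is an illustrative name.

```r
# Hypothetical toy data in the same shape as a.df and b.df in the question
a.df <- data.frame(PATIENT_ID = c(23686, 13840, 2237, 3495, 351),
                   AGE        = c(45, 35, 44, 29, 36))
b.df <- data.frame(DATE_OF_DEATH = c("4/19/2009", "4/19/2009", "4/19/2009", "4/20/2009"),
                   ID            = c(23686, 13840, 351, 3495),
                   stringsAsFactors = FALSE)

# For each patient, the position of the matching ID in b.df (NA if none)
idx <- match(a.df$PATIENT_ID, b.df$ID)

# Drop unmatched patients, keep a.df's original row order, attach the date
c.df <- a.df[!is.na(idx), ]
c.df$DATE_OF_DEATH <- b.df$DATE_OF_DEATH[idx[!is.na(idx)]]
print(c.df)
```

One difference to keep in mind: match() returns only the first matching position, so if an ID occurred more than once in b.df this version would keep a single DATE_OF_DEATH per patient, whereas merge() returns one row per matching pair.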
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?