[R] merging/intersecting 2 data frames
John Kane
jrkrideau at yahoo.ca
Wed Jun 30 21:52:06 CEST 2010
Have you changed the values in b.df? My reading of the original b.df in Erin's post was that there were no common values in PATIENT_ID and ID.
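A quick way to check this is sketched below. It assumes only the key columns matter, so I have retyped PATIENT_ID and AGE from a.df and the two versions of b.df from the posts; the names b.orig and b.jim are mine, not from either post.

```r
# Key columns retyped from the thread; the other a.df columns are omitted.
a.df <- data.frame(
  PATIENT_ID = c(23686, 13840, 12895, 18375, 2237, 21484, 10783, 12610, 3495, 351),
  AGE        = c(45, 35, 30, 33, 44, 41, 37, 65, 29, 36)
)

dates <- c(rep("4/19/2009", 5), rep("4/20/2009", 5))

# b.df as shown in Erin's original post (hypothetical name: b.orig)
b.orig <- data.frame(
  DATE_OF_DEATH = dates,
  ID = c(21676, 13717, 20498, 14281, 38848, 331, 4084, 19616, 17965, 11863)
)

# b.df as shown in Jim's reply (hypothetical name: b.jim)
b.jim <- data.frame(
  DATE_OF_DEATH = dates,
  ID = c(23686, 13840, 12895, 18375, 351, 3495, 4084, 19616, 17965, 11863)
)

# Which key values appear in both data frames?
common.orig <- intersect(a.df$PATIENT_ID, b.orig$ID)
common.jim  <- intersect(a.df$PATIENT_ID, b.jim$ID)

# merge() keeps only matching rows by default (all = FALSE)
merged.orig <- merge(a.df, b.orig, by.x = "PATIENT_ID", by.y = "ID")
merged.jim  <- merge(a.df, b.jim,  by.x = "PATIENT_ID", by.y = "ID")

print(length(common.orig))  # number of shared IDs with Erin's b.df
print(nrow(merged.orig))    # rows returned by merge with Erin's b.df
print(length(common.jim))   # number of shared IDs with Jim's b.df
print(nrow(merged.jim))     # rows returned by merge with Jim's b.df
```

If the first two numbers are zero, merge() is doing exactly what was asked and the empty result comes from the first ten rows of the two data frames sharing no IDs.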
--- On Tue, 6/29/10, jim holtman <jholtman at gmail.com> wrote:
> From: jim holtman <jholtman at gmail.com>
> Subject: Re: [R] merging/intersecting 2 data frames
> To: "Erin Hodgess" <erinm.hodgess at gmail.com>
> Cc: "R help" <r-help at stat.math.ethz.ch>
> Received: Tuesday, June 29, 2010, 3:31 PM
> use 'merge'
>
> > a.df
>         DATE GENDER PATIENT_ID AGE             SYNDROME
> 1  4/16/2009      F      23686  45         RASH ON BODY
> 2  4/16/2009      F      13840  35         CANT URINATE
> 3  4/16/2009      M      12895  30       BLURRED VISION
> 4  4/16/2009      M      18375  33       UNABLE TO VOID
> 5  4/16/2009      M       2237  44         SOB WEAKNESS
> 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> 7  4/16/2009      M      10783  37          RT ARM PAIN
> 8  4/16/2009      M      12610  65        L FOOT INJURY
> 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> 10 4/16/2009      F        351  36           PT STS MVA
> > b.df
> DATE_OF_DEATH ID
> 1 4/19/2009 23686
> 2 4/19/2009 13840
> 3 4/19/2009 12895
> 4 4/19/2009 18375
> 5 4/19/2009 351
> 6 4/20/2009 3495
> 7 4/20/2009 4084
> 8 4/20/2009 19616
> 9 4/20/2009 17965
> 10 4/20/2009 11863
> > merge(a.df, b.df, by.x="PATIENT_ID", by.y="ID")
>   PATIENT_ID      DATE GENDER AGE             SYNDROME DATE_OF_DEATH
> 1        351 4/16/2009      F  36           PT STS MVA     4/19/2009
> 2       3495 4/16/2009      F  29 URINARY DIFFICULTIES     4/20/2009
> 3      12895 4/16/2009      M  30       BLURRED VISION     4/19/2009
> 4      13840 4/16/2009      F  35         CANT URINATE     4/19/2009
> 5      18375 4/16/2009      M  33       UNABLE TO VOID     4/19/2009
> 6      23686 4/16/2009      F  45         RASH ON BODY     4/19/2009
> >
>
>
> On Tue, Jun 29, 2010 at 3:21 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
> > Dear R People:
> >
> > I have two data frames, a.df and b.df as seen here:
> >
> >> a.df[1:10,]
> >         DATE GENDER PATIENT_ID AGE             SYNDROME
> > 1  4/16/2009      F      23686  45         RASH ON BODY
> > 2  4/16/2009      F      13840  35         CANT URINATE
> > 3  4/16/2009      M      12895  30       BLURRED VISION
> > 4  4/16/2009      M      18375  33       UNABLE TO VOID
> > 5  4/16/2009      M       2237  44         SOB WEAKNESS
> > 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> > 7  4/16/2009      M      10783  37          RT ARM PAIN
> > 8  4/16/2009      M      12610  65        L FOOT INJURY
> > 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> > 10 4/16/2009      F        351  36           PT STS MVA
> >> b.df[1:10,]
> > DATE_OF_DEATH ID
> > 1 4/19/2009 21676
> > 2 4/19/2009 13717
> > 3 4/19/2009 20498
> > 4 4/19/2009 14281
> > 5 4/19/2009 38848
> > 6 4/20/2009 331
> > 7 4/20/2009 4084
> > 8 4/20/2009 19616
> > 9 4/20/2009 17965
> > 10 4/20/2009 11863
> >>
> >
> > a.df will always be larger than b.df.
> >
> > I want to create a third data frame that is matched on PATIENT_ID from
> > a.df and ID from b.df.
> >
> > If there is no match from a.df$PATIENT_ID to b.df$ID, then we omit the
> > row from the new data.frame.
> >
> > If there is a match, we include the DATE_OF_DEATH column from b.df.
> >
> > I've tried all kinds of tricks, but nothing works exactly as I wish.
> >
> > Thanks in advance,
> > Sincerely,
> > Erin
> >
> >
> > --
> > Erin Hodgess
> > Associate Professor
> > Department of Computer and Mathematical Sciences
> > University of Houston - Downtown
> > mailto: erinm.hodgess at gmail.com
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>