[R] merging/intersecting 2 data frames

John Kane jrkrideau at yahoo.ca
Wed Jun 30 21:52:06 CEST 2010


Have you changed the values in b.df?  My reading of the original b.df in Erin's post was that there were no common values in PATIENT_ID and ID.
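
One quick way to check, as a minimal sketch: the vectors below are retyped from the first ten rows of each data frame shown in Erin's post (the names patient_id and death_id are mine, and the full data frames may well contain other values). intersect() lists any IDs present in both, and merge() on the same columns shows how many rows would survive.

```r
# IDs retyped from the first ten rows shown in Erin's original post
patient_id <- c(23686, 13840, 12895, 18375, 2237, 21484, 10783, 12610, 3495, 351)
death_id   <- c(21676, 13717, 20498, 14281, 38848, 331, 4084, 19616, 17965, 11863)

# Cut-down stand-ins for a.df and b.df (only the columns needed here)
a.df <- data.frame(PATIENT_ID = patient_id)
b.df <- data.frame(DATE_OF_DEATH = rep(c("4/19/2009", "4/20/2009"), each = 5),
                   ID = death_id)

# IDs that appear in both data frames
common <- intersect(a.df$PATIENT_ID, b.df$ID)
length(common)   # 0 for these ten rows

# The same inner join as in Jim's reply; rows without a match are dropped
merged <- merge(a.df, b.df, by.x = "PATIENT_ID", by.y = "ID")
nrow(merged)     # 0 for these ten rows
```

With the b.df values shown in Jim's reply, the same check would report six common IDs instead.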


--- On Tue, 6/29/10, jim holtman <jholtman at gmail.com> wrote:

> From: jim holtman <jholtman at gmail.com>
> Subject: Re: [R] merging/intersecting 2 data frames
> To: "Erin Hodgess" <erinm.hodgess at gmail.com>
> Cc: "R help" <r-help at stat.math.ethz.ch>
> Received: Tuesday, June 29, 2010, 3:31 PM
> use 'merge'
> 
> > a.df
>         DATE GENDER PATIENT_ID AGE             SYNDROME
> 1  4/16/2009      F      23686  45         RASH ON BODY
> 2  4/16/2009      F      13840  35         CANT URINATE
> 3  4/16/2009      M      12895  30       BLURRED VISION
> 4  4/16/2009      M      18375  33       UNABLE TO VOID
> 5  4/16/2009      M       2237  44         SOB WEAKNESS
> 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> 7  4/16/2009      M      10783  37          RT ARM PAIN
> 8  4/16/2009      M      12610  65        L FOOT INJURY
> 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> 10 4/16/2009      F        351  36           PT STS MVA
> > b.df
>    DATE_OF_DEATH    ID
> 1      4/19/2009 23686
> 2      4/19/2009 13840
> 3      4/19/2009 12895
> 4      4/19/2009 18375
> 5      4/19/2009   351
> 6      4/20/2009  3495
> 7      4/20/2009  4084
> 8      4/20/2009 19616
> 9      4/20/2009 17965
> 10     4/20/2009 11863
> > merge(a.df, b.df, by.x="PATIENT_ID", by.y="ID")
>   PATIENT_ID      DATE GENDER AGE             SYNDROME DATE_OF_DEATH
> 1        351 4/16/2009      F  36           PT STS MVA     4/19/2009
> 2       3495 4/16/2009      F  29 URINARY DIFFICULTIES     4/20/2009
> 3      12895 4/16/2009      M  30       BLURRED VISION     4/19/2009
> 4      13840 4/16/2009      F  35         CANT URINATE     4/19/2009
> 5      18375 4/16/2009      M  33       UNABLE TO VOID     4/19/2009
> 6      23686 4/16/2009      F  45         RASH ON BODY     4/19/2009
> >
> 
> 
> On Tue, Jun 29, 2010 at 3:21 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
> > Dear R People:
> >
> > I have two data frames, a.df and b.df as seen here:
> >
> >> a.df[1:10,]
> >         DATE GENDER PATIENT_ID AGE             SYNDROME
> > 1  4/16/2009      F      23686  45         RASH ON BODY
> > 2  4/16/2009      F      13840  35         CANT URINATE
> > 3  4/16/2009      M      12895  30       BLURRED VISION
> > 4  4/16/2009      M      18375  33       UNABLE TO VOID
> > 5  4/16/2009      M       2237  44         SOB WEAKNESS
> > 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> > 7  4/16/2009      M      10783  37          RT ARM PAIN
> > 8  4/16/2009      M      12610  65        L FOOT INJURY
> > 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> > 10 4/16/2009      F        351  36           PT STS MVA
> >> b.df[1:10,]
> >   DATE_OF_DEATH    ID
> > 1      4/19/2009 21676
> > 2      4/19/2009 13717
> > 3      4/19/2009 20498
> > 4      4/19/2009 14281
> > 5      4/19/2009 38848
> > 6      4/20/2009   331
> > 7      4/20/2009  4084
> > 8      4/20/2009 19616
> > 9      4/20/2009 17965
> > 10     4/20/2009 11863
> >>
> >
> > a.df will always be larger than b.df.
> >
> > I want to create a third data frame that is matched on PATIENT_ID from
> > a.df and ID from b.df.
> >
> > If there is no match from a.df$PATIENT_ID to b.df$ID, then we omit the
> > row from the new data.frame.
> >
> > If there is a match, we include the DATE_OF_DEATH column from b.df.
> >
> > I've tried all kinds of tricks, but nothing works exactly as I wish.
> >
> > Thanks in advance,
> > Sincerely,
> > Erin
> >
> >
> > --
> > Erin Hodgess
> > Associate Professor
> > Department of Computer and Mathematical Sciences
> > University of Houston - Downtown
> > mailto: erinm.hodgess at gmail.com
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?
> 
