[R] difference of two data frames

Adaikalavan Ramasamy a.ramasamy at imperial.ac.uk
Sun Sep 14 20:33:42 CEST 2008


It would help if both data frames were indexed by a unique identifier, 
for example in their row names.
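
For instance, if every row did carry a unique identifier in its row 
names, the difference would reduce to a set operation on those names. A 
minimal sketch, assuming hypothetical identifiers id1 to id6 that are not 
part of the original data:

```r
# Hypothetical example: row names id1..id6 serve as unique identifiers
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6],
                  row.names = paste0("id", 1:6))
DF2 <- DF1[c("id3", "id1", "id2"), ]   # a subset, in a different row order

# Identifiers present in DF1 but not in DF2
only1 <- setdiff(rownames(DF1), rownames(DF2))
newDF <- DF1[only1, ]
newDF
```

Indexing by row name works here only because the names really are unique 
and mean the same thing in both data frames.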

Without that information, you could possibly try to use the same 
approach as duplicated() does by "pasting together a character 
representation of rows" using "|" (or any other separator).

    keys1 <- apply(DF1, 1, paste, collapse="|")
    keys1
    [1] "1|a" "2|b" "3|c" "4|d" "5|e" "6|f"
    duplicated(keys1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE

    keys2 <- apply(DF2, 1, paste, collapse="|")
    keys2
    [1] "1|a" "2|b" "3|c"
    duplicated(keys2)
    [1] FALSE FALSE FALSE

The duplicated() check is necessary to ensure that the generated keys are 
truly unique. You might want to experiment and see whether you can create 
a unique key using just a few columns.
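
One way to run that experiment is anyDuplicated(), which returns 0 when no 
row is repeated. This is only a sketch on the example data from the 
original question; the column choices are illustrative:

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])

# anyDuplicated() returns 0 if no row is repeated, otherwise the index
# of the first repeated row
dup_v1   <- anyDuplicated(DF1["V1"])            # V1 alone
dup_both <- anyDuplicated(DF1[c("V1", "V2")])   # V1 and V2 combined
dup_v1
dup_both
```

If a small set of columns already gives 0, the key can be built from 
those columns alone.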


    keys1 %in% keys2
    [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

    w <- setdiff( keys1, keys2 )
    DF1[ keys1 %in% w, ]
       V1 V2
    4  4  d
    5  5  e
    6  6  f
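
The steps above could be wrapped into a small helper. This is only a 
sketch: row_diff is a hypothetical name, and the keys are built with 
do.call(paste, ...) instead of apply(), because apply() first converts the 
data frame to a character matrix, which can pad numbers with spaces when 
they differ in width.

```r
# Hypothetical helper: rows of x whose pasted key does not occur in y
row_diff <- function(x, y, sep = "|") {
  # Paste the columns together element-wise, giving one key per row;
  # unname() keeps column names from being read as arguments to paste()
  kx <- do.call(paste, c(unname(as.list(x)), sep = sep))
  ky <- do.call(paste, c(unname(as.list(y)), sep = sep))
  x[!(kx %in% ky), , drop = FALSE]
}

DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = c(3, 1, 2), V2 = c("c", "a", "b"))  # unordered subset
newDF <- row_diff(DF1, DF2)
newDF
```

As with any pasted key, this assumes the separator does not occur inside 
the values themselves; otherwise two different rows could produce the 
same key.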

Regards, Adai



joseph wrote:
> Hi Jorge
> both commands work; 
> can you extend it to several columns? The reason I am asking is that in my real data the uniqueness of the rows is determined by all the columns; in other words, V1 might have duplicates.
> Thanks
> 
> 
> 
> 
> ----- Original Message ----
> From: Jorge Ivan Velez <jorgeivanvelez at gmail.com>
> To: joseph <jdsandjd at yahoo.com>
> Cc: r-help at r-project.org
> Sent: Sunday, September 14, 2008 10:23:33 AM
> Subject: Re: [R] difference of two data frames
> 
> 
> 
> Hi Joseph,
> 
> Try this:
> 
> 
> DF1[!DF1$V1%in%DF2$V1,]
> 
> subset(DF1,!V1%in%DF2$V1)
> 
> 
> HTH,
> 
> Jorge
> 
> 
> On Sun, Sep 14, 2008 at 12:49 PM, joseph <jdsandjd at yahoo.com> wrote:
> 
> Hello
> I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
> DF1= data.frame(V1=1:6, V2= letters[1:6])
> DF2= data.frame(V1=1:3, V2= letters[1:3])
> How do I create a new data frame of the difference between DF1 and DF2
> newDF=data.frame(V1=4:6, V2= letters[4:6])
> In my real data, the rows are not in order as in the example I provided.
> Thanks much
> Joseph
> 
> 
> 
>        [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


