[R] compare two data frames of different dimensions and only keep unique rows
Arnaud Gaboury
arnaud.gaboury at a2ct2.com
Tue Feb 28 14:11:10 CET 2012
Thank you very much for your setdiffDF(). It does the job perfectly.
Arnaud Gaboury
A2CT2 Ltd.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Petr Savicky
Sent: Monday, 27 February 2012 20:41
To: r-help at r-project.org
Subject: Re: [R] compare two data frames of different dimensions and only keep unique rows
On Mon, Feb 27, 2012 at 07:10:57PM +0100, Arnaud Gaboury wrote:
> No, but I tried your way too.
>
> In fact, the only three unique rows are these ones:
>
> Product  Price  Nbr.Lots
> Cocoa     2440         5
> Cocoa     2450         1
> Cocoa     2440         6
>
> Here is a dirty working trick I found :
>
> > df <- merge(exportfile, reported, all.y = TRUE)
> > df1 <- merge(exportfile, reported)
> > dff <- do.call(paste, df)
> > dff1 <- do.call(paste, df1)
> > df[!dff %in% dff1, ]
>   Product Price Nbr.Lots
> 3   Cocoa  2440        5
> 4   Cocoa  2450        1
>
>
> My two problems are: I don't think this is very clean code, and I won't know in advance which of my two data frames will have the greater dimension (I can add some lines to deal with that, but again, it seems very heavy).
Hi.
Try the following.
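setdiffDF <- function(A, B)
{
    # Rows of A that do not occur in B (repeated rows within A are dropped too).
    # seq_len() keeps the index valid when A has zero rows.
    A[!duplicated(rbind(B, A))[nrow(B) + seq_len(nrow(A))], ]
}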
df1 <- setdiffDF(reported, exportfile)
df2 <- setdiffDF(exportfile, reported)
rbind(df1, df2)
I obtained
   Product Price Nbr.Lots
3    Cocoa  2440        5
4    Cocoa  2450        1
31   Cocoa  2440        6
Is this correct? I see the row
Cocoa 2440.00 6
only in exportfile and not in reported.
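For readers without the original data, here is a minimal, self-contained sketch of the same calls. The contents of reported and exportfile below are invented for illustration (the real data frames are not shown in this thread), and symdiffDF() is a hypothetical helper name, not a base R function. It assumes both data frames have the same columns.

```r
# Same logic as setdiffDF() above; seq_len() is used so that an empty A
# is handled as well.
setdiffDF <- function(A, B) {
    A[!duplicated(rbind(B, A))[nrow(B) + seq_len(nrow(A))], ]
}

# Hypothetical helper: rows that occur in exactly one of the two data frames.
symdiffDF <- function(A, B) {
    rbind(setdiffDF(A, B), setdiffDF(B, A))
}

# Made-up example data (assumption: identical column names and types).
reported <- data.frame(Product = c("Cocoa", "Cocoa", "Sugar"),
                       Price = c(2440, 2450, 24.5),
                       Nbr.Lots = c(5, 1, 10),
                       stringsAsFactors = FALSE)
exportfile <- data.frame(Product = c("Sugar", "Cocoa"),
                         Price = c(24.5, 2440),
                         Nbr.Lots = c(10, 6),
                         stringsAsFactors = FALSE)

res <- symdiffDF(reported, exportfile)
print(res)
```

Because the difference is taken in both directions, it does not matter which of the two data frames has more rows.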
The trick with paste() is not a bad idea. A variant of it is used also in the base function duplicated.matrix(), since it contains
apply(x, MARGIN, function(x) paste(x, collapse = "\r"))
If speed is critical, it may be faster to apply the paste() trick to whole columns, for example
paste(df[[1]], df[[2]], df[[3]], sep="\r")
and then call setdiff() on the resulting keys.
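A rough sketch of that idea follows. It assumes that both data frames have the same columns in the same order and that no field contains the "\r" separator. rowKeys() and setdiffKeys() are hypothetical names chosen for this example, the data are made up, and no timing comparison against setdiffDF() is implied.

```r
# Hypothetical helper: one character key per row, pasted column by column.
# unname() avoids a column name clashing with paste()'s own arguments.
rowKeys <- function(df) {
    do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Rows of A whose key does not occur among the keys of B.
setdiffKeys <- function(A, B) {
    kA <- rowKeys(A)
    keep <- setdiff(kA, rowKeys(B))      # unique keys present only in A
    A[match(keep, kA), , drop = FALSE]   # first row of A for each such key
}

# Made-up example data.
reported <- data.frame(Product = c("Cocoa", "Cocoa", "Sugar"),
                       Price = c(2440, 2450, 24.5),
                       Nbr.Lots = c(5, 1, 10),
                       stringsAsFactors = FALSE)
exportfile <- data.frame(Product = c("Sugar", "Cocoa"),
                         Price = c(24.5, 2440),
                         Nbr.Lots = c(10, 6),
                         stringsAsFactors = FALSE)

out <- setdiffKeys(reported, exportfile)
print(out)
```

Note that the keys compare the printed form of each value, so numeric columns that differ only beyond the digits kept by paste() would be treated as equal.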
Hope this helps.
Petr Savicky.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.