[R] compare two data frames of different dimensions and only keep unique rows
    Petr Savicky 
    savicky at cs.cas.cz
       
    Mon Feb 27 20:40:49 CET 2012
    
    
  
On Mon, Feb 27, 2012 at 07:10:57PM +0100, Arnaud Gaboury wrote:
> No, but I tried your way too.
> 
> In fact, the only three unique rows are these ones:
> 
>  Product Price Nbr.Lots
>    Cocoa  2440        5
>    Cocoa  2450        1
>    Cocoa  2440        6
> 
> Here is a dirty working trick I found :
> 
> > df<-merge(exportfile,reported,all.y=T)
> > df1<-merge(exportfile,reported)
> > dff1<-do.call(paste,df)
> > dff<-do.call(paste,df)
> > dff1<-do.call(paste,df1)
> > df[!dff %in% dff1,]
>   Product Price Nbr.Lots
> 3   Cocoa  2440        5
> 4   Cocoa  2450        1
>  
> 
> My two problems are: I don't think this is very clean code, and I won't know in advance which of my two data frames will have the greater dimension (I can add some lines to deal with it, but again, that seems very heavy).
Hi.
Try the following.
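  setdiffDF <- function(A, B)
  {
      # Rows of A that do not occur in B. B is stacked on top of A, so a
      # row of A already seen above it is flagged by duplicated().
      # seq_len() keeps the index valid when A has no rows.
      A[!duplicated(rbind(B, A))[nrow(B) + seq_len(nrow(A))], ]
  }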
  df1 <- setdiffDF(reported, exportfile)
  df2 <- setdiffDF(exportfile, reported)
  rbind(df1, df2)
I obtained
     Product Price Nbr.Lots
  3    Cocoa  2440        5
  4    Cocoa  2450        1
  31   Cocoa  2440        6
Is this correct? I see the row
  Cocoa  2440.00        6
only in exportfile and not in reported.
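
To see what setdiffDF() does on data you can inspect, here is a small
self-contained sketch. The data frames A and B are made up for
illustration (the real exportfile and reported are not reproduced here),
and the function is restated with seq_len() so that it also copes with
an empty A. Note that duplicated() also drops rows that are repeated
within A itself.

```r
# Hypothetical toy data; not the data from this thread.
A <- data.frame(Product  = c("Cocoa", "Cocoa", "Cocoa"),
                Price    = c(2440, 2450, 2460),
                Nbr.Lots = c(5, 1, 2),
                stringsAsFactors = FALSE)
B <- data.frame(Product  = c("Cocoa", "Cocoa"),
                Price    = c(2460, 2440),
                Nbr.Lots = c(2, 6),
                stringsAsFactors = FALSE)

setdiffDF <- function(A, B) {
    # Stack B on top of A; any row of A that already occurred above it
    # (in B, or earlier in A) is flagged as a duplicate.
    dup <- duplicated(rbind(B, A))
    # Keep only the flags that belong to the rows of A.
    A[!dup[nrow(B) + seq_len(nrow(A))], , drop = FALSE]
}

onlyA  <- setdiffDF(A, B)      # rows of A absent from B
onlyB  <- setdiffDF(B, A)      # rows of B absent from A
result <- rbind(onlyA, onlyB)  # rows present in exactly one data frame
print(result)
```

Calling the function in both directions and binding the results means
it does not matter which of the two data frames is larger.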
The trick with paste() is not a bad idea. A variant of
it is used also in the base function duplicated.matrix(),
since it contains
  apply(x, MARGIN, function(x) paste(x, collapse = "\r"))
If speed is critical, then applying the paste() trick
to whole columns, for example
  paste(df[[1]], df[[2]], df[[3]], sep="\r")
followed by setdiff() on the resulting strings, may be faster.
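
A minimal sketch of that column-wise variant follows. The helper names
rowKey() and setdiffKeys() and the toy data are my own inventions for
illustration, and I have not timed it against the duplicated() version.
It assumes that no value contains "\r" and that comparing the character
form of the numbers is good enough. Unlike duplicated(), it keeps rows
that are repeated within A.

```r
# Hypothetical helper: one string key per row, built from whole columns.
# unname() avoids a clash if a column happens to be called "sep".
rowKey <- function(df) {
    do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Hypothetical helper: rows of A whose key does not occur among the keys of B.
setdiffKeys <- function(A, B) {
    keyA <- rowKey(A)
    keyB <- rowKey(B)
    A[keyA %in% setdiff(keyA, keyB), , drop = FALSE]
}

# Toy data, not the data from this thread.
A <- data.frame(Product  = c("Cocoa", "Cocoa", "Cocoa"),
                Price    = c(2440, 2450, 2460),
                Nbr.Lots = c(5, 1, 2),
                stringsAsFactors = FALSE)
B <- data.frame(Product  = c("Cocoa", "Cocoa"),
                Price    = c(2460, 2440),
                Nbr.Lots = c(2, 6),
                stringsAsFactors = FALSE)

onlyA <- setdiffKeys(A, B)
onlyB <- setdiffKeys(B, A)
print(rbind(onlyA, onlyB))
```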
Hope this helps.
Petr Savicky.
    
    