[R] efficiently diff two data frames
Rui Barradas
ruipbarradas at sapo.pt
Tue Apr 16 20:12:37 CEST 2013
Hello,
Maybe Petr Savicky's answer in the link
https://stat.ethz.ch/pipermail/r-help/2012-February/304830.html
can lead you to what you want.
I've changed his function a bit so that it returns a logical vector
indexing the rows, with TRUE for the rows that differ.
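setdiffDF2 <- function(A, B){
    # TRUE for each row of X that duplicates neither a row of Y
    # nor an earlier row of X
    f <- function(X, Y)
        !duplicated(rbind(Y, X))[nrow(Y) + seq_len(nrow(X))]
    ix1 <- f(A, B)
    ix2 <- f(B, A)
    # A and B must have the same number of rows, in the same order
    ix1 & ix2
}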
ix <- setdiffDF2(Xe, Xf)
Xe[ix,]
Xf[ix,]
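As a quick check, here is a self-contained sketch that runs the function on the example data from the question quoted below. It defines the function as above (using seq_len(), which behaves the same for non-empty data frames) and assumes both data frames have the same number of rows in the same order. which() turns the logical vector into integer row indices.

```r
# Example data from the quoted question
Xe <- head(mtcars)
Xf <- head(mtcars)
Xf[2:4, 3:5] <- 55

setdiffDF2 <- function(A, B){
    # TRUE for each row of X that duplicates no row of Y (nor an earlier row of X)
    f <- function(X, Y)
        !duplicated(rbind(Y, X))[nrow(Y) + seq_len(nrow(X))]
    f(A, B) & f(B, A)
}

ix <- setdiffDF2(Xe, Xf)   # logical vector, one element per row
row_idx <- which(ix)       # integer indices of the rows that differ
print(row_idx)
```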
Note that this gives no information on the columns.
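If column information is also needed, one possible approach is a cell-by-cell comparison. This is only a sketch and not part of Petr Savicky's function: it assumes the two data frames have identical dimensions, the same row and column order, compatible column types and no NAs. The names neq, rows, cols and cells are just illustrative.

```r
# Example data from the quoted question
Xe <- head(mtcars)
Xf <- head(mtcars)
Xf[2:4, 3:5] <- 55

# Comparing two data frames of the same shape gives a logical matrix,
# TRUE where the corresponding cells differ
neq <- Xe != Xf

rows  <- which(rowSums(neq) > 0)      # rows with at least one differing cell
cols  <- which(colSums(neq) > 0)      # columns with at least one differing cell
cells <- which(neq, arr.ind = TRUE)   # one (row, col) pair per differing cell

print(rows)
print(cols)
```

Xe[rows, cols] and Xf[rows, cols] then show only the part of each data frame that changed.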
Hope this helps,
Rui Barradas
On 16-04-2013 18:42, Liviu Andronic wrote:
> Dear all,
> What is the quickest and most efficient way to diff two data frames,
> so as to obtain a vector of indices (or logical) for rows/columns that
> differ in the two data frames? For example,
>> Xe <- head(mtcars)
>> Xf <- head(mtcars)
>> Xf[2:4,3:5] <- 55
>> all.equal(Xe, Xf)
> [1] "Component 3: Mean relative difference: 0.6863118"
> [2] "Component 4: Mean relative difference: 0.4728435"
> [3] "Component 5: Mean relative difference: 14.23546"
>
> I could use all.equal(), but it only returns human readable info that
> cannot be easily used programmatically. It also gives no info on the
> rows. Another way would be to:
>> require(prob)
>> setdiff(Xe, Xf)
>                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
> Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
> Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
> Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
>
> But again this doesn't return subsetting indices, nor any info on the
> columns. Any suggestions on how to approach this?
>
> Regards,
> Liviu