[R] compare two data frames of different dimensions and only keep unique rows
jim holtman
jholtman at gmail.com
Mon Feb 27 18:41:42 CET 2012
Is this what you want?
> v <- rbind(reported, exportfile)
> v[!duplicated(v), ]
       Product    Price Nbr.Lots
1        Cocoa  2331.00      -61
2        Cocoa  2356.00      -61
3        Cocoa  2440.00        5
4        Cocoa  2450.00        1
6     Coffee C   204.55       40
7     Coffee C   205.45       40
5           GC 17792.00       -1
10 Sugar No 11    24.81       -1
8           ZS  1273.50       -1
9           ZS  1276.25        1
13       Cocoa  2440.00        6
>
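Note that `!duplicated(v)` keeps one copy of every distinct row, including rows present in both data frames. If the goal is instead to keep only the rows that occur in exactly one of the two data frames, as the wording of the question suggests, a minimal sketch could combine `duplicated()` with its `fromLast = TRUE` counterpart. The names `in_both` and `only_one` are illustrative, and the data frames are rebuilt here with `Product` as a character column in both (an assumption made to keep the example self-contained):

```r
# Rebuild the two example data frames from the post; Product is stored as
# character in both so that rbind() combines the columns without factor handling.
reported <- data.frame(
  Product  = c("Cocoa", "Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C",
               "GC", "Sugar No 11", "ZS", "ZS"),
  Price    = c(2331, 2356, 2440, 2450, 204.55, 205.45,
               17792, 24.81, 1273.5, 1276.25),
  Nbr.Lots = c(-61, -61, 5, 1, 40, 40, -1, -1, -1, 1),
  stringsAsFactors = FALSE
)
exportfile <- data.frame(
  Product  = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C",
               "GC", "Sugar No 11", "ZS", "ZS"),
  Price    = c(2331, 2356, 2440, 204.55, 205.45,
               17792, 24.81, 1273.5, 1276.25),
  Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1),
  stringsAsFactors = FALSE
)

v <- rbind(reported, exportfile)

# duplicated(v) flags the second and later copies of a row;
# duplicated(v, fromLast = TRUE) flags the first and earlier copies.
# Their union marks every row that has at least one identical twin.
in_both <- duplicated(v) | duplicated(v, fromLast = TRUE)

# Keep only the rows that have no twin anywhere in the combined data frame.
only_one <- v[!in_both, ]
print(only_one)
```

On the example data this leaves three Cocoa rows: the two from `reported` with 5 and 1 lots, and the one from `exportfile` with 6 lots. One caveat: a row that is repeated within a single data frame is dropped as well, so this sketch assumes each data frame has no internal duplicates.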
On Mon, Feb 27, 2012 at 12:36 PM, Arnaud Gaboury
<arnaud.gaboury at a2ct2.com> wrote:
> Dear list,
>
> I am still struggling with something that should be easy: I compare two data frames with a lot of common rows and want to keep only the rows that are NOT in both data frames, i.e. the unique ones.
>
> Here is an example of these data frames.
>
> reported <-
> structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 5L), .Label = c("Cocoa", "Coffee C", "GC", "Sugar No 11", "ZS"), class = "factor"), Price = c(2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L, -61L, 5L, 1L, 40L, 40L, -1L, -1L, -1L, 1L)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 5L, 10L, 8L, 9L), class = "data.frame")
>
> exportfile <-
> structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame")
>
> I can rbind() them, which gives one data frame with duplicated rows, but I have no idea how to delete the duplicated rows. I have tried playing with unique() and duplicated(), with no success.
>
> v<-rbind(exportfile,reported)
> v <-
> structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C",
> "Coffee C", "GC", "Sugar No 11", "ZS", "ZS", "Cocoa", "Cocoa",
> "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11",
> "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792,
> 24.81, 1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45,
> 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40,
> 40, -1, -1, -1, 1, -61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names = c("Product",
> "Price", "Nbr.Lots"), row.names = c("1", "2", "3", "4", "5",
> "6", "7", "8", "9", "11", "21", "31", "41", "61", "71", "51",
> "10", "81", "91"), class = "data.frame")
>
>
> TY for your help
>
> Arnaud Gaboury
>
> A2CT2 Ltd.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.