[R] setdiff bizarre (was: odd behavior out of setdiff)
G. Jay Kerns
gkerns at ysu.edu
Sun May 31 00:19:19 CEST 2009
(moved back to R-help)
On Sat, May 30, 2009 at 3:30 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
> I really appreciate all your help.
> I posted to Nabble an R file and input CSV files that more accurately demonstrate what I am seeing, along with the output I want to achieve when I difference two data frames.
> It may be that "setdiff", as intended in base R and in "prob", was never meant to provide the type of result I want. If that is the case, then I will need to ask the "Ninjas" for help to produce the outcome I seek.
> That is, when I difference the data in RSetDiffEntry.csv and RSetDuplicatesRemoved.csv, I want to get the result shown in RDesired.csv.
> Note that it would not be enough just to remove duplicate "CostPerSquareFoot" values, since that variable is tied to "EntryDate" and "HouseNumber".
> Any further help and insights are much appreciated.
> Thanks again,
From your description, something like the following should work:
Let A = your RSetDiffEntry
Let B = your RSetDuplicatesRemoved...
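C <- setdiff(A,B)          # rows of A that do not appear in B
D <- rbind(A,C)            # stack those rows underneath A
E <- D[duplicated(D),]     # keep only rows that already occurred earlier in D
Then E should equal your RDesired.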
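For anyone without the prob package loaded, here is a minimal base-R sketch of the same three steps. The helper row_setdiff and the toy data are hypothetical (only the column names come from this thread); row_setdiff compares whole rows, which is assumed here to mirror how the data-frame setdiff is used above.

```r
# Hypothetical helper: row-wise set difference in base R (NOT prob's setdiff).
row_setdiff <- function(x, y) {
  # Collapse each row into one tab-separated key so whole rows are compared,
  # not individual columns (assumes no field contains a tab character)
  kx <- do.call(paste, c(x, sep = "\t"))
  ky <- do.call(paste, c(y, sep = "\t"))
  # Keep rows of x whose key never occurs in y, with set semantics (no repeats)
  unique(x[!(kx %in% ky), , drop = FALSE])
}

# Toy data, made up for illustration: rows 2 and 3 of A are identical
A <- data.frame(
  EntryDate         = c("2009-01-01", "2009-01-02", "2009-01-02", "2009-01-03"),
  HouseNumber       = c(1, 2, 2, 3),
  CostPerSquareFoot = c(100, 120, 120, 90),
  stringsAsFactors  = FALSE
)
B <- A[1:2, ]               # B lacks the repeated house-2 row and the house-3 row

C <- row_setdiff(A, B)      # the house-3 row (present in A, absent from B)
D <- rbind(A, C)            # A, plus one extra copy of each row missing from B
E <- D[duplicated(D), ]     # the second house-2 entry and the house-3 entry
print(E)
```

Because D is A followed by one extra copy of each distinct row of A that is missing from B, duplicated(D) picks out the surplus copies of repeated rows together with the rows absent from B. As long as B itself has no repeated rows, E works out to the multiset ("bag") difference of A and B, which is why a repeated row in B changes the row count.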
Hope this helps,
P.S. I notice your row number 7 in "RSetDuplicatesRemoved" is
duplicated by the following row. That's a typo, yes? If so, then E
should have one more row than your "RDesired."
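Repeated rows like the one noted above can be spotted with base R's duplicated() and anyDuplicated(), both of which compare whole rows of a data frame. The B below is a made-up stand-in for RSetDuplicatesRemoved, not the actual file.

```r
# Toy stand-in for RSetDuplicatesRemoved (made-up values; row 3 repeats row 2)
B <- data.frame(
  HouseNumber       = c(1, 2, 2, 3),
  CostPerSquareFoot = c(100, 120, 120, 90)
)
anyDuplicated(B)          # index of the first repeated row (0 if none): 3
which(duplicated(B))      # every row that repeats an earlier one: 3
B_clean <- unique(B)      # drop the repeats before running the steps above
nrow(B_clean)             # 3
```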