[R] setdiff bizarre (was: odd behavior out of setdiff)
G. Jay Kerns
gkerns at ysu.edu
Sun May 31 00:19:19 CEST 2009
Jason,
(moved back to R-help)
On Sat, May 30, 2009 at 3:30 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
>
> Jay,
>
>
> I really appreciate all your help.
>
> I posted to Nabble an R file and input CSV files more accurately demonstrating what I am seeing and the output I desire to achieve when I difference two dataframes.
> http://n2.nabble.com/Support-SetDiff-Discussion-Items...-td2999739.html
>
>
> It may be that "setdiff", whether in base R or in "prob", was never intended to provide the type of result I desire. If that is the case then I will need to ask the "Ninjas" for help to produce the outcome I seek.
>
> That is, when I difference the data within RSetDiffEntry.csv and RSetDuplicatesRemoved.csv, I desire to get the result shown in RDesired.csv.
>
> Note that it would not be enough just to remove duplicate "CostPerSquareFoot" values, since that variable is tied to "EntryDate" and "HouseNumber".
>
> Any further help and insights are much appreciated.
>
> Thanks again,
> Jason
>
From your description, something like the following should work:
Let A = your RSetDiffEntry
Let B = your RSetDuplicatesRemoved...
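library(prob)            # supplies a setdiff() that works on data frames
C <- setdiff(A, B)       # rows of A not found in B
D <- rbind(A, C)         # append those rows to A
E <- D[duplicated(D), ]  # keep the rows that now repeat an earlier row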
E should then equal your RDesired.
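To see what those steps do, here is a minimal sketch on toy data. The values are made up for illustration and are not the contents of the posted CSV files; only the column names come from the description above. It uses a base-R row-key comparison as a stand-in for the data-frame setdiff() in "prob", and the helper row_key() is hypothetical. Whether this stand-in treats repeated rows exactly as "prob" does is an assumption.

```r
# Toy stand-ins for the two files (invented values, real column names).
A <- data.frame(
  HouseNumber       = c(1, 2, 2, 3),
  EntryDate         = c("2009-05-01", "2009-05-02", "2009-05-02", "2009-05-03"),
  CostPerSquareFoot = c(100, 110, 110, 120),
  stringsAsFactors  = FALSE
)
B <- A[c(1, 2), ]  # pretend B holds only the first two rows of A

# Hypothetical helper: collapse each row into one string so that whole
# rows can be compared with %in%.
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Row-wise difference: rows of A whose key does not occur in B.
C <- A[!(row_key(A) %in% row_key(B)), , drop = FALSE]

# Append the difference to A, then keep every row that repeats an earlier one.
D <- rbind(A, C)
E <- D[duplicated(D), , drop = FALSE]

# E holds two rows here: the repeated HouseNumber 2 row and the
# HouseNumber 3 row (the latter because it was appended a second time).
print(E)
```

Note that duplicated() also flags rows that were already repeated inside A, so E can contain more than just the rows missing from B.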
Hope this helps,
Jay
P.S. I notice your row number 7 in "RSetDuplicatesRemoved" is
duplicated by the following row. That's a typo, yes? If so, then E
should have one more row than your "RDesired."