[R] setdiff bizarre (was: odd behavior out of setdiff)

G. Jay Kerns gkerns at ysu.edu
Sun May 31 00:19:19 CEST 2009


(moved back to R-help)

On Sat, May 30, 2009 at 3:30 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
> Jay,
> I really appreciate all your help.
> I posted to Nabble an R file and input CSV files more accurately demonstrating what I am seeing and the output I desire to achieve when I difference two dataframes.
> http://n2.nabble.com/Support-SetDiff-Discussion-Items...-td2999739.html
> It may be that "setdiff", both in base R and in "prob", was never intended to provide the type of result I desire.  If that is the case then I will need to ask the "Ninjas" for help to produce the outcome I seek.
> That is, when I difference the data within RSetDiffEntry.csv and RSetDuplicatesRemoved.csv, I desire to get the result shown in RDesired.csv.
> Note that it would not be enough just to remove duplicate "CostPerSquareFoot" values, since that variable is tied to "EntryDate" and "HouseNumber".
> Any further help and insights are much appreciated.
> Thanks again,
> Jason

From your description, something like the following should work:

Let A = your RSetDiffEntry
Let B = your RSetDuplicatesRemoved...

C <- setdiff(A,B)
D <- rbind(A,C)
E <- D[duplicated(D),]

E should then equal your RDesired.
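
As a minimal sketch of the last step only, here is how duplicated() behaves on a small data frame. The column names are taken from your description, but the values are invented for illustration, and the sketch uses base R alone (it does not call setdiff from "prob"). The point is that duplicated() compares whole rows, so a repeated "CostPerSquareFoot" on its own is not flagged.

```r
# Toy data: column names come from the thread, the values are invented.
A <- data.frame(
  EntryDate         = c("2009-05-01", "2009-05-01", "2009-05-02",
                        "2009-05-03", "2009-05-01"),
  HouseNumber       = c(101, 101, 102, 103, 101),
  CostPerSquareFoot = c(85.5, 85.5, 85.5, 90.0, 85.5),
  stringsAsFactors  = FALSE
)

# duplicated() on a data frame compares entire rows. Row 3 repeats the
# CostPerSquareFoot of row 1 but differs in the other columns, so it is
# not marked as a duplicate.
dup <- duplicated(A)

B <- A[!dup, ]   # first occurrence of each distinct row
E <- A[dup, ]    # the repeated rows that deduplication would drop

print(dup)
print(E)
```

With real data, A would be read from RSetDiffEntry.csv; whether E then matches RDesired.csv depends on how the duplicates were actually removed.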

Hope this helps,

P.S.  I notice your row number 7 in "RSetDuplicatesRemoved" is
duplicated by the following row. That's a typo, yes?  If so, then E
should have one more row than your "RDesired."
