[R] setdiff bizarre (was: odd behavior out of setdiff)

Jason Rupert jasonkrupert at yahoo.com
Sun May 31 04:21:12 CEST 2009


Jay, 

Thanks again for all your help.  

I ended up with something similar that appears to work: it gives the true difference of the two data frames, including all the duplicate rows that the filtering removes.  

Thanks again; this will be very helpful going forward, since the data I receive often has duplicate rows that I filter out, and I want to double-check exactly what was removed. 


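# Read the raw data (duplicate rows included)
Entry_DF <- read.csv("RSetDiffEntry.csv", header = TRUE)

# Drop exact duplicate rows, then keep only 0 < CostPerSquareFoot < 300
# (the "> 0" test already excludes zero, so no separate "== 0" check is needed)
EntryFiltered_DF <- subset(Entry_DF, !duplicated(Entry_DF))
EntryFiltered_DF <- subset(EntryFiltered_DF,
                           CostPerSquareFoot > 0 & CostPerSquareFoot < 300)

# Rows of Entry_DF that are absent from EntryFiltered_DF,
# using the data-frame setdiff() from the prob package
library("prob")
setDiff_DF <- setdiff(Entry_DF, EntryFiltered_DF)

# Extra copies of rows that occur more than once in Entry_DF
DuplicateRows_DF <- subset(Entry_DF, duplicated(Entry_DF))

# Everything that was removed: filtered-out rows plus the extra copies
DesiredDFDiff_DF <- rbind(DuplicateRows_DF, setDiff_DF)

DesiredDFDiff_DF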
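For anyone without the prob package, here is a minimal base-R sketch of the same "difference with duplicates" (a multiset difference). It is an illustration under stated assumptions, not the prob implementation: the helper row_multiset_diff is hypothetical, the toy rows are made up, and rows are matched on all columns, so two houses with the same CostPerSquareFoot but a different EntryDate or HouseNumber stay distinct. Each row is keyed by pasting its columns together, and repeated keys are numbered, so the second copy of a row only cancels against a second copy in the other data frame.

```r
# Toy stand-in for RSetDiffEntry.csv (values made up for illustration)
Entry_DF <- data.frame(
  EntryDate         = c("2009-01-05", "2009-01-05", "2009-02-10",
                        "2009-03-15", "2009-03-15", "2009-04-20"),
  HouseNumber       = c(101, 101, 205, 310, 311, 412),
  CostPerSquareFoot = c(120, 120, 0, 150, 150, 450)
)

# Hypothetical helper: multiset ("bag") difference of two data frames with
# the same columns. Returns every row of `a` not matched by a row of `b`,
# keeping duplicates. Rows are compared on ALL columns.
# Assumes no field contains a carriage return ("\r" is the key separator).
row_multiset_diff <- function(a, b) {
  # One string key per row, pasted together from every column
  key_a <- do.call(paste, c(a, sep = "\r"))
  key_b <- do.call(paste, c(b, sep = "\r"))
  # Tag each key with its occurrence number (1st copy, 2nd copy, ...)
  occ_a <- paste(key_a, ave(seq_along(key_a), key_a, FUN = seq_along), sep = "\r")
  occ_b <- paste(key_b, ave(seq_along(key_b), key_b, FUN = seq_along), sep = "\r")
  # Keep the copies in `a` that `b` does not account for
  a[!(occ_a %in% occ_b), , drop = FALSE]
}

# Same filtering as above: drop exact duplicates, keep 0 < cost < 300
EntryFiltered_DF <- subset(Entry_DF, !duplicated(Entry_DF))
EntryFiltered_DF <- subset(EntryFiltered_DF,
                           CostPerSquareFoot > 0 & CostPerSquareFoot < 300)

# Rows 2 (extra copy of row 1), 3 (cost 0) and 6 (cost 450);
# rows 4 and 5 share a cost but differ in HouseNumber, so both survive
Removed_DF <- row_multiset_diff(Entry_DF, EntryFiltered_DF)
Removed_DF
```

On this toy data the result should contain the same rows that the rbind(DuplicateRows_DF, setDiff_DF) recipe collects, but in the original row order and without needing prob.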




--- On Sat, 5/30/09, G. Jay Kerns <gkerns at ysu.edu> wrote:

> From: G. Jay Kerns <gkerns at ysu.edu>
> Subject: Re: setdiff bizarre (was: odd behavior out of setdiff)
> To: "Jason Rupert" <jasonkrupert at yahoo.com>
> Cc: "David Winsemius" <dwinsemius at comcast.net>, "r-help at r-project.org" <r-help at r-project.org>
> Date: Saturday, May 30, 2009, 5:19 PM
> Jason,
> 
> (moved back to R-help)
> 
> On Sat, May 30, 2009 at 3:30 PM, Jason Rupert <jasonkrupert at yahoo.com>
> wrote:
> >
> > Jay,
> >
> >
> > I really appreciate all your help.
> >
> > I posted an R file and input CSV files to Nabble that more
> > accurately demonstrate what I am seeing and the output I
> > want when I take the difference of two data frames.
> > http://n2.nabble.com/Support-SetDiff-Discussion-Items...-td2999739.html
> >
> >
> > It may be that neither "setdiff" in base R nor the one in
> > "prob" was ever intended to produce the kind of result I
> > want. If that is the case, I will need to ask the "Ninjas"
> > for help to produce the outcome I seek.
> >
> > That is, when I take the difference of the data in
> > RSetDiffEntry.csv and RSetDuplicatesRemoved.csv, I want to
> > get the result shown in RDesired.csv.
> >
> > Note that it would not be enough to remove duplicate
> > "CostPerSquareFoot" values alone, since that variable is
> > tied to "EntryDate" and "HouseNumber".
> >
> > Any further help and insights are much appreciated.
> >
> > Thanks again,
> > Jason
> >
> 
> From your description, something like the following should
> work:
> 
> Let A = your RSetDiffEntry
> Let B = your RSetDuplicatesRemoved...
> 
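> library(prob)
> C <- setdiff(A,B)        # rows of A that are not in B
> D <- rbind(A,C)          # each of those rows now appears at least twice
> E <- D[duplicated(D),]   # removed rows plus extra copies of repeated rows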
> 
> E should then equal your RDesired.
> 
> Hope this helps,
> Jay
> 
> P.S.  I notice your row number 7 in
> "RSetDuplicatesRemoved" is
> duplicated by the following row. That's a typo, yes? 
> If so, then E
> should have one more row than your "RDesired."
> 
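To see why Jay's three lines work, here is a small self-contained run with made-up data. Because the prob package may not be installed, setdiff(A, B) is replaced by a base-R stand-in; row_key and that stand-in line are hypothetical, and assume prob's data-frame setdiff returns one copy of each row of A that is absent from B.

```r
# Toy stand-ins for RSetDiffEntry (A) and RSetDuplicatesRemoved (B);
# values are made up for illustration
A <- data.frame(
  EntryDate         = c("2009-01-05", "2009-01-05", "2009-02-10", "2009-03-15"),
  HouseNumber       = c(101, 101, 205, 310),
  CostPerSquareFoot = c(120, 120, 0, 150)
)
B <- A[c(1, 4), ]   # the rows that survived the filtering

# Hypothetical base-R stand-in for prob's setdiff(A, B):
# one copy of each row of A whose full-row key does not occur in B
row_key <- function(df) do.call(paste, c(df, sep = "\r"))
C <- unique(A[!(row_key(A) %in% row_key(B)), , drop = FALSE])

# Jay's trick: appending C to A makes every removed row appear twice,
# so duplicated() picks out those rows plus the extra copies of rows
# that were already repeated in A
D <- rbind(A, C)
E <- D[duplicated(D), ]
E   # the second copy of house 101 and the zero-cost house 205
```

Since B only enters through the set difference, an accidentally repeated row in B (as in the P.S.) does not change E.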






