[R] Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified
Jason Rupert
jasonkrupert at yahoo.com
Fri May 29 20:48:49 CEST 2009
I believe I am using the improved version of setdiff(...) that handles data.frames, so I expected some odd behavior, but this case escapes me.
It appears that the addition of duplicate entries is not caught by setdiff(...). Is this expected behavior?
If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames?
Thanks in advance for any feedback.
> Test1_DF<-data.frame(HouseSize=c(1:100))
> Test2_DF<-rbind(Test1_DF, Test1_DF)
> setdiff(Test1_DF, Test2_DF)
integer(0)
> setdiff(Test2_DF, Test1_DF)
integer(0)

However,

> Test3_DF<-data.frame(HouseSize=c(1:25))
> setdiff(Test1_DF, Test3_DF)
[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
[17] 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
[33] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
[49] 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
[65] 90 91 92 93 94 95 96 97 98 99 100
> setdiff(Test3_DF, Test1_DF)
integer(0)
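
For what it is worth, here is a rough sketch of one way the repeated rows could be surfaced. It rests on the assumption that setdiff(...) follows set semantics, so repeated values collapse and the doubled rows in Test2_DF never register as a difference. The sketch calls base setdiff(...) on the HouseSize column rather than on the data frames themselves, and the names dup_rows, all_values, n1, n2, and extra_copies are made up for illustration. It uses duplicated(...) to flag repeated rows within one data frame and compares per-value counts from table(...) to find values that occur more often in one data frame than in the other.

```r
# Rebuild the data frames from the post.
Test1_DF <- data.frame(HouseSize = 1:100)
Test2_DF <- rbind(Test1_DF, Test1_DF)

# Base setdiff() on the column vectors: every value in Test2_DF also
# occurs in Test1_DF, so nothing is reported, however often it repeats.
set_difference <- setdiff(Test2_DF$HouseSize, Test1_DF$HouseSize)

# duplicated() marks each row that repeats an earlier row of the same
# data frame; drop = FALSE keeps the result a data frame.
dup_rows <- Test2_DF[duplicated(Test2_DF), , drop = FALSE]

# Compare how often each value occurs in the two data frames.
# Using a shared set of factor levels keeps the two count vectors aligned.
all_values <- sort(union(Test1_DF$HouseSize, Test2_DF$HouseSize))
n1 <- table(factor(Test1_DF$HouseSize, levels = all_values))
n2 <- table(factor(Test2_DF$HouseSize, levels = all_values))

# Values that appear more times in Test2_DF than in Test1_DF.
extra_copies <- all_values[as.vector(n2) - as.vector(n1) > 0]

length(set_difference)  # number of values setdiff() reports
nrow(dup_rows)          # number of repeated rows in Test2_DF
length(extra_copies)    # number of values with extra copies in Test2_DF
```

With more than one column, the same counting idea would need a per-row key (for example, pasting the columns together), which this sketch does not attempt.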