[R] Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified

Jason Rupert jasonkrupert at yahoo.com
Fri May 29 20:48:49 CEST 2009


I believe I am using the improved version of setdiff(...) that handles data.frames, so I expected some odd behavior, but this one is escaping me.

It appears that the addition of duplicate entries is not caught by setdiff(...).  Is this expected behavior?

If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames? 

Thanks in advance for any feedback. 

Test1_DF <- data.frame(HouseSize = 1:100)
Test2_DF <- rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
# integer(0)
setdiff(Test2_DF, Test1_DF)
# integer(0)
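
For comparison, the same pattern shows up with plain vectors, which suggests this is ordinary set semantics rather than anything specific to data frames: base R's setdiff() returns the unique elements of its first argument that are absent from the second, so repeated values never count as a difference. A minimal sketch, where v1, v2, d12 and d21 are names made up for this illustration:

```r
# v1 and v2 are hypothetical vectors mirroring Test1_DF and Test2_DF
v1 <- 1:100
v2 <- c(v1, v1)          # every value now appears twice

d12 <- setdiff(v1, v2)   # values in v1 that are not in v2
d21 <- setdiff(v2, v1)   # values in v2 that are not in v1

length(d12)              # 0
length(d21)              # 0, so the duplicated values are not reported
```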

However, when one data frame is missing values that the other contains, the difference is reported:
Test3_DF <- data.frame(HouseSize = 1:25)
setdiff(Test1_DF, Test3_DF)
#  [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
# [17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
# [33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
# [49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
# [65]  90  91  92  93  94  95  96  97  98  99 100

setdiff(Test3_DF, Test1_DF)
# integer(0)
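
One possible way to spot the repeated rows, sketched with base R only and not necessarily the recommended approach: duplicated() flags every row that repeats an earlier row, and table() counts how often each value occurs. The block rebuilds the two data frames so it runs on its own; dup_rows and counts are names made up for this illustration.

```r
# Rebuild the example data so the sketch is self-contained
Test1_DF <- data.frame(HouseSize = 1:100)
Test2_DF <- rbind(Test1_DF, Test1_DF)

# duplicated() on a data frame marks each row that repeats an earlier row
dup_rows <- Test2_DF[duplicated(Test2_DF), , drop = FALSE]
nrow(dup_rows)           # 100 rows are repeats of earlier rows

# table() shows how many times each HouseSize value occurs
counts <- table(Test2_DF$HouseSize)
all(counts == 2)         # TRUE, every value appears exactly twice
```

Because this counts rows rather than comparing sets, it picks up the doubling that setdiff() leaves out.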



