[R] Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified

G. Jay Kerns gkerns at ysu.edu
Fri May 29 22:21:45 CEST 2009


Dear Jason,

On Fri, May 29, 2009 at 2:48 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
>
> I think I am using the improved version of setdiff(...) that handles data.frames, so I think some odd behavior was expected but this one is escaping me.
>
> It appears that the addition of duplicate entries is not caught by setdiff(...).  Is this expected behavior?

[snip]

> Thanks in advance for any feedback.
>
> Test1_DF<-data.frame(HouseSize=c(1:100))
> Test2_DF<-rbind(Test1_DF, Test1_DF)
> setdiff(Test1_DF, Test2_DF)
> integer(0)
> setdiff(Test2_DF, Test1_DF)
> integer(0)
>
> However,
> Test3_DF<-data.frame(HouseSize=c(1:25))
> setdiff(Test1_DF, Test3_DF)
>  [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
> [17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
> [33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
> [49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
> [65]  90  91  92  93  94  95  96  97  98  99 100
>
> setdiff(Test3_DF, Test1_DF)
> integer(0)


You didn't explicitly say which "improved version" of setdiff() you
are using, so I can only presume that it is setdiff.data.frame from
the prob package.

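The behaviour you are observing is expected. setdiff() treats its
arguments as sets: it only asks whether an element is present, not
how many times it occurs, so repeated entries never show up in the
result. This matches the base::setdiff behaviour in the case of
vectors;  cf.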

x1 <- 1:100
x2 <- c(x1, x1)

setdiff(x1, x2)  # integer(0)
setdiff(x2, x1)  # integer(0)

x3 <- 1:25
setdiff(x1, x3)  # 26:100
setdiff(x3, x1)  # integer(0)
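To make the set semantics concrete, here is a minimal sketch. It is an illustration only: `set_diff_equiv` is a hypothetical helper, not part of base R or the prob package. For plain atomic vectors like these, setdiff(x, y) should agree with "keep the elements of x that are absent from y, then drop repeats", which is why the extra copies in x2 never appear:

```r
# setdiff() uses set semantics: membership matters, multiplicity does not.
x1 <- 1:100
x2 <- c(x1, x1)   # every value of x1 appears twice
x3 <- 1:25

# Hypothetical helper: for atomic vectors, keep the elements of x that
# do not occur in y, then drop repeated values.
set_diff_equiv <- function(x, y) unique(x[!(x %in% y)])

identical(setdiff(x2, x1), set_diff_equiv(x2, x1))  # TRUE
identical(setdiff(x1, x3), set_diff_equiv(x1, x3))  # TRUE

# Repeats collapse even when y removes nothing at all.
setdiff(c(1, 1, 2), numeric(0))                     # 1 2
```

The unique() step is the part that discards multiplicity; no comparison of counts ever takes place.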


>
> If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames?
>

The R-help archives are chock full of every possible variant of
questions (and answers) about this, and you haven't said _exactly_
what you are looking for. In the absence of an already posted
solution, please specify exactly what you want and I'll wager an R
Ninja could dispatch it in moments.
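Purely as an illustration of one possible reading of the question: if "duplicate row entries between two data frames" means that copies should be counted (a multiset, or bag, difference rather than a set difference), a base R sketch follows. `row_multiset_diff` is a hypothetical helper, not part of base R or the prob package. It assumes whole rows can be compared through a pasted character key, so for example 1 and 1L match, and values that themselves contain the "\r" separator could collide.

```r
# Hypothetical helper: the rows of x that remain after cancelling,
# one for one, every identical row of y (a multiset difference).
row_multiset_diff <- function(x, y) {
  # Collapse each row to one character key so whole rows can be compared.
  row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
  key_x <- row_key(x)
  key_y <- row_key(y)
  # Running occurrence number of each row of x among identical rows of x.
  occ_x <- ave(seq_along(key_x), key_x, FUN = seq_along)
  # Number of copies of each row of x that y contains (0 if none).
  counts_y <- table(key_y)
  n_y <- as.integer(counts_y[match(key_x, names(counts_y))])
  n_y[is.na(n_y)] <- 0L
  # Keep a row only if x has more copies of it than y does.
  x[occ_x > n_y, , drop = FALSE]
}

Test1_DF <- data.frame(HouseSize = 1:100)
Test2_DF <- rbind(Test1_DF, Test1_DF)
Test3_DF <- data.frame(HouseSize = 1:25)

nrow(row_multiset_diff(Test2_DF, Test1_DF))  # 100: the second copy of each row
nrow(row_multiset_diff(Test1_DF, Test2_DF))  # 0
nrow(row_multiset_diff(Test1_DF, Test3_DF))  # 75: HouseSize 26 to 100
```

Unlike setdiff(), the result can contain repeated rows, and swapping the arguments reports which rows the other data frame has in excess.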

Regards,
Jay

***************************************************
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gkerns at ysu.edu
http://www.cc.ysu.edu/~gjkerns/