[Rd] setdiff bizarre

William Dunlap wdunlap at tibco.com
Tue Jun 2 19:18:23 CEST 2009


%in% is a thin wrapper around a call to match().  match() is
not a generic function (and is not documented to be one),
so it treats a data.frame as a list, since its underlying
representation is a list of columns.  match() is documented
to convert lists to character vectors and then to run the character
version of match() on that character data.  match() does not
bail out if the types of the x and table arguments differ
(that would be undesirable in the integer/numeric mismatch case).
Hence
   '1' %in% data.frame(1) # -> TRUE
is acting consistently with
   match(as.character(pi), c(1, pi, exp(1))) # -> 2
and
   1L %in% c(1.0, 2.0, 3.0) # -> TRUE
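
To make the coercion step concrete, here is a small sketch of what I
believe happens to the table argument before the comparison.  It only
uses base R; the variable names (df, keys, hit_in, hit_match, hit_num)
are made up for the illustration, and the intermediate as.character()
call is my reading of the documented behaviour, not a trace of the
internals.

```r
# A data.frame is a list of columns, so match() sees a list.
df <- data.frame(1)
stopifnot(is.list(df))

# Lists are converted to character before matching; each column
# becomes one string.
keys <- as.character(df)

# The string "1" therefore matches the single column of df.
hit_in    <- "1" %in% df
hit_match <- match("1", keys)

# Integer/numeric mismatches are reconciled rather than rejected.
hit_num <- 1L %in% c(1.0, 2.0, 3.0)

print(keys)
print(c(hit_in = hit_in, hit_num = hit_num))
print(hit_match)
```

Note that the element being matched is a whole column, not a row,
which is why the results look odd when you expect row-wise behaviour.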

The related functions, duplicated() and unique(), do have
row-wise data.frame methods.  E.g.,
   > duplicated(data.frame(x=c(1,2,2,3,3),y=letters[c(1,1,2,2,2)]))
   [1] FALSE FALSE FALSE FALSE  TRUE
Perhaps match() ought to have one also.  S+'s match is generic
and has a data.frame method (which is row-oriented) so there we get:
   > match(data.frame(x=c(1,3,5), y=letters[c(1,3,5)]),
           data.frame(x=1:10,y=letters[1:10]))
   [1] 1 3 5
   > is.element(data.frame(x=1:10,y=letters[1:10]),
                data.frame(x=c(1,3,5), y=letters[c(1,3,5)]))
    [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE

I think that %in% and is.element() ought to remain calls to match()
and that if you want them to work row-wise on data.frames then
match should get a data.frame method.
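
As a rough illustration of what row-wise matching could look like in
R, here is a minimal sketch.  match_rows() is a hypothetical helper,
not an existing R function and not S+'s implementation; it assumes
both data.frames have the same columns in the same order and that no
value contains the "\r" character used as a separator.  It builds one
character key per row and hands the keys to the ordinary match().

```r
# Hypothetical helper: row-wise match() for two data.frames.
# Assumes identical column layout and no "\r" inside any value.
match_rows <- function(x, table) {
  stopifnot(is.data.frame(x), is.data.frame(table),
            ncol(x) == ncol(table), ncol(x) >= 1)
  # Paste the columns together element-wise to get one key per row.
  # unname() keeps column names such as "sep" from being taken as
  # arguments of paste().
  row_key <- function(df) {
    cols <- unname(lapply(df, as.character))
    do.call(paste, c(cols, sep = "\r"))
  }
  match(row_key(x), row_key(table))
}

small <- data.frame(x = c(1, 3, 5), y = letters[c(1, 3, 5)])
big   <- data.frame(x = 1:10, y = letters[1:10])

pos    <- match_rows(small, big)          # positions of small's rows in big
in_set <- !is.na(match_rows(big, small))  # row-wise is.element() analogue

print(pos)
print(in_set)
```

Because the keys are character strings, the numeric x column of small
still matches the integer x column of big, in the same spirit as the
integer/numeric case above.  A real method would need more care with
separators, factors and floating-point formatting.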


Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Wacek Kusnierczyk
> Sent: Tuesday, June 02, 2009 9:11 AM
> To: Stavros Macrakis
> Cc: r-devel at r-project.org; dwinsemius at comcast.net
> Subject: Re: [Rd] setdiff bizarre
> 
> Stavros Macrakis wrote:
> >
> >      '1:3' %in% data.frame(a=2:4,b=1:3)  # TRUE
> >   
> 
> utterly weird.  so what would x have to be so that
> 
>     x %in% data.frame('a')
>     # TRUE
> 
> hint: 
> 
>     '1' %in% data.frame(1)
>     # TRUE
> 
> vQ


