[Rd] setdiff bizarre
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jun 2 20:03:36 CEST 2009
William Dunlap wrote:
> %in% is a thin wrapper on a call to match(). match() is
> not a generic function (and is not documented to be one),
> so it treats data.frames as lists, as their underlying
> representation is a list of columns. match is documented
> to convert lists to character and to then run the character
> version of match on that character data. match does not
> bail out if the types of the x and table arguments don't match
> (that would be undesirable in the integer/numeric mismatch case).
>
yes, i understand that this is documented behaviour, and that it's not a
bug. nevertheless, the example is odd, and hints that there's a design
flaw. i also do not understand why the following should be useful and
desirable:
as.character(list('a'))
# "a"
as.character(data.frame('a'))
# "1"  (the column is stored as a factor, and its integer code is 1)
and hence
'a' %in% list('a')
# TRUE
while
'a' %in% data.frame('a')
# FALSE
'1' %in% data.frame('a')
# TRUE
there is a mechanistic explanation for how this works, but is there a
rationale for why it was designed to work this way?
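
a minimal sketch of the mechanism behind the results above, assuming the string column is stored as a factor (stringsAsFactors = TRUE is set explicitly here so that the outcome does not depend on the session default). the names df, tab, res_a, res_1 and col_a are only for illustration:

```r
# assumption: the string column is converted to a factor
df <- data.frame('a', stringsAsFactors = TRUE)

# a data frame is a list of columns; the single column is a factor whose
# underlying integer code is 1 and whose only level is "a"
code <- as.integer(df[[1]])   # 1
lev  <- levels(df[[1]])       # "a"

# match() (and hence %in%) compares against the character form of the
# list, which reflects the integer code rather than the factor label
tab   <- as.character(df)     # "1", as reported above
res_a <- 'a' %in% df          # FALSE, as reported above
res_1 <- '1' %in% df          # TRUE, as reported above

# matching against the column itself uses the factor labels instead
col_a <- 'a' %in% df[[1]]     # TRUE
```

so the surprising answers come from matching against the list as a whole; matching against df[[1]] gives the result one would probably expect.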
> Hence
> '1' %in% data.frame(1) # -> TRUE
> is acting consistently with
> match(as.character(pi), c(1, pi, exp(1))) # -> 2
> and
> 1L %in% c(1.0, 2.0, 3.0) # -> TRUE
>
> The related functions, duplicated() and unique(), do have
> row-wise data.frame methods. E.g.,
> > duplicated(data.frame(x=c(1,2,2,3,3),y=letters[c(1,1,2,2,2)]))
> [1] FALSE FALSE FALSE FALSE TRUE
> Perhaps match() ought to have one also. S+'s match is generic
> and has a data.frame method (which is row-oriented) so there we get:
> > match(data.frame(x=c(1,3,5), y=letters[c(1,3,5)]),
>         data.frame(x=1:10,y=letters[1:10]))
> [1] 1 3 5
> > is.element(data.frame(x=1:10,y=letters[1:10]),
>              data.frame(x=c(1,3,5), y=letters[c(1,3,5)]))
> [1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
>
> I think that %in% and is.element() ought to remain calls to match()
> and that if you want them to work row-wise on data.frames then
> match should get a data.frame method.
>
sounds good to me. what does
'a' %in% data.frame('a')
return in S+?
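
for illustration, the row-oriented matching suggested in the quoted text could be sketched in R roughly as follows. match_rows is a hypothetical helper, not part of base R and not necessarily how S+ implements its method; it assumes both data frames have the same column names and that the separator "\r" does not occur in the data:

```r
# hypothetical helper: positions of the rows of x among the rows of table
match_rows <- function(x, table, nomatch = NA_integer_) {
  stopifnot(is.data.frame(x), is.data.frame(table),
            identical(names(x), names(table)))
  # build one key string per row by pasting the columns together;
  # "\r" is an arbitrary separator assumed to be absent from the data
  key <- function(d) {
    cols <- unname(lapply(d, as.character))
    do.call(paste, c(cols, sep = "\r"))
  }
  match(key(x), key(table), nomatch = nomatch)
}

small <- data.frame(x = c(1, 3, 5), y = letters[c(1, 3, 5)])
big   <- data.frame(x = 1:10,       y = letters[1:10])

pos    <- match_rows(small, big)                  # 1 3 5
is_elt <- match_rows(big, small, nomatch = 0L) > 0L
```

with these inputs pos and is_elt agree with the S+ output quoted above. pasting keys is simple but compares values by their character form, so it would not be a drop-in method for match().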
thanks for the response.
regards,
vQ