[Rd] suggestion for "sets" tools upgrade
Duncan Murdoch
murdoch.duncan at gmail.com
Fri Feb 7 13:37:33 CET 2014
On 14-02-06 8:31 PM, Carl Witthoft wrote:
> First, let me apologize in advance if this is the wrong place to submit
> a suggestion for a change to functions in the base-R package. It never
> really occurred to me that I'd have an idea worthy of such a change.
>
> My idea is to provide an upgrade to all the "sets" tools (intersect,
> union, setdiff, setequal) that allows the user to apply them in a
> strictly algebraic style.
>
> The current tools, as well documented, remove duplicate values in the
> input vectors. This can be helpful in stats work, but is inconsistent
> with the mathematical concept of sets and set measure.
I understand what you are asking for, but I think this justification for
it is just wrong. Sets don't have duplicated elements: an element is
in a set, or it is not. It can't be in the set more than once.
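For reference, the deduplicating behaviour of the current base functions can be seen in a small sketch like the following. The vectors are illustrative only, and the comments show the values base R returns for them:

```r
# Illustrative inputs with repeated values
x <- c(1, 1, 2, 3)
y <- c(1, 1, 1, 4)

intersect(x, y)                # 1
union(x, y)                    # 1 2 3 4
setdiff(x, y)                  # 2 3
setequal(c(1, 1, 2), c(2, 1))  # TRUE: repeats do not affect set equality
```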
> What I propose
> is that all these functions be given an additional argument with a
> default value: "multiple=FALSE" . When called this way, the functions
> remain as at present. When called with "multiple=TRUE," they treat the
> input vectors as true 'sets' of elements.
>
> I've already written and tested upgrades to all four functions, so if
> upgrading the base-R package is not appropriate, I'll post as a package
> to CRAN. It just seems more sensible to add to the base.
>
> Thanks in advance for any advice or comments.
> (Please be sure to email, as I can't recall if I'm currently registered
> for r-devel)
>
> Here's an example of the new code:
>
> intersect<-function (x, y,multiple=FALSE)
> {
> y <- as.vector(y)
> trueint <- y[match(as.vector(x), y, 0L)]
> if(!multiple) trueint <- unique(trueint)
> return(trueint)
> }
This is not symmetric. I'd like intersect(x,y,TRUE) to be the same as
intersect(y,x,TRUE), up to re-ordering. That's not true of your function:
> x <- c(1,1,2,3)
> y <- c(1,1,1,4)
> intersect(x,y,multiple=TRUE)
[1] 1 1
> intersect(y,x,multiple=TRUE)
[1] 1 1 1
I'd suggest that you clearly define what you mean by your functions, and
put them in a package, along with examples where they give more useful
results than the standard definitions. I think the current base package
functions match the mathematical definitions better.
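One possible symmetric definition is the multiset intersection, in which each value appears min(count in x, count in y) times. The following is only a minimal sketch under that assumption: the name multiset_intersect is hypothetical and is neither part of base R nor the proposed code.

```r
# Hypothetical helper (not in base R): multiset intersection.
# Each common value is returned min(count in x, count in y) times,
# so the result is the same for (x, y) and (y, x) up to re-ordering.
multiset_intersect <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  # Distinct values present in both inputs, in order of first appearance in x
  common <- unique(x[x %in% y])
  # Number of occurrences of each common value in x and in y
  nx <- vapply(common, function(v) sum(x %in% v), integer(1), USE.NAMES = FALSE)
  ny <- vapply(common, function(v) sum(y %in% v), integer(1), USE.NAMES = FALSE)
  rep(common, times = pmin(nx, ny))
}

x <- c(1, 1, 2, 3)
y <- c(1, 1, 1, 4)
multiset_intersect(x, y)  # 1 1
multiset_intersect(y, x)  # 1 1
```

With this definition, swapping the arguments changes at most the order of the result, never the counts.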
Duncan Murdoch