[Rd] Time to revisit ifelse ?
Mikael Jagan
j@g@nmn2 @end|ng |rom gm@||@com
Fri Aug 1 19:29:55 CEST 2025
Andre,
There is a new thread (of length one, sadly), which you should read:
https://stat.ethz.ch/pipermail/r-devel/2025-July/084113.html
The function of mine that you have been testing was just a quick prototype,
and much work has been done in the meantime. Can you give the current
proposal (ifelse::ifelse1) a try and let us know if anything stands out?
Mikael
On 2025-08-01 1:13 pm, GILLIBERT, Andre wrote:
> Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>> I don't mind putting together a minimal package with some prototypes, tests,
>> comparisons, etc. But perhaps we should aim for consensus on a few issues
>> beforehand. (Sorry if these have been discussed to death already elsewhere.
>> In that case, links to relevant threads would be helpful ...)
>>
>> 1. Should the type and class attribute of the return value be exactly the
>> type and class attribute of c(yes[0L], no[0L]), independent of 'test'?
>> Or something else?
>>
>> 2. What should be the attributes of the return value (other than 'class')?
>>
>> base::ifelse keeps attributes(test) if 'test' is atomic, which seems
>> like desirable behaviour, though dplyr and data.table seem to think
>> otherwise:
>
> In my experience, base::ifelse keeping attributes of 'test' is useful for names.
> It may also be useful for dimensions, but for other attributes, it may be a dangerous feature.
> Otherwise, attributes of c(yes, no) should be mostly preserved in my opinion.
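
A minimal sketch of the behaviour discussed under point 2, using only base::ifelse. It shows that names and dimensions of 'test' are carried over to the result while attributes of 'yes' and 'no' (here the "Date" class) are dropped. The variable names are illustrative only.

```r
# 'test' carries names; base::ifelse() copies them to the result
test <- c(a = TRUE, b = FALSE, c = TRUE)
res <- ifelse(test, 1, 2)
names(res)

# dimensions of 'test' are kept as well
m <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
dim(ifelse(m, "y", "n"))

# attributes of 'yes' and 'no' are not kept: the Date class is lost
d <- ifelse(TRUE, as.Date("2025-08-01"), as.Date("2025-01-01"))
class(d)
```

The last call returns a bare number of days since 1970-01-01, which is one reason a replacement might take its class from c(yes[0L], no[0L]) instead.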
>
>> 3. Should the new function be stricter and/or more verbose? E.g., should
>> it signal a condition if length(yes) or length(no) is not equal to 1
>> nor length(test)?
>
> To be consistent with base R, it should warn if any of length(test), length(yes) or length(no) does not divide the longest of the three, and otherwise silently recycle the three vectors to a common length.
> This would work consistently with mathematical operators such as test+yes+no.
>
> In my personal experience, the truncation of 'yes' and 'no' to length(test) is the most dangerous feature of ifelse().
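
A short sketch contrasting the two length rules mentioned under point 3, in base R only. Arithmetic recycles to the longest operand and warns when lengths are not multiples, whereas base::ifelse silently returns length(test) values. The names w and r are illustrative only.

```r
# arithmetic recycles to the longest operand and warns on a non-multiple;
# capture the warning message instead of printing it
w <- tryCatch(1:3 + 1:2, warning = function(w) conditionMessage(w))
w

# base::ifelse() returns length(test) values: the remaining elements
# of 'yes' are dropped without any condition being signalled
r <- ifelse(TRUE, c(10, 20, 30), 0)
r
length(r)
```

Under the arithmetic-like rule proposed above, the second call would instead return all three elements of 'yes'.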
>
>> 4. Should the most common case, in which neither 'yes' nor 'no' has a
>> 'class' attribute, be handled in C? The remaining cases might rely on
>> method dispatch and thus require a separate "generic" implementation in
>> R. How much faster/more efficient would the C implementation have to
>> be to justify the cost (more maintenance for R-core, more obfuscation
>> for the average user)?
>
> If the function is not much slower than today's ifelse(), it is not worth rewriting in C in my opinion.
>
> Thank you for an implementation!
> A few examples of misbehaviors (in my opinion):
>
>> ifelse2(c(a=TRUE), factor("a"), factor("b"))
> Error in as.character.factor(x) : malformed factor
>
>> ifelse2(TRUE, factor(c("a","b")), factor(c("b","a")))
> [1] a
> Levels: a b
>
> I would expect this one to output
> [1] a b
> Levels: a b
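
A sketch of how the expected result above can be obtained by recycling 'test' to the longest argument and filling a prototype built from c(yes[0L], no[0L]). It assumes R >= 4.1.0, where c() applied to factors returns a factor whose levels are the union of the inputs' levels. It illustrates the idea only and is not the code of any of the proposals in this thread.

```r
yes <- factor(c("a", "b"))
no  <- factor(c("b", "a"))

# zero-length prototype carrying the combined type, class and levels
proto <- c(yes[0L], no[0L])
class(proto)
levels(proto)

# recycle 'test' to the longest argument, then fill from 'yes' and 'no'
test <- rep(TRUE, length.out = max(length(yes), length(no)))
ans <- rep(proto, length.out = length(test))   # all NA, levels kept
ans[test]  <- yes[test]
ans[!test] <- no[!test]
ans
```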
>
> I tried to develop a function that behaves like mathematical operators (e.g. test+yes+no) for length & dimensions coercion rules.
> Please, find the function and a few tests below:
>
> ifelse2 <- function (test, yes, no) {
>     # force evaluation of the arguments in order
>     test
>     yes
>     no
>
>     # coerce 'test' to logical, keeping its attributes when it is atomic
>     if (is.atomic(test)) {
>         if (!is.logical(test))
>             storage.mode(test) <- "logical"
>     }
>     else test <- if (isS4(test)) methods::as(test, "logical") else as.logical(test)
>
>     ntest <- length(test)
>     nyes <- length(yes)
>     nno <- length(no)
>
>     nn <- c(ntest, nyes, nno)
>     nans <- max(nn)
>
>     # result prototype: type and class of c(yes, no), recycled to full length
>     ans <- rep(c(yes[0L], no[0L]), length.out=nans)
>
>     # check dimension consistency for arrays
>     has.dim <- FALSE
>     if (length(dim(test)) || length(dim(yes)) || length(dim(no))) {
>         lparams <- list(test, yes, no)
>         ldims <- lapply(lparams, dim)
>         ldims <- ldims[!sapply(ldims, is.null)]
>         ldimnames <- lapply(lparams, dimnames)
>         ldimnames <- ldimnames[!sapply(ldimnames, is.null)]
>
>         rdim <- ldims[[1]]
>         # NULL when no argument has dimnames (avoids a subscript error)
>         rdimnames <- if (length(ldimnames)) ldimnames[[1]]
>         for (d in ldims) {
>             if (!identical(d, rdim)) {
>                 stop(gettext("non-conformable arrays"))
>             }
>         }
>         has.dim <- TRUE
>     }
>
>     # skip zero lengths: nans %% 0L is NA and would make if() fail
>     if (any(nans %% nn[nn > 0L] != 0L)) {
>         warning(gettext("longer object length is not a multiple of shorter object length"))
>     }
>
>     # recycle all three arguments to the common length
>     if (ntest != nans) {test <- rep(test, length.out=nans)}
>     if (nyes != nans) {yes <- rep(yes, length.out=nans)}
>     if (nno != nans) {no <- rep(no, length.out=nans)}
>
>     idx <- which(test)
>     ans[idx] <- yes[idx]
>
>     idx <- which(!test)
>     ans[idx] <- no[idx]
>
>     if (has.dim) {
>         dim(ans) <- rdim
>         dimnames(ans) <- rdimnames
>     }
>
>     if (!is.null(names(test))) {
>         names(ans) <- names(test)
>     }
>
>     ans
> }
>
>
> ifelse2(c(alpha=TRUE,beta=TRUE,gamma=FALSE),factor(c("A","B","C","X")),factor(c("A","B","C","D")))
> ifelse2(c(TRUE,FALSE), as.Date("2025-04-01"), c("2020-07-05", "2022-07-05"))
> ifelse2(c(a=TRUE, b=FALSE,c=TRUE,d=TRUE), list(42), list(40,45))
> ifelse2(rbind(alpha=c(a=TRUE, b=FALSE),beta=c(c=TRUE,d=FALSE)), list(1:10), list(2:20,3:30))
> a=rbind(alpha=c(a=TRUE, b=FALSE),beta=c(TRUE,TRUE))
> b=rbind(ALPHA=c(A=TRUE, B=FALSE),BETA=c(C=TRUE,D=TRUE))
> c=rbind(ALPHA2=c(A2=TRUE, B2=FALSE),BETA2=c(C2=TRUE,D2=TRUE))
> ifelse2(a,b,c)
> dimnames(a) <- NULL
> ifelse2(a,b,c)
> dimnames(b) <- NULL
> ifelse2(a,b,c)
>